datastax / cstar_perf

Apache Cassandra performance testing platform
Apache License 2.0

Expose tests in series from API endpoint #106

Closed mambocab closed 9 years ago

mambocab commented 9 years ago

Problem

The question I ultimately want answered is "Given a date, what was the SHA of HEAD on ?". I want to avoid, e.g., merge commits. Basically I want HEAD@{<date>} but without the reflog. (This is in the service of running regression tests and comparing the performance of, e.g., trunk 2 weeks ago to trunk today.)

With some massaging, you can get a lossy version of this information from a long-running cstar_perf instance. Given data returned from <tests/artifacts/<test_id>/stats, you can tell that trunk was at sha data['revisions'][n]['git_id'][<blade_name>] if data['revisions'][n]['revision'] is 'apache/trunk'. So, each SHA meeting that requirement was checked out as HEAD on trunk at some point.
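As a rough illustration of that "massaging", here is a minimal sketch that collects the SHAs recorded for a branch from one stats payload. It assumes only the shape described above (`revisions`, `revision`, and `git_id` keyed by blade name); the function name `trunk_shas`, the sample payload, and the blade names are hypothetical and not taken from cstar_perf.

```python
def trunk_shas(stats, branch="apache/trunk"):
    """Collect every SHA recorded for `branch` in one stats payload."""
    shas = set()
    for revision in stats.get("revisions", []):
        if revision.get("revision") != branch:
            continue
        # git_id maps blade name -> SHA that was checked out on that blade.
        shas.update((revision.get("git_id") or {}).values())
    return shas


# Hypothetical payload, shaped like the description above.
sample_stats = {
    "revisions": [
        {"revision": "apache/trunk", "git_id": {"blade_1": "abc123", "blade_2": "abc123"}},
        {"revision": "apache/cassandra-2.1", "git_id": {"blade_1": "def456"}},
    ]
}

print(trunk_shas(sample_stats))
```

Each SHA returned this way was HEAD of the branch at some point, but the result is lossy: commits that never had a test run against them do not appear.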

Solutions

The missing part of this is an API endpoint that provides UUIDs to use as <test_id>.

Both of those changes would be pretty easy as they just query over the tests table and return test_id values. They'd require no changes to the data model. Deciding between the two, as far as I can tell, is just a question of whether either causes problems for the test database. I don't think either will.

Other possibilities that might make this simpler for a client, but I think aren't useful enough in other cases to change cstar_perf for:

The benefits of those approaches are that even if the data returned from the /stats endpoint changes, client code won't break as long as the SHA/branch API stays the same. I don't think that's worth the effort at this stage, since we're the only people building clients for this information.

aboudreault commented 9 years ago

@mambocab So, what we need is basically only 1 API endpoint:

The endpoint would only return the UUID (test_id) of the test. It would be easy to add /api/tests/:UUID to return all information in the future. (or now?)

mambocab commented 9 years ago

Yeah, ultimately, I can do everything I need with an /api/tests endpoint (optionally with timestamps for filtering) that returns a list of all the test UUIDs.

aboudreault commented 9 years ago

On it.

aboudreault commented 9 years ago

@mambocab I cannot add a timestamp filter to this endpoint's database query because of how the schema is designed. The scheduled_date is in fact the test_id column (timeuuid), and since it's the primary key I can only filter with the EQ and IN operators.

However, since we are already forced to fetch all test_id values, I will apply the filter in the backend endpoint rather than directly in the DB. That avoids returning extra data on each request.
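A minimal sketch of what application-layer filtering on timeuuid values could look like, using only the Python standard library. It relies on the fact that a version 1 UUID embeds a timestamp counted in 100 ns intervals since 1582-10-15. The names `filter_test_ids`, `timeuuid_to_unix`, and `make_timeuuid` are hypothetical, and this is not the code that was actually added to cstar_perf.

```python
import uuid

# 100 ns intervals between the UUID epoch (1582-10-15) and the Unix epoch.
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def timeuuid_to_unix(test_id):
    """Return the Unix timestamp (seconds) embedded in a version 1 UUID."""
    return (test_id.time - UUID_EPOCH_OFFSET) / 1e7


def filter_test_ids(test_ids, start=None, end=None):
    """Keep the test_ids whose embedded timestamp lies within [start, end]."""
    kept = []
    for test_id in test_ids:
        ts = timeuuid_to_unix(test_id)
        if start is not None and ts < start:
            continue
        if end is not None and ts > end:
            continue
        kept.append(test_id)
    return kept


def make_timeuuid(unix_seconds):
    """Demo helper only: build a version 1 UUID for a whole-second timestamp."""
    t = unix_seconds * 10_000_000 + UUID_EPOCH_OFFSET
    time_low = t & 0xFFFFFFFF
    time_mid = (t >> 32) & 0xFFFF
    time_hi_version = ((t >> 48) & 0x0FFF) | 0x1000
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version, 0x80, 0, 0))


older = make_timeuuid(1_400_000_000)
newer = make_timeuuid(1_430_000_000)
recent_only = filter_test_ids([older, newer], start=1_420_000_000)
```

In this sketch every test_id is still fetched from the database; only the list sent back to the client gets smaller.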

mambocab commented 9 years ago

Just to make sure I understand: you can't filter in the database query, but you can filter at the application layer? Sounds good to me.

aboudreault commented 9 years ago

Yes, exactly.

aboudreault commented 9 years ago

How do you plan to pass the from/to filters? As a timestamp or a human-readable date? Which is easier or more convenient for you?

mambocab commented 9 years ago

I don't have strong feelings -- timestamps are what I had in mind, but probably just because that's what the API for series requests uses. As far as I'm concerned, do what you think is best.

aboudreault commented 9 years ago

Fixed