Closed nastra closed 8 years ago
LGTM Eduard!
@mshuler one thing I forgot to mention is that a cancelled test now shows up as completed (previously it would just have failed because of missing stats files). However, it will only contain partial data from graphs / logs, so people might get confused. I think we should consider adding a new status Cancelled or something. wdyt?
I would be supportive of displaying a Cancelled status - in fact, I would be supportive of displaying status Failed tests, too, since they are nearly impossible to find other user's failed tests.
@mshuler I totally agree that we should have an overview of Failed and Cancelled tests too (which should be done in a separate PR).
Just to make things clear: if a user now hits the Cancel button, the job will eventually have the Completed status in the UI, but only have partial graphs/logs and such
This might lead to some confusion.
Therefore I'm suggesting that jobs, which got cancelled actually get the cancelled status in the UI. The question is if we want to do this cancelled status as part of CSTAR-85 or a separate ticket & PR?
@mshuler if you're done reviewing, please don't merge yet.
At this time, I still don't have a dev env to test on without setting up on real server(s). https://github.com/datastax/cstar_perf/issues/209
I trust it'll be fine, and we'll fix it if not :)
Commit 1 - making Cancel actually work
Previously, API calls to /api/tests/status/id/ or /api/tests/progress/id would be failing with a 401 - Unauthorized, because the API user test-cluster wouldn't be in the users table and also wouldn't have a role assigned.
This shows that the API user test-cluster by default is not in the users table. Also I could not find any documentation for that requirement.
After talking to @aboudreault, we kind of came to the conclusion, that we really shouldn't do the role check for the API endpoints that are mentioned here. A simple authentication check should be enough at this point.
Below is an excerpt, showing that the API call to /api/tests/status/id/ indeed was using the API user test-cluster and it would fail with a 401.
PS: This issue happens on a dockerized cstar_perf deployment as well as on the DSE cstar_perf deployment that we have. I think it doesn't happen on cstar.datastax.com because the API user was added to the users table.
Commit 2 - Handling termination from Cancel in a graceful manner
2) After solving the Cancel problem in the first commit, I finally know now why it actually never attempts to copy back the log files. The problem is that the JobCancellationTracker is killing cassandra-stress and stress_compare (which handles creation of the stats files / copies back the log files / copies the flamegraphs / ...). Not killing the stress_compare process would actually mean that once you hit the Cancel button:
The second commit now handles the termination of stress_compare in a graceful manner and tries to finish processing the files of the current running revision.
Commit 3 - Correctly set the job status to cancelled if it was cancelled by the user
The third commit now correctly sets the job status to cancelled in the DB if the job was cancelled by the user.