Return execution run status from Request Experiment Execution

The current request experiment execution path should return a JSON response that follows this template:

{
    "message": "The request was successful",
    "result": {
        "executionId": "7LYepwB1ewLb3",
        "msg": {
            "container_search_string": [...],
            "default_parameters": {
                ...
            }
        }
    },
    "status": "success",
    "version": "1.6.1"
}

This means the reactor received and accepted the request and assigned it the execution ID 7LYepwB1ewLb3, which appears as executionId under the result field.
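As a minimal sketch of reading that response, the snippet below pulls the execution ID out of the JSON body. The function name extract_execution_id and the sample string are illustrative assumptions, not part of the intent parser code; only the field names (status, message, result, executionId) come from the template above.

```python
import json


def extract_execution_id(response_text: str) -> str:
    """Return result.executionId from a submission response (hypothetical helper)."""
    payload = json.loads(response_text)
    if payload.get("status") != "success":
        raise ValueError(f"request was not accepted: {payload.get('message')}")
    # "result" may be missing or null on an unexpected response, so fall back to {}.
    execution_id = (payload.get("result") or {}).get("executionId")
    if not execution_id:
        raise ValueError("response has no result.executionId")
    return execution_id


# Sample body shaped like the template above, with the elided "msg" content left empty.
sample = (
    '{"message": "The request was successful", '
    '"result": {"executionId": "7LYepwB1ewLb3", "msg": {}}, '
    '"status": "success", "version": "1.6.1"}'
)
print(extract_execution_id(sample))
```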

This response gives no visibility into errors that may occur during the execution. To see those, we need to check the execution status with a GET request to:

https://api.sd2e.org/actors/v2/control-annotator.prod/executions/7LYepwB1ewLb3?x-nonce=$NONCE

The nonce has been provided to you in a side channel. This request returns JSON as well:

{
  "message": "Actor execution retrieved successfully.", 
  "result": {
    "cpu": 18671696018, 
    "exitCode": 1, 
    "finalState": {
      "Dead": false, 
      "Error": "", 
      "ExitCode": 1, 
      "FinishedAt": "2020-09-23T17:12:46.826Z", 
      "OOMKilled": false, 
      "Paused": false, 
      "Pid": 0, 
      "Restarting": false, 
      "Running": false, 
      "StartedAt": "2020-09-23T17:12:40.026Z", 
      "Status": "exited"
    }, 
    "finishTime": "2020-09-23T17:12:46.826Z", 
    "id": "7LYepwB1ewLb3", 
    "io": 17517, 
    "messageReceivedTime": "2020-09-23T17:12:39.147Z", 
    "runtime": 7, 
    "startTime": "2020-09-23T17:12:39.552Z", 
    "status": "COMPLETE", 
    "workerId": "7KMApj3jrNg5k"
  }, 
  "status": "success", 
  "version": "1.6.1"
}

Note the exitCode and status fields:

A non-zero exitCode indicates an execution error.
status can be one of: ["SUBMITTED", "COMPLETE"]

After being submitted for execution, the reactor processes the request and transitions to the "COMPLETE" state when done.

The exitCode is valid only once status is "COMPLETE".
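The status check and the rules above could be sketched roughly as follows, using only the Python standard library. The helper names (status_url, fetch_result, summarize, wait_for_completion), the polling interval, and the attempt limit are assumptions made for illustration; the URL and the field names are taken from the example response above.

```python
import json
import time
import urllib.parse
import urllib.request

# Endpoint taken from the example URL above.
BASE_URL = "https://api.sd2e.org/actors/v2/control-annotator.prod/executions"


def status_url(execution_id, nonce):
    """Build the execution status URL, passing the nonce as the x-nonce parameter."""
    return f"{BASE_URL}/{execution_id}?{urllib.parse.urlencode({'x-nonce': nonce})}"


def fetch_result(execution_id, nonce):
    """GET the execution record and return its "result" object."""
    with urllib.request.urlopen(status_url(execution_id, nonce), timeout=30) as resp:
        return json.load(resp)["result"]


def summarize(result):
    """Turn a "result" object into a one-line, human-readable summary."""
    execution_id = result.get("id")
    status = result.get("status")
    if status != "COMPLETE":
        # exitCode is not meaningful until the execution is complete.
        return f"Execution {execution_id} has not finished yet (status: {status})."
    if result.get("exitCode") == 0:
        return f"Execution {execution_id} succeeded."
    return f"Execution {execution_id} failed with exit code {result.get('exitCode')}."


def wait_for_completion(execution_id, nonce, fetch=fetch_result, interval=5.0, attempts=60):
    """Poll until status is COMPLETE; interval and attempts are arbitrary defaults."""
    for _ in range(attempts):
        result = fetch(execution_id, nonce)
        if result.get("status") == "COMPLETE":
            return result
        time.sleep(interval)
    raise TimeoutError(f"execution {execution_id} did not complete in time")


# Offline example using values from the response shown above.
print(summarize({"id": "7LYepwB1ewLb3", "status": "COMPLETE", "exitCode": 1}))
```

Passing the fetch function as a parameter keeps the polling loop testable without network access.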

For non-zero exit codes, we can pull (and show) logs for the execution via:

https://api.sd2e.org/actors/v2/control-annotator.prod/executions/7LYepwB1ewLb3/logs?x-nonce=$NONCE

{
  "message": "Logs retrieved successfully.", 
  "result": {
    "logs": "..."
  }, 
  "status": "success", 
  "version": "1.6.1"
}

This will allow visibility/clarity into reactor executions that succeed or fail, and if they fail, what the nature of the error was.
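A rough sketch of the log retrieval step, again with the standard library only. The names logs_url, fetch_logs, and tail are hypothetical helpers, and trimming the output to the last few lines is just one possible way to show logs to a user; the URL and the result.logs field come from the example above.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the example URL above.
BASE_URL = "https://api.sd2e.org/actors/v2/control-annotator.prod/executions"


def logs_url(execution_id, nonce):
    """Build the logs URL for an execution, passing the nonce as x-nonce."""
    return f"{BASE_URL}/{execution_id}/logs?{urllib.parse.urlencode({'x-nonce': nonce})}"


def fetch_logs(execution_id, nonce):
    """GET the logs endpoint and return the result.logs string."""
    with urllib.request.urlopen(logs_url(execution_id, nonce), timeout=30) as resp:
        return json.load(resp)["result"]["logs"]


def tail(logs, max_lines=20):
    """Keep only the last max_lines lines, which usually hold the error."""
    if max_lines <= 0:
        return ""
    return "\n".join(logs.splitlines()[-max_lines:])


# Offline example with a made-up log string.
print(tail("line 1\nline 2\nline 3", max_lines=2))
```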

IP has been updated to use the new TACC endpoint to address #252. This issue, however, will require more changes, based on the Slack conversation between @mwes and @mwvaughn on 10/21/2020. As mentioned in that conversation, #252 is in a good state for @mwes to use for milestone 2.10. @mwes will continue to help other users debug the state of an experiment execution until #257 is resolved.

New workflow that this issue will need to build off of:

An ER document can have multiple execution IDs assigned to an experiment. IP will need to make a request to TACC's endpoint to get the list of request_id values that match an experiment_reference_url_for_xplan.
Depending on the request_id selected, IP will need to map request_id -> execution_id.
The corresponding execution_id can then be passed into TACCGoAccessor.get_status_of_experiment to get the status of an experiment execution. The information returned from this function should be reported back to the user in a human-readable form.
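The three steps above might be wired together along these lines. Everything here is a sketch under assumptions: the four callables are hypothetical stand-ins (the TACC request-list endpoint, the request_id -> execution_id mapping, and the signature and return type of TACCGoAccessor.get_status_of_experiment are not specified in this issue), and the status is assumed to be a dict with status and exitCode keys like the execution response shown earlier.

```python
from typing import Callable, Dict, List


def report_execution_status(
    experiment_reference_url_for_xplan: str,
    list_request_ids: Callable[[str], List[str]],      # hypothetical: queries TACC's endpoint
    select_request_id: Callable[[List[str]], str],     # hypothetical: e.g. a user choice
    lookup_execution_id: Callable[[str], str],         # hypothetical: request_id -> execution_id
    get_status_of_experiment: Callable[[str], Dict],   # assumed shape of the TACCGoAccessor call
) -> str:
    """Return a human-readable status line for one request of an experiment."""
    request_ids = list_request_ids(experiment_reference_url_for_xplan)
    if not request_ids:
        return f"No requests found for {experiment_reference_url_for_xplan}."
    request_id = select_request_id(request_ids)
    execution_id = lookup_execution_id(request_id)
    status = get_status_of_experiment(execution_id)
    return (
        f"Request {request_id} (execution {execution_id}): "
        f"status={status.get('status')}, exit code={status.get('exitCode')}"
    )


# Usage with in-memory stand-ins; the URL and request ID are made-up placeholders.
print(
    report_execution_status(
        "https://example.org/experiment-request",
        list_request_ids=lambda url: ["req-1"],
        select_request_id=lambda ids: ids[-1],
        lookup_execution_id=lambda rid: "7LYepwB1ewLb3",
        get_status_of_experiment=lambda eid: {"status": "COMPLETE", "exitCode": 1},
    )
)
```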

SD2E / experimental-intent-parser

Return execution run status from Request Experiment Execution #257