MAAP-Project / maap-hec-aws


Update metrics schema in ADES-PBS #83

Closed jjacob7734 closed 1 year ago

jjacob7734 commented 2 years ago

Modify the metrics schema in ADES-PBS to match the doc linked to in the DoD below. This includes adding ADES ID, Job Type, User ID, and NodeType to the metrics.

Definition of Done:

jjacob7734 commented 1 year ago

The step-wise and workflow metrics now match the schema, with one exception: memory_max_gb for the workflow and for each step is set to a fill value of -999.0, because I don't know of a way to get that information on Pleiades. The ades_id is set when the Flask app starts and is reported at the top level of the JSON response of every endpoint. The username is passed as an HTTP parameter and reported as part of the statusInfo. The job_type is reported as the procID. The time_queued is stored as state in the SQLite jobs table when the executeJob request is processed.
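For client code that consumes these metrics, the -999.0 fill value should probably be treated as "not measured" rather than as a real number. A minimal sketch of one way to do that, assuming the client works on the parsed JSON as plain dicts; the helper name `metric_or_none` and the constant `FILL_VALUE` are hypothetical and not part of ADES-PBS:

```python
from typing import Optional

# Fill value used when a metric could not be measured (value taken from the comment above).
FILL_VALUE = -999.0


def metric_or_none(step_metrics: dict, key: str) -> Optional[float]:
    """Return the metric as a float, or None if it is absent or set to the fill value.

    Hypothetical client-side helper; not part of the ADES-PBS code base.
    """
    value = step_metrics.get(key)
    if value is None or value == FILL_VALUE:
        return None
    return float(value)


# Values copied from one step of a getJobStatus response.
step = {"exit_code": 0, "memory_max_gb": -999.0, "time_duration_seconds": 23.0}
print(metric_or_none(step, "memory_max_gb"))          # None
print(metric_or_none(step, "time_duration_seconds"))  # 23.0
```

Mapping the fill value to None keeps it out of sums and averages, where a literal -999.0 would silently skew the result.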

An additional element I added to all endpoint responses is api_version, which gives the version number of our JSON response specification. The idea is to update it whenever we change the response structure, so that client code can check it and support multiple versions for backward compatibility across future releases.
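A client-side check of api_version could look roughly like the sketch below. It assumes the version is a "major.minor" string such as "1.0" and that only a change in the major number breaks compatibility; neither rule is stated in this issue. The names `SUPPORTED_MAJOR_VERSIONS`, `parse_api_version`, and `is_supported` are hypothetical:

```python
from typing import Tuple

# Assumption: the client lists the major versions it knows how to parse.
SUPPORTED_MAJOR_VERSIONS = {1}


def parse_api_version(response: dict) -> Tuple[int, int]:
    """Split the top-level api_version string into (major, minor) integers."""
    major, minor = response["api_version"].split(".")
    return int(major), int(minor)


def is_supported(response: dict) -> bool:
    """Return True if the response's major version is one this client handles."""
    major, _minor = parse_api_version(response)
    return major in SUPPORTED_MAJOR_VERSIONS


# Top-level fields as they appear in a getJobStatus response.
response = {"ades_id": "ades-pbs-dev-jjacob-01", "api_version": "1.0"}
if not is_supported(response):
    raise RuntimeError("Unsupported api_version: " + response["api_version"])
```

A client that needs to support several response layouts could dispatch on the parsed major number instead of raising.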

Here is a sample of the getJobStatus response JSON:

{
  "ades_id": "ades-pbs-dev-jjacob-01",
  "api_version": "1.0",
  "statusInfo": {
    "jobID": "downsample-dem-workflow-0.0.1-1006555aa1bdb93853678b1f5c05c0b6a92bcb0d",
    "job_type": "downsample-dem-workflow-0.0.1",
    "metrics": {
      "downsample_dem": {
        "exit_code": 0,
        "memory_max_gb": -999.0,
        "node": {
          "cores": 1,
          "disk_space_free_gb": 2381834.4552001953,
          "hostname": "r623i0n0.p4.nas.nasa.gov",
          "ip_address": "10.150.37.118",
          "memory_gb": 125.30337905883789,
          "node_type": "broadwell"
        },
        "time_duration_seconds": 23.0,
        "time_end": "2022-08-21T06:38:18+0000",
        "time_start": "2022-08-21T06:37:55+0000",
        "work_dir_size_gb": 0.804604159668088
      },
      "stage_in": {
        "exit_code": 0,
        "memory_max_gb": -999.0,
        "node": {
          "cores": 1,
          "disk_space_free_gb": 2381834.4552001953,
          "hostname": "r623i0n0.p4.nas.nasa.gov",
          "ip_address": "10.150.37.118",
          "memory_gb": 125.30337905883789,
          "node_type": "broadwell"
        },
        "time_duration_seconds": 12.0,
        "time_end": "2022-08-21T06:37:55+0000",
        "time_start": "2022-08-21T06:37:43+0000",
        "work_dir_size_gb": 0.804604159668088
      },
      "stage_out": {
        "exit_code": 1,
        "memory_max_gb": -999.0,
        "node": {
          "cores": 1,
          "disk_space_free_gb": 2381834.4552001953,
          "hostname": "r623i0n0.p4.nas.nasa.gov",
          "ip_address": "10.150.37.118",
          "memory_gb": 125.30337905883789,
          "node_type": "broadwell"
        },
        "time_duration_seconds": 4.0,
        "time_end": "2022-08-21T06:38:22+0000",
        "time_start": "2022-08-21T06:38:18+0000",
        "work_dir_size_gb": 0.804604159668088
      },
      "workflow": {
        "exit_code": 1,
        "memory_max_gb": -999.0,
        "node": {
          "cores": 1,
          "disk_space_free_gb": 2381834.4552001953,
          "hostname": "r623i0n0.p4.nas.nasa.gov",
          "ip_address": "10.150.37.118",
          "memory_gb": 125.30337905883789,
          "node_type": "broadwell"
        },
        "time_duration_seconds": 39.0,
        "time_end": "2022-08-21T06:38:22+0000",
        "time_start": "2022-08-21T06:37:43+0000",
        "work_dir_size_gb": 2.413812479004264
      }
    },
    "status": "failed",
    "time_queued": "2022-08-21T06:36:52+0000",
    "username": "jjacob"
  }
}
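To illustrate how a client might read this response, the sketch below lists the steps that failed and recomputes each duration from time_start and time_end. It uses a trimmed excerpt of the sample (only the fields it needs) and assumes that the "workflow" entry is the aggregate over the other steps, which is how the sample looks but is not stated explicitly. The function names `failed_steps` and `elapsed_seconds` are hypothetical:

```python
from datetime import datetime

# Timestamp layout used in the sample, e.g. "2022-08-21T06:38:18+0000".
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S%z"

# Trimmed excerpt of statusInfo["metrics"] from the sample response.
metrics = {
    "stage_in": {"exit_code": 0, "time_duration_seconds": 12.0,
                 "time_start": "2022-08-21T06:37:43+0000",
                 "time_end": "2022-08-21T06:37:55+0000"},
    "downsample_dem": {"exit_code": 0, "time_duration_seconds": 23.0,
                       "time_start": "2022-08-21T06:37:55+0000",
                       "time_end": "2022-08-21T06:38:18+0000"},
    "stage_out": {"exit_code": 1, "time_duration_seconds": 4.0,
                  "time_start": "2022-08-21T06:38:18+0000",
                  "time_end": "2022-08-21T06:38:22+0000"},
    "workflow": {"exit_code": 1, "time_duration_seconds": 39.0,
                 "time_start": "2022-08-21T06:37:43+0000",
                 "time_end": "2022-08-21T06:38:22+0000"},
}


def failed_steps(metrics: dict) -> list:
    """Names of individual steps with a nonzero exit code.

    Assumption: the "workflow" key is the aggregate entry, so it is skipped.
    """
    return sorted(name for name, m in metrics.items()
                  if name != "workflow" and m["exit_code"] != 0)


def elapsed_seconds(step_metrics: dict) -> float:
    """Seconds between time_start and time_end for one metrics entry."""
    start = datetime.strptime(step_metrics["time_start"], TIME_FORMAT)
    end = datetime.strptime(step_metrics["time_end"], TIME_FORMAT)
    return (end - start).total_seconds()


print(failed_steps(metrics))  # ['stage_out']
for name, m in metrics.items():
    # Each reported duration should agree with its own timestamps.
    assert elapsed_seconds(m) == m["time_duration_seconds"], name
```

In this sample the overall "failed" status and the workflow exit code of 1 both trace back to the stage_out step.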