Open thanh-lam opened 3 years ago
The script prints the list of user jobs correctly until it reaches a job whose state is reverted, and then it fails with a TypeError. Bill checked the database indices and found that "reverted" jobs have an empty "end_time" field. Python 3 raises a TypeError when the script tries to print that job record with this print statement:
print(print_fmt.format(
    data.get("allocation_id"), data.get("primary_job_id"), data.get("secondary_job_id"),
    data.get("begin_time"), cast.deep_get(data, "history", "end_time"),
    data.get("state")))
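Why an empty field turns into a TypeError: `print_fmt` is not shown in this issue, but the aligned columns in the output suggest it uses width/alignment specifiers. If so (an assumption), `str.format` passes each spec to the value's `__format__`, and `NoneType.__format__` rejects any non-empty spec. A plain `"{}"` would just print `None`. The sketch below reproduces this with a hypothetical `print_fmt`:

```python
# Hypothetical stand-in for print_fmt: the real format string is not shown
# in this issue. Positional indices put "state" first to match the output
# header (State | AID | ...), and every field has an alignment/width spec.
print_fmt = "{5:<10} | {0:<5} | {1:<8} | {2:<8} | {3:<26} | {4:<26}"

# Arguments in the same order as the print call above; end_time is None,
# as for a job in the reverted state.
row = (6, 1, 0, "2021-02-23 14:01:39.697209", None, "reverted")

# A bare "{}" would quietly print "None"; a width spec such as "<26" is
# handed to NoneType.__format__, which rejects it with a TypeError.
try:
    print(print_fmt.format(*row))
except TypeError as err:
    print("TypeError:", err)
```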
To fix that, we check the value of 'cast.deep_get(data,"history","end_time")' and print a blank when it is empty (None). This is the most direct fix we found, and it works as intended.
condition = cast.deep_get(data, "history", "end_time")
print(print_fmt.format(
    data.get("allocation_id"), data.get("primary_job_id"), data.get("secondary_job_id"),
    data.get("begin_time"), condition if condition is not None else " ",
    data.get("state")))
The line "condition = ..." was added so that the "if ... else ..." check on that field is easier to read.
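A self-contained sketch of the same check, runnable without CSM. `deep_get` here is a hypothetical stand-in for `cast.deep_get`, assumed to return None when a key is missing or null; `blank_if_none` and the sample records are also illustrative, not part of the CSM scripts:

```python
def deep_get(record, *keys):
    """Hypothetical stand-in for cast.deep_get: walk nested dicts and
    return None as soon as a key is missing (assumed behavior)."""
    for key in keys:
        if not isinstance(record, dict) or key not in record:
            return None
        record = record[key]
    return record


def blank_if_none(value, blank=" "):
    """Hypothetical helper: substitute a placeholder for None only."""
    return blank if value is None else value


# Illustrative records; a reverted job has no end_time value (JSON null).
complete = {"state": "complete", "allocation_id": 5,
            "history": {"end_time": "2021-02-23 14:03:32.978141"}}
reverted = {"state": "reverted", "allocation_id": 6,
            "history": {"end_time": None}}

for data in (complete, reverted):
    condition = deep_get(data, "history", "end_time")
    print("{0:<10} | {1:<5} | {2:<26}".format(
        data.get("state"), data.get("allocation_id"), blank_if_none(condition)))
```

Testing with `is None` (rather than truthiness, as in `condition or " "`) only replaces a genuinely missing value and passes other values such as an empty string through unchanged.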
A similar fix can also be applied to another script, "findJobsRunning.py":
condition = cast.deep_get(data, "history", "end_time")
print(print_fmt.format(
    data.get("allocation_id"), data.get("primary_job_id"), data.get("secondary_job_id"),
    data.get("begin_time"), condition if condition is not None else " "))
Thanks @thanh-lam for working with me and writing this up. I'm in the process of reviewing the other scripts to make sure we catch similar cases, and I will add the details to this issue.
A similar fix can also be applied to another script, findJobsInRange.py:
if data:
    condition = cast.deep_get(data, "history", "end_time")
    print(print_fmt.format(
        data.get("allocation_id"), data.get("primary_job_id"), data.get("secondary_job_id"),
        data.get("begin_time"), condition if condition is not None else " ",
        data.get("user_name")))
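Since the same check is repeated in three scripts, one option is a shared helper that blanks out None fields before formatting. The sketch below is hypothetical (`format_row` is not an existing CSM function), and the format string is illustrative only:

```python
def format_row(print_fmt, *fields, blank=" "):
    """Hypothetical shared helper: format one output row, replacing every
    None field with `blank` so no width spec reaches NoneType.__format__."""
    return print_fmt.format(*(blank if f is None else f for f in fields))


# Illustrative layout only; each script would pass its own print_fmt.
fmt = "{0:<5} | {1:<8} | {2:<26} | {3:<26}"
print(format_row(fmt, 6, 1, "2021-02-23 14:01:39.697209", None))
```

One trade-off: blanking every field also hides missing values that should never be empty (such as allocation_id), whereas the per-field check above only tolerates a missing end_time.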
@thanh-lam Here are some example queries run after the fix was implemented:
[root@c650f99p06 python]# ./findUserJobs.py -u tlam --state reverted
State | AID | P Job ID | S Job ID | Begin Time | End Time
[root@c650f99p06 python]# ./findUserJobs.py -u wcmorris --state reverted
State | AID | P Job ID | S Job ID | Begin Time | End Time
[root@c650f99p06 python]# ./findUserJobs.py -u root --state reverted
State | AID | P Job ID | S Job ID | Begin Time | End Time
reverted | 6 | 1 | 0 | 2021-02-23 14:01:39.697209 |
[root@c650f99p06 python]# ./findUserJobs.py -u root
State | AID | P Job ID | S Job ID | Begin Time | End Time
complete | 1 | 1 | 0 | 2021-02-23 12:04:34.828635 | 2021-02-23 12:04:39.513245
complete | 2 | 1 | 0 | 2021-02-23 12:04:40.983847 | 2021-02-23 12:04:43.556549
complete | 3 | 1 | 0 | 2021-02-23 12:05:01.829019 | 2021-02-23 12:05:02.492537
complete | 4 | 1 | 0 | 2021-02-23 13:48:52.624415 | 2021-02-23 13:48:53.528137
complete | 5 | 1 | 0 | 2021-02-23 14:00:14.318896 | 2021-02-23 14:03:32.978141
reverted | 6 | 1 | 0 | 2021-02-23 14:01:39.697209 |
complete | 7 | 1 | 0 | 2021-02-23 14:05:37.494822 | 2021-02-23 14:05:38.328102
complete | 8 | 1 | 0 | 2021-02-23 14:08:06.726752 | 2021-02-23 14:08:07.399833
complete | 9 | 1 | 0 | 2021-02-23 14:09:41.859691 | 2021-02-23 14:09:42.559594
complete | 10 | 1 | 0 | 2021-02-23 14:16:08.829438 | 2021-02-23 14:16:09.533021
complete | 11 | 1 | 0 | 2021-02-23 14:17:05.743261 | 2021-02-23 14:17:06.379795
complete | 12 | 1 | 0 | 2021-02-23 14:18:37.053626 | 2021-02-23 14:18:37.73513
running | 13 | 1 | 0 | 2021-02-23 14:26:50.28166 | 2021-02-23 14:26:50.970676
complete | 14 | 1 | 0 | 2021-02-23 14:28:38.323487 | 2021-02-23 14:28:38.998807
complete | 15 | 1 | 0 | 2021-02-23 14:35:32.508862 | 2021-02-23 14:35:33.167389
Fixed by PR #994.
@thanh-lam I'm going to leave this issue open until you have a chance to verify the fix in the next CAST build.
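To help verify the fix in a new build, the formatting logic could be covered by a small regression test that feeds records in every state through it. This is a sketch under assumptions: `deep_get`, `PRINT_FMT`, `format_job` and `make_job` are hypothetical stand-ins that mirror the fixed print call, not code taken from CSM:

```python
import unittest


def deep_get(record, *keys):
    """Hypothetical stand-in for cast.deep_get (returns None on a missing key)."""
    for key in keys:
        if not isinstance(record, dict) or key not in record:
            return None
        record = record[key]
    return record


# Hypothetical layout modeled on the output columns above, not the real print_fmt.
PRINT_FMT = "{5:<10} | {0:<5} | {1:<8} | {2:<8} | {3:<26} | {4:<26}"


def format_job(data):
    """Mirror of the fixed print call in findUserJobs.py."""
    condition = deep_get(data, "history", "end_time")
    return PRINT_FMT.format(
        data.get("allocation_id"), data.get("primary_job_id"),
        data.get("secondary_job_id"), data.get("begin_time"),
        condition if condition is not None else " ",
        data.get("state"))


def make_job(state, end_time):
    """Build an illustrative allocation record for the given state."""
    return {"allocation_id": 6, "primary_job_id": 1, "secondary_job_id": 0,
            "begin_time": "2021-02-23 14:01:39.697209", "state": state,
            "history": {"end_time": end_time}}


class FormatJobTest(unittest.TestCase):
    def test_every_state_formats(self):
        for state in ("running", "failed", "complete", "reverted"):
            end = None if state == "reverted" else "2021-02-23 14:03:32.978141"
            with self.subTest(state=state):
                # Must not raise TypeError for any state.
                self.assertTrue(format_job(make_job(state, end)).startswith(state))

    def test_reverted_end_time_is_blank(self):
        # The End Time column is padding only, so the row ends at the last "|".
        row = format_job(make_job("reverted", None))
        self.assertTrue(row.rstrip().endswith("|"))
```

Saved to a file, it can be run with `python -m unittest <file>`.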
Describe the bug
For querying allocation data, CSM provides Python scripts in /opt/ibm/csm/bigdata/python/. One example is "findUserJobs.py", which lists allocation information about a job, such as its "state". It produced the following error when run with --state reverted. Other states (running, failed, complete) were listed without error.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
The command should not produce the error (it looks like an internal condition that needs to be handled for the reverted state). Example of a good command output:
Additional context
The TypeError may be caused by an "empty" field in the data record of a job in the reverted state.
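To check which fields of a record are actually empty, a small recursive walk can list them by path. This is a hypothetical diagnostic (`empty_fields` is not part of CSM), and the sample record only reuses field names from the print call above:

```python
def empty_fields(record, prefix=""):
    """Hypothetical diagnostic: return dotted paths of fields in a nested
    record whose value is None or an empty string."""
    paths = []
    for key, value in record.items():
        path = prefix + key
        if isinstance(value, dict):
            paths.extend(empty_fields(value, path + "."))
        elif value is None or value == "":
            paths.append(path)
    return paths


# Illustrative reverted record; field names follow the print call above.
record = {"allocation_id": 6, "state": "reverted",
          "begin_time": "2021-02-23 14:01:39.697209",
          "history": {"end_time": None}}
print(empty_fields(record))  # ['history.end_time']
```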
Issue Source: CSM regression tests.