IBM / CAST

CAST can enhance the system management of cluster-wide resources. It consists of the open source tools: cluster system management (CSM) and burst buffer.
Eclipse Public License 1.0

elastic search returns total_hits as dict object not integer #991

Open thanh-lam opened 3 years ago

thanh-lam commented 3 years ago

Describe the bug A regression test on the CSM bigdata Python command "findJobTimeRange.py" produced the following message even though the allocation ID in question (1) did exist, which means "total_hits" should have had the value 1.

[root@c650f99p06 python]# /opt/ibm/csm/bigdata/python/findJobTimeRange.py -a 1
# Found {'value': 1, 'relation': 'eq'} matches for specified the job.
# This implementation only supports queries where the hit count is equal to 1.

Adding a debug print to the script showed that total_hits holds {'value': 1, 'relation': 'eq'}, the same value that appears in the message above.

[root@c685f4n07 python]# ./findJobTimeRange.py -j 114
# Found {'value': 1, 'relation': 'eq'} matches for specified the job.
# This implementation only supports queries where the hit count is equal to 1. Total hits=  {'value': 1, 'relation': 'eq'}

That value then reaches this condition check:

    if total_hits != 1:
        print("# This implementation only supports queries where the hit count is equal to 1.")
        return 3

Diagnosis: "total_hits" carries the value 1 but still took the "total_hits != 1" code path, so it cannot be the integer 1. The output `{'value': 1, 'relation': 'eq'}` hinted that it is a "dict" object, and further debugging confirmed that.

Python doesn't require data type declarations, so the change of total_hits from an integer to a "dict" object went unnoticed. The script raised no error; it simply took the wrong branch of the if condition, because a dict never compares equal to the integer 1.
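The comparison behavior described above can be reproduced in isolation. This is a minimal sketch that uses only the dict shown in the output of this issue; the variable names `legacy_total_hits` and `new_total_hits` are illustrative and do not come from the CAST scripts.

```python
# Integer format the scripts were written for.
legacy_total_hits = 1
# Dict format observed in this issue.
new_total_hits = {"value": 1, "relation": "eq"}

# Comparing a dict with an int raises no exception; it is simply "not equal".
legacy_fails_check = legacy_total_hits != 1      # False: the check passes
new_fails_check = new_total_hits != 1            # True: the check fails silently
fixed_fails_check = new_total_hits["value"] != 1 # False: comparing the count itself

print(legacy_fails_check, new_fails_check, fixed_fails_check)
```

Because no exception is raised, the only visible symptom is the misleading "hit count is equal to 1" message.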

To adapt to this change of data type, replace total_hits with total_hits['value'] wherever the hit count is used.
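A minimal sketch of what the adapted check could look like, assuming total_hits always arrives in the dict format shown above. The function name `check_single_hit` and the return value 0 for success are assumptions made for illustration; only the return code 3 and the two messages come from the script output and snippet quoted in this issue.

```python
def check_single_hit(total_hits):
    """Return 3 unless exactly one hit was found (0 on success is an assumption)."""
    # Assumes the dict format, e.g. {'value': 1, 'relation': 'eq'}.
    hit_count = total_hits["value"]
    print("# Found {0} matches for specified the job.".format(hit_count))
    if hit_count != 1:
        print("# This implementation only supports queries where the hit count is equal to 1.")
        return 3
    return 0


single = check_single_hit({"value": 1, "relation": "eq"})
many = check_single_hit({"value": 15, "relation": "eq"})
```

Note that this form would raise a TypeError if total_hits were still a plain integer, so it only suits deployments where every response uses the dict format.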

To Reproduce Steps to reproduce the behavior:

  1. Go to '/opt/ibm/csm/bigdata/python/'
  2. Run ./findJobTimeRange.py -a 1 (make sure allocation 1 exists)
  3. See message:
    # Found {'value': 1, 'relation': 'eq'} matches for specified the job.
    # This implementation only supports queries where the hit count is equal to 1.

Expected behavior The command should display the start and end time of the allocation. For example:

[root@c685f4n07 python]# ./findJobTimeRange.py -a 1
# Found {'value': 1, 'relation': 'eq'} matches for specified the job.
allocation-id: 1
job-id: 526 - 0
user-name: root 
user-id: 0
begin-time: 2020-09-17.12:06:01:183 
end-time: 2020-09-17.12:06:58:382


Additional context This is probably because the Python scripts were written for Elasticsearch 6.8.1, which has since been upgraded to 7.5.1. It's not clear when the format of "total_hits" changed.

Issue Source: All bigdata Python scripts that do the following might fail silently:

    total_hits = cast.deep_get(tr_res, "hits", "total")

Here's a list of those scripts:

[root@c685f4n07 python]# grep total_hits *.py
findJobKeys.py:    total_hits = cast.deep_get(tr_res, "hits","total")
findJobMetrics.py:    total_hits = cast.deep_get(tr_res, "hits","total")
findJobsInRange.py:    total_hits    = cast.deep_get(tr_res, "hits","total")
findUserJobs.py:    total_hits = cast.deep_get(resp, "hits","total")
findWeightedErrors.py:    total_hits = cast.deep_get(tr_res, "hits","total")
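Since several scripts share the same lookup, one option is a small reader that accepts both the integer and the dict format. The sketch below is not the fix merged for this issue: `deep_get` here is a local stand-in whose behavior is only assumed from the call `cast.deep_get(tr_res, "hits", "total")` shown above, and `read_total_hits` is a hypothetical helper name.

```python
def deep_get(obj, *keys):
    """Stand-in for cast.deep_get (assumed behavior): walk nested dicts, None if missing."""
    for key in keys:
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


def read_total_hits(response):
    """Return the hit count as an int for either response format (0 if absent)."""
    total = deep_get(response, "hits", "total")
    if isinstance(total, dict):
        # Dict format seen in this issue: {'value': N, 'relation': 'eq'}
        total = total.get("value")
    return int(total) if total is not None else 0


# Hand-written sample responses, not captured from a real cluster.
old_style = {"hits": {"total": 1, "hits": []}}
new_style = {"hits": {"total": {"value": 1, "relation": "eq"}, "hits": []}}

print(read_total_hits(old_style), read_total_hits(new_style))
```

Keeping the conversion in one place would let each script keep its existing `total_hits != 1` style checks unchanged.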

williammorrison2 commented 3 years ago

@thanh-lam Here is output after the fixes have been implemented. The appropriate results are being returned without the python error.

[root@c650f99p06 python]# /opt/ibm/csm/bigdata/python/findJobTimeRange.py -a 1
# Found 1 matches for specified the job.

allocation-id: 1
job-id: 1 - 0
user-name: root
user-id: 0
begin-time: 2021-02-23.12:04:34:828
end-time: 2021-02-23.12:04:39:513

[root@c650f99p06 python]# ./findJobTimeRange.py -j 1
# Found 15 matches for specified the job.
# This implementation only supports queries where the hit count is equal to 1.

besawn commented 3 years ago

Fixed by #994