dmwm / cms-htcondor-es

ElasticSearch integration for CMS's HTCondor pool

Add GLIDEIN_MaxMemMBs warning exception and refactor conversion py #200

Closed mrceyhun closed 1 year ago

mrceyhun commented 1 year ago

Because this field produces a large number of warning logs, we need to add a manual exception for it. Its value should be an integer or convertible to an integer, but a significant fraction of the time it is not. I have raised this in the SI MM channel. This fix is a workaround until the problem is solved upstream.

The fix is:

elif (key == "MATCH_EXP_JOB_GLIDEIN_MaxMemMBs") and (value == "GLIDEIN_MaxMemMBs"):
    # FIXME after SI/WMA/CRAB teams solve this upstream. This key should be convertible to int
    continue

Additionally, convert_to_json.py has been refactored to follow PEP standards.
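For context, here is a minimal sketch of how the exception shown above might sit inside a conversion loop. It is not the actual code in convert_to_json.py: the function name `convert_int_fields`, the `INT_KEYS` set, and the logger name are hypothetical and only illustrate the idea of skipping one known-bad key/value pair while still warning about every other failed conversion.

```python
import logging

logger = logging.getLogger("convert_sketch")  # hypothetical logger name

# Hypothetical set of keys whose values are expected to be integers.
INT_KEYS = {"MATCH_EXP_JOB_GLIDEIN_MaxMemMBs", "RequestCpus"}


def convert_int_fields(ad):
    """Return a copy of `ad` with integer-typed keys converted to int.

    The known-bad pair is dropped silently; any other value that cannot
    be converted is dropped with a warning.
    """
    result = {}
    for key, value in ad.items():
        if key not in INT_KEYS:
            # Not an integer field: pass through unchanged.
            result[key] = value
        elif (key == "MATCH_EXP_JOB_GLIDEIN_MaxMemMBs") and (value == "GLIDEIN_MaxMemMBs"):
            # Known upstream issue: skip without logging a warning.
            continue
        else:
            try:
                result[key] = int(value)
            except (TypeError, ValueError):
                logger.warning("Failed to convert key %s with value %r to int", key, value)
    return result
```

With this sketch, an ad containing the unresolved `"GLIDEIN_MaxMemMBs"` string simply loses that key, while a value such as `"2500"` is still converted normally and any other unconvertible value still triggers a warning.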

mrceyhun commented 1 year ago

> Where is the original problem? in dmwm code? Is there a GH issue that we can track to see when this gets fixed? If not, can you pls open one?

Actually, I don't know where the original problem is. Our problem is that it floods our logs with schema warnings. We get job results from the schedds with MATCH_EXP_JOB_GLIDEIN_MaxMemMBs = "GLIDEIN_MaxMemMBs", which should not happen, because MATCH_EXP_JOB_GLIDEIN_MaxMemMBs should be an integer or a string/expression that can be converted to an integer.

I asked this question in MM, and the SI team will check it. Until they find the cause (it may even turn out to be normal), I have put in this fix, which suppresses the warning log for the matching key-value pair.

belforte commented 1 year ago

It is good to solve this on the HTCondor side, but maybe monitoring should be able to accept odd values, since they do not affect job scheduling and can't be a priority for SI people, especially if it only happens rarely? In MM you said that this has always been there, IIUC.

mrceyhun commented 1 year ago

> It is good to solve this from htcondor side, but maybe monitoring should be able to accept odd values since they do not affect job scheduling and can't be a priority for SI people, especially if it only happens rarely ? In MM you said that this has been there since ever, IIUC.

Yes @belforte, this does not affect monitoring values; it only produces schema warnings for us, which is not a problem for ops. I raised this in MM to check that I am on the right path and to give ops a heads-up. After Nikos confirmed that this should not happen (the value should not be a string), I thought we could also ignore these warnings.