HEPCloud / decisionengine

HEPCloud Decision Engine framework
Apache License 2.0

fail_on_error clause makes logic engine facts False that would otherwise be true. #318

Open StevenCTimm opened 3 years ago

StevenCTimm commented 3 years ago

I had previously tested that the fail_on_error clause successfully made facts "false" that would otherwise be "error". But what I am now observing is that some facts that would otherwise be True end up False when wrapped by fail_on_error.

Initially in resource_request.jsonnet I had the following:

      "gcewithininstburnrate": "fail_on_error(financial_params.iloc[0].target_gce_vm_burn_rate>GCE_Burn_Rate.iloc[0].BurnRate)",
      "gceabovebalance": "fail_on_error( financial_params.iloc[0].target_gce_balance<GCE_Billing_Info.iloc[0].Balance)",

For the record:

financial_params.iloc[0].target_gce_balance is -20000
GCE_Billing_Info.iloc[0].Balance is -507
GCE_Burn_Rate.iloc[0].BurnRate is 0.01
financial_params.iloc[0].target_gce_vm_burn_rate is 9

Both of these facts wrapped by the fail_on_error evaluated to False.

I removed the fail_on_error wrapper and they evaluated to True as they should.
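As a sanity check on the arithmetic, the two comparisons can be replayed with the values quoted above. This is a minimal sketch that uses plain Python numbers in place of the pandas lookups; the variable names are shortened stand-ins, not the names used in the configuration.

```python
# Values quoted above, held as plain numbers instead of DataFrame cells.
target_gce_vm_burn_rate = 9
gce_burn_rate = 0.01
target_gce_balance = -20000
gce_balance = -507

# Same comparisons as the two facts, without the fail_on_error wrapper.
gcewithininstburnrate = target_gce_vm_burn_rate > gce_burn_rate
gceabovebalance = target_gce_balance < gce_balance

print(gcewithininstburnrate, gceabovebalance)
```

Both comparisons hold for these numbers, which matches the unwrapped behaviour reported here.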

knoepfel commented 3 years ago

I have been able to reproduce the failure. Investigating options.

knoepfel commented 3 years ago

The problem is understood. I am exploring options--one of which is described here: https://stackoverflow.com/questions/66890900/replacing-python-call-ast-node-with-try-node
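To illustrate the general idea of rewriting the expression's AST so that errors raised while evaluating the wrapped expression can be caught, here is a minimal sketch. It is not the code in PR #322. Because `ast.Try` is a statement and cannot sit where an expression is expected, this sketch swaps in a different technique from the one in the linked question: it defers the argument by wrapping it in a lambda. `DeferFailOnError` and `evaluate_fact` are hypothetical names, and the `fail_on_error` defined here is a stand-in rather than the decisionengine implementation.

```python
import ast


class DeferFailOnError(ast.NodeTransformer):
    """Rewrite fail_on_error(expr) into fail_on_error(lambda: expr)."""

    def visit_Call(self, node):
        self.generic_visit(node)  # rewrite any nested calls first
        is_target = isinstance(node.func, ast.Name) and node.func.id == "fail_on_error"
        if is_target and len(node.args) == 1 and not node.keywords:
            no_args = ast.arguments(
                posonlyargs=[], args=[], vararg=None, kwonlyargs=[],
                kw_defaults=[], kwarg=None, defaults=[],
            )
            node.args = [ast.Lambda(args=no_args, body=node.args[0])]
        return node


def fail_on_error(thunk):
    """Stand-in helper: evaluate the deferred expression, any exception counts as False."""
    try:
        return bool(thunk())
    except Exception:
        return False


def evaluate_fact(expression, data):
    """Compile the rewritten expression and evaluate it with data as globals."""
    tree = ast.parse(expression.strip(), mode="eval")
    tree = ast.fix_missing_locations(DeferFailOnError().visit(tree))
    code = compile(tree, "<fact>", "eval")
    return eval(code, {"fail_on_error": fail_on_error, **data})


data = {"target": 9, "rate": 0.01}
print(evaluate_fact("fail_on_error(target > rate)", data))     # True
print(evaluate_fact("fail_on_error(target > missing)", data))  # False: NameError is caught
```

The data is passed as globals on purpose: the generated lambda resolves free names through the globals of the compiled code, so names supplied only as locals would not be visible inside it.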

knoepfel commented 3 years ago

See PR #322.

StevenCTimm commented 3 years ago

The PR looks promising. Do you have an RPM build artifact of it? I could dig it out myself, but I would appreciate it if you could save me some time.

Steve

knoepfel commented 3 years ago

This should be the RPM: https://github.com/HEPCloud/decisionengine/suites/2393932846/artifacts/50894008

StevenCTimm commented 3 years ago

The patch does not seem to be working as designed. Installed on fermicloud155, verified the patch is there.

2021-03-29T10:23:36-0500 - root - BooleanExpression - 4945 - MainThread - DEBUG - calling NamedFact::evaluate()
2021-03-29T10:23:36-0500 - root - datablock - 4945 - MainThread - ERROR - Did not get key in datablock getitem
2021-03-29T10:23:36-0500 - root - datablock - 4945 - MainThread - ERROR - No Key in datablock getitem

All the logic engine BooleanExpressions that are wrapped by fail_on_error now fail in this way.

For reference:

      "awswithininstburnrate": "fail_on_error( financial_params.iloc[0].target_aws_vm_burn_rate>AWS_Burn_Rate.iloc[0].BurnRate)",
      "awswithinbillburnrate": "fail_on_error( financial_params.iloc[0].target_aws_bill_burn_rate>AWS_Billing_Rate[AWS_Billing_Rate['accountName']=='Fermilab'].iloc[0].costRatePerHourInLastSixHours )",
      "awsabovebalance": "fail_on_error( financial_params.iloc[0].target_aws_balance<AWS_Billing_Info[AWS_Billing_Info['AccountName']=='Fermilab'].iloc[0].Balance)",
      "gcewithininstburnrate": "fail_on_error( financial_params.iloc[0].target_gce_vm_burn_rate>GCE_Burn_Rate.iloc[0].BurnRate)",
      "gceabovebalance": "fail_on_error( financial_params.iloc[0].target_gce_balance<GCE_Billing_Info.iloc[0].Balance)",
      "fifenerscbelowlimit": "fail_on_error( Nersc_Allocation_Info[Nersc_Allocation_Info['name']=='fife'].iloc[0].usedAlloc<Nersc_Allocation_Info[Nersc_Allocation_Info['name']=='fife'].iloc[0].currentAlloc)",
      "uscmsnerscbelowlimit": "fail_on_error( Nersc_Allocation_Info[Nersc_Allocation_Info['name']=='uscms'].iloc[0].usedAlloc<Nersc_Allocation_Info[Nersc_Allocation_Info['name']=='uscms'].iloc[0].currentAlloc)"

I have verified that all of the quantities referenced in the configuration above do exist, so none of these statements should be in an error condition at the moment.

knoepfel commented 3 years ago

Steve, I'm not positive that the logic-engine is to blame here, or at least not the fail_on_error component of it. I suspect the exception is actually being raised elsewhere, for example in a downstream publisher. I won't know for sure, however, until I do some more debugging. Any issues with me playing around with the decisionengine on fermicloud155?

StevenCTimm commented 3 years ago

Go ahead, play with it. Note that the current configuration takes a while to start up, 10 minutes or so.

StevenCTimm commented 3 years ago

All the goods are in /var/log/decisionengine/resource_request.log

knoepfel commented 3 years ago

Yep, it's my fault. Modified the datablock code to print the missing key on fermicloud155:

2021-04-01T14:29:44-0500 - root - datablock - 15503 - MainThread - ERROR - Did not get key 'fail_on_error' in datablock __getitem__

Will look for a solution.
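The log line suggests that the function name itself was being looked up in the datablock as if it were a data product. A minimal sketch of how that can happen, and of one way to avoid it, is below. It assumes the engine discovers a fact's inputs by collecting `ast.Name` nodes from the expression, which this thread does not confirm; `required_names` is a hypothetical helper, not part of decisionengine.

```python
import ast


def required_names(expression, helpers=frozenset({"fail_on_error"})):
    """Return the names an expression reads, minus known helper functions."""
    tree = ast.parse(expression.strip(), mode="eval")
    names = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return names - helpers


expr = (
    "fail_on_error(financial_params.iloc[0].target_gce_balance"
    " < GCE_Billing_Info.iloc[0].Balance)"
)

# With no helpers excluded, 'fail_on_error' shows up as a required name.
print(sorted(required_names(expr, helpers=frozenset())))

# Excluding the helper leaves only the two data products.
print(sorted(required_names(expr)))
```

Under that assumption, any name returned here would be fetched from the datablock, so leaving the helper in the set would produce a missing-key error like the one logged above.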

StevenCTimm commented 3 years ago

Follow-up: I believe that Kyle did indeed fix this issue, but I have not yet verified it. I will attempt to add the fail_on_error logic back to a 1.7 configuration to be sure it works.

knoepfel commented 2 years ago

@StevenCTimm, is this issue resolved?

StevenCTimm commented 2 years ago

I think so but I haven't had a chance to test it again recently.

Steve

