Open StevenCTimm opened 3 years ago
I have been able to reproduce the failure. Investigating options.
The problem is understood. I am exploring options--one of which is described here: https://stackoverflow.com/questions/66890900/replacing-python-call-ast-node-with-try-node
See PR #322.
The PR looks promising. Do you have an RPM build artifact of it? I could dig it out myself, but I'd appreciate it if you could save me some time. Steve
This should be the RPM: https://github.com/HEPCloud/decisionengine/suites/2393932846/artifacts/50894008
The patch does not seem to be working as designed. Installed on fermicloud155, verified the patch is there.
2021-03-29T10:23:36-0500 - root - BooleanExpression - 4945 - MainThread - DEBUG - calling NamedFact::evaluate()
2021-03-29T10:23:36-0500 - root - datablock - 4945 - MainThread - ERROR - Did not get key in datablock getitem
2021-03-29T10:23:36-0500 - root - datablock - 4945 - MainThread - ERROR - No Key in datablock getitem
All the logic engine BooleanExpressions that are wrapped by fail_on_error now fail in this way.
For reference:
"awswithininstburnrate": "fail_on_error( financial_params.iloc[0].target_aws_vm_burn_rate>AWS_Burn_Rate.iloc[0].BurnRate)",
"awswithinbillburnrate": "fail_on_error( financial_params.iloc[0].target_aws_bill_burn_rate>AWS_Billing_Rate[AWS_Billing_Rate['accountName']=='Fermilab'].iloc[0].costRatePerHourInLastSixHours )",
"awsabovebalance": "fail_on_error( financial_params.iloc[0].target_aws_balance<AWS_Billing_Info[AWS_Billing_Info['AccountName']=='Fermilab'].iloc[0].Balance)",
"gcewithininstburnrate": "fail_on_error( financial_params.iloc[0].target_gce_vm_burn_rate>GCE_Burn_Rate.iloc[0].BurnRate)",
"gceabovebalance": "fail_on_error( financial_params.iloc[0].target_gce_balance<GCE_Billing_Info.iloc[0].Balance)",
"fifenerscbelowlimit": "fail_on_error( Nersc_Allocation_Info[Nersc_Allocation_Info['name']=='fife'].iloc[0].usedAlloc<Nersc_Allocation_Info[Nersc_Allocation_Info['name']=='fife'].iloc[0].currentAlloc)",
"uscmsnerscbelowlimit": "fail_on_error( Nersc_Allocation_Info[Nersc_Allocation_Info['name']=='uscms'].iloc[0].usedAlloc<Nersc_Allocation_Info[Nersc_Allocation_Info['name']=='uscms'].iloc[0].currentAlloc)"
I have verified that all of the quantities referenced in the above configuration file do exist, so none of these statements should be in an error condition at the moment.
Steve, I'm not positive that the logic engine is to blame here, or at least not the fail_on_error component of it. I suspect the exception is actually being raised in (e.g.) a downstream publisher. I won't know for sure, however, until I do some more debugging. Any issues with me playing around with the decisionengine on fermicloud155?
Go ahead, play with it. Note that the current configuration takes a while to start up, 10 minutes or so.
All the goods are in /var/log/decisionengine/resource_request.log
Yep, it's my fault. Modified the datablock code to print the missing key on fermicloud155:
2021-04-01T14:29:44-0500 - root - datablock - 15503 - MainThread - ERROR - Did not get key 'fail_on_error' in datablock __getitem__
Will look for a solution.
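For anyone following along, here is a minimal sketch of how a lookup like this can go wrong. It assumes (this is not confirmed anywhere in this thread) that the fact names to fetch from the datablock are collected by walking the expression's AST. The `required_names` helper and its `ignore` parameter are hypothetical names made up for illustration, not decisionengine API.

```python
import ast


def required_names(expression, ignore=("fail_on_error",)):
    """Return the bare names used in `expression`, minus those in `ignore`.

    Hypothetical helper: illustrates the failure mode only, not the
    actual decisionengine implementation.
    """
    tree = ast.parse(expression, mode="eval")
    names = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return names - set(ignore)


expr = (
    "fail_on_error(financial_params.iloc[0].target_gce_balance"
    " < GCE_Billing_Info.iloc[0].Balance)"
)

# A naive walk treats the wrapper itself as a data product to look up.
naive = required_names(expr, ignore=())

# Skipping the wrapper leaves only the real datablock keys.
filtered = required_names(expr)

print(sorted(naive))
print(sorted(filtered))
```

In the naive case the wrapper name ends up in the set of keys to request, which would be consistent with the "Did not get key 'fail_on_error'" message above.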
Follow-up: I believe that Kyle did indeed fix this issue, but I have not yet verified it. I will attempt to add the fail_on_error logic back to a 1.7 configuration to be sure it works.
@StevenCTimm, is this issue resolved?
I think so but I haven't had a chance to test it again recently.
Steve
I had previously tested that the fail_on_error clause successfully made facts "false" that would otherwise be "error". But what I am now observing is that some facts that would otherwise be True end up False when wrapped by fail_on_error.
Initially in resource_request.jsonnet I had the following:
For the record:
financial_params.iloc[0].target_gce_balance is -20000
GCE_Billing_Info.iloc[0].Balance is -507
GCE_Burn_Rate.iloc[0].BurnRate is 0.01
financial_params.iloc[0].target_gce_vm_burn_rate is 9
Both of these facts wrapped by the fail_on_error evaluated to False.
I removed the fail_on_error wrapper and they evaluated to True as they should.
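To make the expected behavior concrete, here is a rough sketch of the semantics I expect from fail_on_error: a fact that evaluates cleanly keeps its value, and only a fact that raises becomes False. The `evaluate_fact` function and the flat fact names are stand-ins invented for this example (the real expressions index into DataFrames); the numbers are the ones quoted above.

```python
def evaluate_fact(expression, facts):
    """Evaluate `expression` against `facts`; any error counts as False.

    Hypothetical stand-in for the intended fail_on_error behavior,
    not the logic engine's actual code.
    """
    try:
        return bool(eval(expression, {"__builtins__": {}}, facts))
    except Exception:
        return False


# Flattened stand-ins for the values quoted above.
facts = {
    "target_gce_balance": -20000,
    "gce_balance": -507,
    "target_gce_vm_burn_rate": 9,
    "gce_burn_rate": 0.01,
}

above_balance = evaluate_fact("target_gce_balance < gce_balance", facts)
within_burn_rate = evaluate_fact("target_gce_vm_burn_rate > gce_burn_rate", facts)
missing_input = evaluate_fact("no_such_fact > 0", facts)

print(above_balance, within_burn_rate, missing_input)
```

With these inputs both GCE facts should come out True, and only the expression that references a missing name should come out False.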