flux-framework / flux-accounting

bank/accounting interface for the Flux resource manager
https://flux-framework.readthedocs.io/projects/flux-accounting/en/latest/index.html
GNU Lesser General Public License v3.0

Exception: "failed to update jobspec with bank name" on Tuolumne #498

Open jameshcorbett opened 1 month ago

jameshcorbett commented 1 month ago

[devcich1@tuolumne1001:MY_TEST_DIR]$ flux job info fBKZ2XwhCX9 eventlog | grep -i fail
{"timestamp":1728333451.1065736,"name":"exception","context":{"type":"mf_priority","severity":0,"note":"failed to update jobspec with bank name","userid":767}}

jameshcorbett commented 1 month ago

They seem to be happening regularly:

flux job info fBKZNkS3D2P eventlog
{"timestamp":1728333175.2028327,"name":"submit","context":{"userid":54987,"urgency":16,"flags":0,"version":1}}
{"timestamp":1728333175.2335021,"name":"validate"}
{"timestamp":1728333175.2560935,"name":"dependency-add","context":{"description":"dws-create"}}
{"timestamp":1728333175.8794999,"name":"memo","context":{"rabbit_workflow":"fluxjob-76653206309962752"}}
{"timestamp":1728333179.8968751,"name":"dependency-remove","context":{"description":"dws-create"}}
{"timestamp":1728333179.8969295,"name":"depend"}
{"timestamp":1728333179.897059,"name":"priority","context":{"priority":16}}
{"timestamp":1728333317.7042291,"name":"alloc","context":{"annotations":{"user":{"rabbit_workflow":"fluxjob-76653206309962752"}}}}
{"timestamp":1728333317.7044308,"name":"prolog-start","context":{"description":"job-manager.prolog"}}
{"timestamp":1728333317.704457,"name":"prolog-start","context":{"description":"cray-pals-port-distributor"}}
{"timestamp":1728333317.7044675,"name":"prolog-start","context":{"description":"dws-setup"}}
{"timestamp":1728333317.7085178,"name":"prolog-finish","context":{"description":"cray-pals-port-distributor","status":0}}
{"timestamp":1728333317.7138379,"name":"memo","context":{"rabbits":"tuolumne267"}}
{"timestamp":1728333399.8657088,"name":"dws_environment","context":{"variables":{"DW_JOB_ioioio":"/mnt/nnf/f52b9826-5db6-40c4-9f18-272437d6f807-0","DW_WORKFLOW_NAME":"fluxjob-76653206309962752","DW_WORKFLOW_NAMESPACE":"default"},"rabbits":{"tuolumne267":"tuolumne[2057-2072]"},"copy_offload":false}}
{"timestamp":1728333399.8658042,"name":"prolog-finish","context":{"description":"dws-setup","status":0}}
{"timestamp":1728333400.2696013,"name":"prolog-finish","context":{"description":"job-manager.prolog","status":0}}
{"timestamp":1728333400.310169,"name":"start"}
{"timestamp":1728333400.7326884,"name":"finish","context":{"status":0}}
{"timestamp":1728333400.7330658,"name":"epilog-start","context":{"description":"job-manager.epilog"}}
{"timestamp":1728333400.7331221,"name":"epilog-start","context":{"description":"dws-epilog"}}
{"timestamp":1728333400.7672787,"name":"release","context":{"ranks":"all","final":true}}
{"timestamp":1728333400.925935,"name":"epilog-finish","context":{"description":"job-manager.epilog","status":0}}
{"timestamp":1728333451.1015525,"name":"exception","context":{"type":"mf_priority","severity":0,"note":"failed to update jobspec with bank name","userid":767}}
{"timestamp":1728333451.1015964,"name":"jobspec-update","context":{"attributes.system.bank":"DNE"}}
{"timestamp":1728333451.1016669,"name":"exception","context":{"type":"mf_priority","severity":0,"note":"job.update: bank info is missing","userid":767}}
{"timestamp":1728333454.9056153,"name":"epilog-finish","context":{"description":"dws-epilog","status":0}}
{"timestamp":1728333454.9060705,"name":"free"}
{"timestamp":1728333454.9061046,"name":"clean"}

jameshcorbett commented 1 month ago

Sounds like this has been resolved offline, @cmoussa1? If so, feel free to close.

grondo commented 1 month ago

We should probably open a separate issue (?) on the mf_priority plugin trying to update the jobspec for jobs that are past the SCHED state.
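
To make the symptom easier to spot across many jobs, here is a minimal sketch that scans an eventlog (as printed by `flux job info <jobid> eventlog`) for mf_priority exceptions and bank jobspec updates posted after the job's `alloc` event. It assumes that `alloc` marks the point where the job leaves SCHED; the helper name `late_bank_events` and the trimmed `SAMPLE` excerpt (lines copied from the eventlog above) are illustrative only and are not part of flux-accounting.

```python
import json


def late_bank_events(eventlog_text):
    """Return mf_priority exceptions and bank jobspec updates logged after 'alloc'.

    Assumption: the 'alloc' event is treated as the point where the job
    leaves the SCHED state. This is a hypothetical helper for triage only.
    """
    allocated = False
    late = []
    for line in eventlog_text.splitlines():
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        name = event.get("name")
        context = event.get("context", {})
        if name == "alloc":
            allocated = True
            continue
        if not allocated:
            continue
        is_plugin_exception = (
            name == "exception" and context.get("type") == "mf_priority"
        )
        is_bank_update = (
            name == "jobspec-update" and "attributes.system.bank" in context
        )
        if is_plugin_exception or is_bank_update:
            late.append(event)
    return late


# Trimmed excerpt of the eventlog for job fBKZNkS3D2P shown earlier in this issue.
SAMPLE = """
{"timestamp":1728333179.897059,"name":"priority","context":{"priority":16}}
{"timestamp":1728333317.7042291,"name":"alloc","context":{"annotations":{"user":{"rabbit_workflow":"fluxjob-76653206309962752"}}}}
{"timestamp":1728333400.310169,"name":"start"}
{"timestamp":1728333451.1015525,"name":"exception","context":{"type":"mf_priority","severity":0,"note":"failed to update jobspec with bank name","userid":767}}
{"timestamp":1728333451.1015964,"name":"jobspec-update","context":{"attributes.system.bank":"DNE"}}
{"timestamp":1728333451.1016669,"name":"exception","context":{"type":"mf_priority","severity":0,"note":"job.update: bank info is missing","userid":767}}
"""

if __name__ == "__main__":
    for event in late_bank_events(SAMPLE):
        print(event["timestamp"], event["name"], event.get("context"))
```

On the excerpt, this flags the two mf_priority exceptions and the `attributes.system.bank` update, all of which were posted well after the job had been allocated and had finished running.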

cmoussa1 commented 1 month ago

Yup, I was planning on seeing if I could reproduce this behavior in a controlled environment today, so I'll leave this one open (or I can just open a separate issue).