ehb54 closed this issue 2 years ago.
Update:
The job completed normally and produced full output after about 11 minutes; it was apparently never canceled on anvil.
The `terminate_airavata_experiment` call to Airavata was made within a minute of the job going to ACTIVE.
The files are in /anvil/projects/x-mcb070039n/us3/PROCESS_46cf1450-1409-4949-93b1-a57e1a4a4b3b
Started and canceled another job on anvil, US3-AIRA_d94c35cb-5348-4de0-bd02-52d694b16366. Hopefully this one takes longer to complete.
Update:
This appears to be an anvil-specific issue. Terminating jobs on expanse does remove them from the slurm queue.
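A quick way to confirm whether a terminated job has actually left the slurm queue on a given cluster is to ask `squeue` directly. The sketch below is a hedged Python example that assumes you already know the numeric Slurm job ID behind the Airavata experiment and can run `squeue` on the cluster's login node. `slurm_job_state`, `still_in_queue`, and the set of "alive" states are illustrative choices, not part of Airavata or UltraScan.

```python
import subprocess
from typing import Callable, Optional

# States in which Slurm still treats the job as alive. This set is an
# assumption covering the common cases, not an exhaustive list.
ALIVE_STATES = {"PENDING", "CONFIGURING", "RUNNING", "COMPLETING", "SUSPENDED"}


def slurm_job_state(job_id: str,
                    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
                    ) -> Optional[str]:
    """Return the state squeue reports for job_id, or None if it is not listed."""
    # -h: suppress the header, -j: only this job, -o %T: print just the state.
    result = run(["squeue", "-h", "-j", job_id, "-o", "%T"],
                 capture_output=True, text=True)
    if result.returncode != 0:
        # squeue exits non-zero (e.g. "Invalid job id specified") once the job
        # has been purged from the controller. Other non-zero exits, such as an
        # unreachable controller, land here too; a stricter check would read stderr.
        return None
    state = result.stdout.strip()
    return state or None


def still_in_queue(job_id: str,
                   run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
                   ) -> bool:
    """True if Slurm still reports job_id in an alive (not finished) state."""
    return slurm_job_state(job_id, run) in ALIVE_STATES
```

Passing `run` as a parameter lets the check be exercised without a cluster. On a real login node, `still_in_queue("1234567")` (a made-up job ID) returning `True` after Airavata reports CANCELED would reproduce the anvil behavior reported in this thread.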
Update:
The issue is also present on lonestar6: canceled jobs remain running.
@DImuthuUpe @eroma2014 can you please see if Anvil and Lonestar6 emails are configured correctly?
Any update on this?
Checking it now.
@Emre Brookes Could you please do a test cancelation on Anvil?
I added the anvil email to job status monitoring. I ran a test job and the cancelation was successful. We received the cancel email notification.
Thanks, Eroma
(In reply to ehb54, Tue, Oct 11, 2022 at 8:00 AM: "Any update on this?")
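The fix on anvil was to add the cluster's email to job status monitoring, after which the cancel notification arrived. To illustrate what email-based monitoring has to extract, here is a hedged Python sketch (not Airavata's implementation) that maps a notification subject to a job ID and state. It assumes subjects follow Slurm's default mail format, e.g. `Slurm Job_id=123 Name=test Ended, Run time 00:00:10, CANCELLED, ExitCode 0`; the exact wording can differ between Slurm versions and sites, and `parse_slurm_subject` is a hypothetical helper.

```python
import re
from typing import Optional, Tuple

# Assumed subject shapes, modeled on Slurm's default mail notifications:
#   "Slurm Job_id=123 Name=test Began, Queued time 00:00:01"
#   "Slurm Job_id=123 Name=test Ended, Run time 00:11:02, CANCELLED, ExitCode 0"
_JOB_ID = re.compile(r"\bJob_id=(\d+)")
# Final states looked for in the subject; this list is an assumption, not exhaustive.
_FINAL_STATES = ("COMPLETED", "CANCELLED", "FAILED", "TIMEOUT")


def parse_slurm_subject(subject: str) -> Optional[Tuple[int, str]]:
    """Map a Slurm notification subject to (job_id, state), or None if no job ID."""
    match = _JOB_ID.search(subject)
    if match is None:
        return None
    job_id = int(match.group(1))
    # A final state keyword takes precedence over the event word.
    for state in _FINAL_STATES:
        if re.search(rf"\b{state}\b", subject):
            return job_id, state
    if " Began," in subject:
        return job_id, "RUNNING"
    return job_id, "UNKNOWN"
```

Checking for a final state before the "Began" event keeps an end-of-job notification from being misread as a start.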
@eroma2014 Yes, it was properly deleted from the slurm queue on Anvil :) Thanks. Now, how about lonestar6?
Thanks, Emre @ehb54
@Emre Please try LS6 now.
Thanks, Eroma
(In reply to ehb54, Tuesday, October 11, 2022 at 11:24 AM; Subject: Re: [SciGaP/ultrascan-airavata-bridge] Canceling jobs issue (Issue #12))
@eroma2014 Yes, ls6 now also cancels jobs from the machine queue :) Thanks for your help!
I'm working on the UltraScan LIMS cancel logic and there seems to be an issue with
It runs and returns first CANCELING and then CANCELED, but the slurm job appears to remain running on the resource. I have only tested anvil so far, so I'm not sure whether this is cluster-specific or happening on all clusters. Current test `$expId`: US3-AIRA_dfc21306-338b-4468-80eb-b67852179d1a
Status is CANCELED, but on anvil