SciGaP / ultrascan-airavata-bridge

Glue layer combining Ultrascan and Airavata
Apache License 2.0

Canceling jobs issue #12

Closed by ehb54 2 years ago

ehb54 commented 2 years ago

I'm working on the UltraScan LIMS cancel logic & there seems to be an issue with

$airavataWrapper->terminate_airavata_experiment($expId);

It runs and

$airavataWrapper->get_experiment_status($expId);

returns first CANCELING and then CANCELED, but the slurm job appears to keep running on the resource.

I have only tested anvil so far, so I'm not sure whether this is cluster-specific or happens on all clusters. The current test $expId is US3-AIRA_dfc21306-338b-4468-80eb-b67852179d1a. Its status is CANCELED, but on anvil:

$ squeue -u x-us3
JOBID        USER           ACCOUNT       NAME             NODES   CPUS  TIME_LIMIT ST TIME
543452       x-us3          mcb070039n    A822159873           1     32    10:00:00  R 8:52
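
Editorial sketch: the mismatch reported here (Airavata says CANCELED while squeue still lists the job) can be checked mechanically. The snippet below is a minimal illustration, assuming squeue output with the job ID in the first column, as in the listing above. The function name `job_in_queue` is hypothetical and is not part of UltraScan, Airavata, or Slurm.

```shell
#!/bin/sh
# Hypothetical helper (not part of any project in this thread):
# exit 0 if the given job ID appears in the first column of
# squeue-style output read from stdin, exit 1 otherwise.
job_in_queue() {
    awk -v id="$1" '$1 == id { found = 1 } END { exit !found }'
}

# Sample input copied from the listing in this report.
# In practice the input would come from: squeue -u x-us3
sample='JOBID        USER           ACCOUNT       NAME             NODES   CPUS  TIME_LIMIT ST TIME
543452       x-us3          mcb070039n    A822159873           1     32    10:00:00  R 8:52'

if printf '%s\n' "$sample" | job_in_queue 543452; then
    echo "job 543452 is still in the queue"
else
    echo "job 543452 is no longer in the queue"
fi
```

A check like this only looks at the scheduler side; it says nothing about why the cancel request did not reach the cluster.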

ehb54 commented 2 years ago

Update: The job completed normally and produced full output after about 11 minutes; it was apparently never canceled on anvil. The terminate_airavata_experiment call to Airavata was made within a minute after the experiment went to ACTIVE. The files are in /anvil/projects/x-mcb070039n/us3/PROCESS_46cf1450-1409-4949-93b1-a57e1a4a4b3b

ehb54 commented 2 years ago

Started and canceled another job on anvil:

US3-AIRA_d94c35cb-5348-4de0-bd02-52d694b16366

Hopefully this one takes longer to complete.

ehb54 commented 2 years ago

Update: This appears to be an anvil-specific issue. Terminating jobs on expanse does cancel them from the slurm queue.

ehb54 commented 2 years ago

Update: The issue is also present on lonestar6; canceled jobs remain running.

smarru commented 2 years ago

@DImuthuUpe @eroma2014 can you please see if Anvil and Lonestar6 emails are configured correctly?

ehb54 commented 2 years ago

Any update on this?

marpierc commented 2 years ago

Checking it now.

eroma2014 commented 2 years ago

Emre Brookes (@ehb54), could you please do a test cancelation on Anvil?

I added the anvil email to job status monitoring. I ran a test job and the cancelation was successful. We received the cancel email notification.

Thanks, Eroma

On Tue, Oct 11, 2022 at 8:00 AM, ehb54 wrote: "Any update on this?"

ehb54 commented 2 years ago

@eroma2014 Yes it properly deleted from the slurm queue on Anvil :) Thanks. Now, how about lonestar6?

Thanks, Emre @ehb54

eroma2014 commented 2 years ago

Emre (@ehb54), please try LS6 now.

Thanks, Eroma

On Tuesday, October 11, 2022 at 11:24 AM, ehb54 wrote: "@eroma2014 Yes it properly deleted from the slurm queue on Anvil :) Thanks. Now, how about lonestar6?"

ehb54 commented 2 years ago

@eroma2014 Yes, ls6 cancels from the machine queue now also :) Thanks for your help!