HubSpot / Singularity

Scheduler (HTTP API and webapp) for running Mesos tasks: long-running processes, one-off tasks, and scheduled jobs. #hubspot-open-source
http://getsingularity.com/
Apache License 2.0

HC timeout and job launch #2298

Closed: yuriy-filatov closed this issue 2 years ago

yuriy-filatov commented 2 years ago

Hey. We're currently facing a 'startup' issue with a bunch of our heavy jobs: Failed to launch container: discarded; Abnormal executor termination: unknown container. The Docker image is around 6 GB, which is just too big for a quick pull, so I decided to bump the initial startup timeout. According to the docs, the proper key is deployHealthTimeoutSeconds, so I added

singularityDeploy:
  deployHealthTimeoutSeconds: 600

but the new deploy still fails after 120 seconds. After that I added a similar key to the s9y (Singularity) server config

deployHealthyBySeconds: 600

and restarted it, but no luck. Can you please clarify how to bump the initial timeout? We're running 1.2.0 in Prod and 1.5.0 in QA.
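For comparison with the YAML above, here is a minimal sketch of the same per-deploy setting expressed as a JSON deploy request body. It assumes the {"deploy": {...}} envelope accepted by Singularity's deploy API; the request and deploy IDs are hypothetical, and a real deploy also needs command, container, and resource fields that are left out here.

```python
import json

# Hypothetical identifiers for illustration only; use your own request/deploy IDs.
REQUEST_ID = "heavy-job"
DEPLOY_ID = "deploy-600s"


def build_deploy_request(request_id: str, deploy_id: str, health_timeout_s: int) -> dict:
    """Build a deploy request body carrying a longer deployHealthTimeoutSeconds.

    Assumes the {"deploy": {...}} envelope; only the fields relevant to this
    issue are included, so this is not a complete, submittable deploy.
    """
    return {
        "deploy": {
            "requestId": request_id,
            "id": deploy_id,
            # Per-deploy timeout the reporter set to 600 in the YAML above.
            "deployHealthTimeoutSeconds": health_timeout_s,
        }
    }


if __name__ == "__main__":
    body = build_deploy_request(REQUEST_ID, DEPLOY_ID, 600)
    print(json.dumps(body, indent=2))
```

As the rest of the thread shows, raising this value alone does not help if a lower-level timeout outside Singularity fires first.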

yuriy-filatov commented 2 years ago

Now I get it: the 120-second cutoff matches the Mesos agent's executor registration timeout

root@ip-172-30-172-252:/etc/mesos-slave# cat executor_registration_timeout
2mins
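A simplified way to see why the 600-second settings had no effect: the slow image pull has to fit inside every timeout in the launch path, so the smallest one wins. The sketch below converts a Mesos-style duration string (such as the 2mins read above) to seconds and takes the minimum against the deploy health timeout. The unit table reflects the suffixes Mesos duration flags use; the min() model and the function names are illustrative assumptions, not Singularity or Mesos code.

```python
import re

# Seconds per unit for Mesos-style duration strings such as "2mins" or "30secs".
_UNITS_S = {
    "ns": 1e-9, "us": 1e-6, "ms": 1e-3,
    "secs": 1, "mins": 60, "hrs": 3600, "days": 86400, "weeks": 604800,
}


def parse_mesos_duration(text: str) -> float:
    """Convert a duration like '2mins' (trailing newline allowed) to seconds."""
    match = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)\s*([a-z]+)\s*", text)
    if not match or match.group(2) not in _UNITS_S:
        raise ValueError(f"unrecognized duration: {text!r}")
    return float(match.group(1)) * _UNITS_S[match.group(2)]


def effective_launch_timeout(deploy_health_timeout_s: float, executor_registration_timeout: str) -> float:
    """Simplified model: a launch fails once the shortest timeout in its path expires."""
    return float(min(deploy_health_timeout_s, parse_mesos_duration(executor_registration_timeout)))


if __name__ == "__main__":
    print(parse_mesos_duration("2mins"))            # 120.0
    print(effective_launch_timeout(600, "2mins"))   # 120.0, matching the observed failure
    print(effective_launch_timeout(600, "10mins"))  # 600.0
```

Under this model, raising the agent's executor_registration_timeout above the expected pull time (for example to 10mins) is what lets the 600-second deploy timeout take effect.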

My bad, sorry, and thank you :)