Goal
We want a new job queue that doesn't use AWS Spot instances, and we want to understand the pricing estimates for it. We also want to increase the default timeout past 24 hours. Steps needed to get there:
AC
[ ] The Scalene and s3fs upgrades seemed to be the crux of the stalling on 3.1.4 images. Confirm whether that is also the problem on 3.1.5 images
[ ] Why is Scalene causing these issues? Is there something better we can use?
[ ] We do not seem to be getting accurate Scalene memory profiling output. Fix this
[ ] Figure out what log analysis can show us
[ ] Write a DPS ticket with these requirements for new queue: