DistributedScience / Distributed-CellProfiler

Run encapsulated docker containers with CellProfiler in the Amazon Web Services infrastructure.
https://distributedscience.github.io/Distributed-CellProfiler/

Print easier to read memory used metrics? #158

Closed. ErinWeisbart closed this issue 1 month ago

ErinWeisbart commented 11 months ago

If your Docker containers run out of memory, jobs fail silently, which is annoying. Our per-instance logs do regularly print instance metrics, including memory in use and memory available, but parsing them is tedious.

It would be nice to add a regular, human-readable print statement to the logs that reports memory metrics, so that one could more easily tell from browsing the logs whether memory issues are killing jobs. Perhaps it could also include WARNING in the statement when usage is above a certain threshold, so that a CloudWatch dashboard widget could easily report it?
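To make the idea concrete, here is a rough sketch of what such a log line could look like. It is not DCP's existing code: the function names, the 90% threshold, and the message format are all made up for illustration, and it assumes a Linux host where `/proc/meminfo` is readable. Note that "available" here is just whatever the kernel reports in `MemAvailable`, which may not match a memory limit set on the container.

```python
import logging


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values.

    Fields reported in kB are converted to bytes; unitless fields
    (plain counts such as HugePages_Total) are kept as-is.
    """
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if not fields:
            continue  # skip blank or malformed lines
        number = int(fields[0])
        if len(fields) > 1 and fields[1] == "kB":
            number *= 1024
        values[key.strip()] = number
    return values


def format_memory_line(used_bytes, total_bytes, warn_fraction=0.9):
    """Build a human-readable memory line (hypothetical format).

    The line is prefixed with WARNING when the used fraction is at or
    above warn_fraction, so a log filter can match on that word.
    """
    gib = 1024 ** 3
    fraction = used_bytes / total_bytes
    prefix = "WARNING " if fraction >= warn_fraction else ""
    return (
        f"{prefix}Memory used: {used_bytes / gib:.1f} GiB "
        f"of {total_bytes / gib:.1f} GiB ({fraction:.0%})"
    )


def log_memory(path="/proc/meminfo", warn_fraction=0.9):
    """Read memory stats from path, log one readable line, and return it."""
    with open(path) as handle:
        info = parse_meminfo(handle.read())
    total = info["MemTotal"]
    used = total - info["MemAvailable"]
    line = format_memory_line(used, total, warn_fraction)
    logging.getLogger(__name__).info(line)
    return line
```

Calling `log_memory()` on whatever interval the existing metrics are printed would give one greppable line per interval, and a CloudWatch filter on the literal word WARNING could then feed a dashboard widget.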

bethac07 commented 11 months ago

The current metric addition was definitely quick and dirty, so we could certainly come up with something better. I'm sure it's just a matter of googling the right SO posts.

A warning about memory, though: hopefully, most of the time memory issues aren't misconfiguration issues, but when they ARE, our current workflow can't detect it. So let's be thoughtful about how we do or don't describe the amount of "available" memory (link below, Broad only).

https://broadinstitute.slack.com/archives/C3QFX04P7/p1642185636020900?thread_ts=1642185636.020900&cid=C3QFX04P7

bethac07 commented 11 months ago

We could also explore whether we want to do the actual agent installation as part of DCP. I doubt it, but if it's optional, maybe it's not a terrible idea.