Netflix / metaflow

Open Source Platform for developing, scaling and deploying serious ML, AI, and data science systems
https://metaflow.org
Apache License 2.0
8.2k stars 772 forks source link

Prevent idle computer from sleeping while flow is running #2032

Open fumoboy007 opened 1 month ago

fumoboy007 commented 1 month ago

As far as I can tell, Metaflow does not prevent idle sleep, which means the computer may go to sleep while a long-running flow is still in progress. Instead, Metaflow can use one of various libraries to prevent idle sleep. For example, caffeine uses macOS’s system API to prevent idle sleep.

savingoyal commented 1 month ago

you can already do it today out of the box on mac os - just try executing -

caffeinate -i bash -c "python coreweave_flow.py run"

:)

fumoboy007 commented 1 month ago

@savingoyal Yes! I’m aware. However, the responsibility should be on the program to do it so that the user does not need to think about it. For example, if a web browser is currently downloading a file, the user expects idle sleep to be prevented automatically.

savingoyal commented 1 month ago

It's a good question if metaflow should interfere with the system settings of the host computer. I am unaware of any other program of a similar nature that hijacks the sleep settings. Regardless, there is always a way out with using solutions like caffeinate.

fumoboy007 commented 1 month ago

Hmm I think there is a misunderstanding here. Let me elaborate.

It's a good question if metaflow should interfere with the system settings of the host computer.

To be clear, we are not talking about changing the system settings. We are talking about using an official system API to tell the OS to temporarily prevent idle sleep during a long-running operation.

I am unaware of any other program of a similar nature that hijacks the sleep settings.

As I mentioned in https://github.com/Netflix/metaflow/issues/2032#issuecomment-2354546305, it is common practice for programs to temporarily prevent idle sleep during long-running operations. Examples:

Here are some example command-line programs from this GitHub search:

Regardless, there is always a way out with using solutions like caffeinate.

Yes, this is a good workaround. However, I created this issue because I think Metaflow should automatically prevent idle sleep while running a flow.

I imagine the following is a common scenario:

  1. A user starts running a flow to train a neural network.
  2. The flow takes a while. The user goes for a coffee break.
  3. The computer goes to sleep, preventing progress on the flow until the user comes back.

That’s why I think it would be a good default for Metaflow to prevent idle sleep while running a flow. Can you think of any scenarios where the user would not want idle sleep to be prevented while running a flow?

savingoyal commented 1 month ago

for long-running workflows there can be other sources of interruptions as well (the machine could crash for any arbitrary reason) - that's why we recommend deploying the flow to step-functions, airflow or argo-workflows to gain an additional factor of resiliency.

fumoboy007 commented 1 month ago

Sure, the computer or Metaflow could crash at any time and the user will have to deal with it. This matches the mental model of the user operating their computer—apps crash all the time, the kernel crashes from time to time, etc.

But that same mental model also prescribes that programs with important, long-running operations automatically prevent idle sleep. See the examples I already gave.

I’m not sure I understand your concerns about my suggestion. Is there any downside to preventing idle sleep when running a flow locally?

fumoboy007 commented 1 month ago

@fohrloop’s wakepy package looks robust and supports macOS, GNOME, KDE Plasma, Freedesktop.org DE, and Windows.

Usage:

from wakepy import keep

with keep.running(on_fail='warn'):
    # Run the flow here.

@savingoyal If you can point me to the code where the run/resume subcommands are executed, I can send a pull request.