Netflix / metaflow

Open Source Platform for developing, scaling and deploying serious ML, AI, and data science systems
https://metaflow.org
Apache License 2.0

Launching ipdb (or pdb) in steps #89

Open otaviog opened 4 years ago

otaviog commented 4 years ago

Hello,

Is it possible to use ipdb (or pdb) inside steps? In my code, the run hangs at the breakpoint without showing a prompt.

I have something like:

from metaflow import FlowSpec, step

class MyTrain(FlowSpec):

    @step
    def start(self):
        print("Start ipdb")
        # Drop into the debugger; this is where the run hangs.
        __import__("pdb").set_trace()

        self.next(self.load_dataset)

And the final output is:

$ python train.py run
...
2020-01-08 16:45:59.297 [1578512759283042/start/1 (pid 22301)] Task is starting.
2020-01-08 16:46:00.773 [1578512759283042/start/1 (pid 22301)] > /home/user521/metaflow-pytorch-example/train.py(97)start()

Thanks

romain-intel commented 4 years ago

Currently, this is not easily supported. The issue is that each step actually runs in a separate process so the import in the start step doesn't actually propagate to the other steps.
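To illustrate the general effect, here is a minimal sketch (not Metaflow's implementation). It assumes a parent that launches a child with piped streams; `CHILD_CODE` and `run_child_with_pdb` are hypothetical names. A pdb prompt in the child reads from whatever the parent supplies as stdin, not from the terminal, so if the parent never sends commands the child just waits at the prompt, which looks like the hang reported above.

```python
import subprocess
import sys

# Hypothetical child script: it stops in pdb, then finishes.
CHILD_CODE = """
import pdb
value = 41
pdb.set_trace()
value += 1
print("child finished with", value)
"""


def run_child_with_pdb(commands: str) -> str:
    """Run the child in a separate process and feed pdb commands via a pipe.

    The child's stdin and stdout are pipes owned by the parent, so the
    debugger prompt never reaches the user's terminal directly.
    """
    proc = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        input=commands,        # the only input pdb will ever see
        capture_output=True,   # the pdb prompt ends up in this captured output
        text=True,
        timeout=30,
    )
    return proc.stdout


if __name__ == "__main__":
    # The parent has to drive the debugger itself: print a variable, then continue.
    print(run_child_with_pdb("p value\nc\n"))
```

Here the parent explicitly sends `p value` and `c`; a runner that only forwards the child's output has no such channel for debugger input.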

yariv commented 4 years ago

Could metaflow be configured to run all the tasks serially in the parent process? It could make debugging much easier with the help of pdb/ipdb.

romain-intel commented 4 years ago

I am looking at a way of making this work, at least locally, possibly using existing tools.

multimeric commented 4 years ago

This would be super helpful. In Toil (another workflow engine), they implemented a "no forking" mode which made the workflow run in the same interpreter so that PDB could be used: https://github.com/DataBiosphere/toil/pull/1910

thesillystudent commented 4 years ago

I have implemented a sequential runtime for Metaflow. It's in my fork of this repo. I needed to use pdb as well as reuse some components of Metaflow for Gcloud functions. You can check it out:
https://github.com/thesillystudent/metaflow/tree/functions_runtime This branch also supports split-or steps.

yariv commented 4 years ago

This is great! Any chance it can be merged to master?


romain-intel commented 4 years ago

I am in the process of updating the docs to detail how to use PyCharm or VSCode to step through step code in the way suggested here.

On a related note, you can get a "sequential" runtime for Metaflow by specifying --max-workers 1 in the command line. This will restrict Metaflow to executing only one task at a time.

multimeric commented 4 years ago

Does --max-workers 1 prevent any forking, such that you can use a debugger? Or does that just mean that it forks only once (which doesn't really help)?

romain-intel commented 4 years ago

@TMiguelT: yes, after I typed that comment I realized that it doesn't help at all for debugging. It will still fork a new process for every task; it just allows only one fork at a time, so it is indeed pretty useless for debugging. I should have the docs updated shortly, though, so hopefully that will help.
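The distinction between "one task at a time" and "same process" can be sketched generically. This is an illustration under assumptions, not Metaflow's scheduler: `run_tasks_one_at_a_time` is a hypothetical helper that runs tasks strictly one after another, yet each task still lives in its own process with its own PID.

```python
import os
import subprocess
import sys


def run_tasks_one_at_a_time(n_tasks: int) -> list[int]:
    """Run n_tasks sequentially, each in a fresh child process.

    Returns the PID each task reported. Only one child exists at any time,
    but none of them is the parent process.
    """
    pids = []
    for _ in range(n_tasks):
        result = subprocess.run(
            [sys.executable, "-c", "import os; print(os.getpid())"],
            capture_output=True,
            text=True,
            check=True,
        )
        pids.append(int(result.stdout))
    return pids


if __name__ == "__main__":
    print("parent pid:", os.getpid())
    print("task pids: ", run_tasks_one_at_a_time(3))
```

Serial execution alone therefore does not put a breakpoint inside the interpreter that owns the terminal.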

romain-intel commented 4 years ago

I added a section in the docs (https://docs.metaflow.org/metaflow/debugging#debugging-your-flow-code) that hopefully will help with this. If this is sufficient, I will close this issue. If you would like more support or information, please chime in again :).

multimeric commented 4 years ago

Oh, does PyCharm let you debug across multiple threads? I had no idea. Not OP but this looks like it solves this issue from my perspective. Great work documenting this!

otaviog commented 4 years ago

@romain-intel nice documentation about how to debug with VS Code and PyCharm. That is working for me. Thanks.

Still, I hope that in the long run this can be supported (maybe it's an issue with (i)pdb?). The team I work with likes to use ipdb to prototype PyTorch experiments, since it lets us print tensors and evaluate expressions more comfortably than VS Code or PyCharm do. Metaflow+ipdb could make prototyping even more agile.

romain-intel commented 4 years ago

Yes, both PyCharm and VSCode are actually pretty good because they allow you to debug the master process as well as all subprocesses. There is also probably a way to get it to work with remote debugging but for now, hopefully local debugging is sufficient to debug any issue.

yariv commented 4 years ago

I use vim and debug python from the command line using ipdb so these instructions won't work for me, unfortunately. If I could submit a feature request it would be to make it so that --max-workers=1 prevents forking child processes and serializes the execution in the main process.

ifokeev commented 4 years ago

Use https://pypi.org/project/web-pdb/; it works fine.
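A debugger that talks over the network does not depend on the step process owning the terminal. The sketch below shows that idea using only the standard library; it is not web-pdb's implementation, and `set_trace_over_socket`, the host, and the port are hypothetical choices made for illustration.

```python
import pdb
import socket
import sys


def set_trace_over_socket(host: str = "127.0.0.1", port: int = 4444) -> None:
    """Block until a client connects, then run pdb over that connection.

    Assumption: the caller can reach host:port from another terminal.
    The debugger reads commands from the socket and writes its prompt
    back to it, so the process's own stdin/stdout are never used.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(1)
    conn, _ = server.accept()   # wait for the debugging client
    server.close()

    stream = conn.makefile("rw")
    debugger = pdb.Pdb(stdin=stream, stdout=stream)
    # Start tracing in the caller's frame, not inside this helper.
    debugger.set_trace(sys._getframe().f_back)
```

Calling `set_trace_over_socket()` inside a step would pause that step until a client connects to the port, for example with a tool such as `nc 127.0.0.1 4444`; pdb commands are then typed into that connection.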

xguse commented 2 years ago

Any progress here? Not being able to run a debugger is not a minor annoyance. It's a pretty major adoption blocker in my opinion.

Also IMHO, what the docs describe (https://docs.metaflow.org/metaflow/debugging#debugging-your-flow-code) is NOT actual debugging, any more than slapping a bunch of print statements into code is a substitute for a debugger. Being able to move up and down the call stack and evaluate local variables is a MUCH more productive solution.

addisonklinke commented 8 months ago

Agreed with @xguse: print() is not an actual debugger. It would be great for Metaflow to have a serial run option.

@romain-intel wrote: "Yes, both PyCharm and VSCode are actually pretty good because they allow you to debug the master process as well as all subprocesses"

As mentioned in several threads, PyCharm has an "Attach to subprocess automatically while debugging" setting. However, I have this enabled and the debugger still doesn't pick up breakpoints inside a Metaflow step. I believe the setting only applies to subprocess.Popen() and not to whatever forking Metaflow uses.

I even tried pausing a step so I could use PyCharm's attach to process option, but this fails