Ptychography-4-0 / ptychography

Code repository for Ptychography 4.0 project.
https://ptychography-4-0.github.io/ptychography/
GNU General Public License v3.0

Document `if __name__ == "__main__": ...` in SSB example (was: Error in SSB example) #47

Closed: w-markus closed this issue 3 years ago

w-markus commented 3 years ago

When starting the SSB example, upon creating the context:

ctx = lt.Context()

I get numerous copies of:

Task exception was never retrieved
future: <Task finished coro=<_wrap_awaitable() done, defined at /home/cri/Software/conda/miniconda/miniconda3/envs/ppp4_py37/lib/python3.7/asyncio/tasks.py:623> exception=RuntimeError('\n        An attempt has been made to start a new process before the\n        current process has finished its bootstrapping phase.\n\n        This probably means that you are not using fork to start your\n        child processes and you have forgotten to use the proper idiom\n        in the main module:\n\n            if __name__ == \'__main__\':\n                freeze_support()\n                ...\n\n        The "freeze_support()" line can be omitted if the program\n        is not going to be frozen to produce an executable.')>
Traceback (most recent call last):
  File "/home/cri/Software/conda/miniconda/miniconda3/envs/ppp4_py37/lib/python3.7/asyncio/tasks.py", line 630, in _wrap_awaitable
    return (yield from awaitable.__await__())
  File "/home/cri/Software/conda/miniconda/miniconda3/envs/ppp4_py37/lib/python3.7/site-packages/distributed/core.py", line 285, in _
    await self.start()
  File "/home/cri/Software/conda/miniconda/miniconda3/envs/ppp4_py37/lib/python3.7/site-packages/distributed/nanny.py", line 298, in start
    response = await self.instantiate()
  File "/home/cri/Software/conda/miniconda/miniconda3/envs/ppp4_py37/lib/python3.7/site-packages/distributed/nanny.py", line 381, in instantiate
    result = await self.process.start()
  File "/home/cri/Software/conda/miniconda/miniconda3/envs/ppp4_py37/lib/python3.7/site-packages/distributed/nanny.py", line 578, in start
    await self.process.start()
  File "/home/cri/Software/conda/miniconda/miniconda3/envs/ppp4_py37/lib/python3.7/site-packages/distributed/process.py", line 33, in _call_and_set_future
    res = func(*args, **kwargs)
  File "/home/cri/Software/conda/miniconda/miniconda3/envs/ppp4_py37/lib/python3.7/site-packages/distributed/process.py", line 203, in _start
    process.start()
  File "/home/cri/Software/conda/miniconda/miniconda3/envs/ppp4_py37/lib/python3.7/multiprocessing/process.py", line 112, in start
    self._popen = self._Popen(self)
  File "/home/cri/Software/conda/miniconda/miniconda3/envs/ppp4_py37/lib/python3.7/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/home/cri/Software/conda/miniconda/miniconda3/envs/ppp4_py37/lib/python3.7/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/home/cri/Software/conda/miniconda/miniconda3/envs/ppp4_py37/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/home/cri/Software/conda/miniconda/miniconda3/envs/ppp4_py37/lib/python3.7/multiprocessing/popen_spawn_posix.py", line 42, in _launch
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "/home/cri/Software/conda/miniconda/miniconda3/envs/ppp4_py37/lib/python3.7/multiprocessing/spawn.py", line 143, in get_preparation_data
    _check_not_importing_main()
  File "/home/cri/Software/conda/miniconda/miniconda3/envs/ppp4_py37/lib/python3.7/multiprocessing/spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

The software runs on an HPE ProLiant DL385 Gen10 (2x Epyc 7F72, 512 GB RAM) under Debian Linux, testing distribution.

sk1p commented 3 years ago

Thanks for the report! Does this happen when running a "plain" Python script, or from some kind of REPL (ipython or similar)? When using a plain Python script, you need to guard the Context creation, like this:

import libertem.api as lt

if __name__ == "__main__":
    with lt.Context() as ctx:
        ds = ctx.load("...")  # etc.

I suspect this is what's happening here. See also the basic example in the LiberTEM docs.

w-markus commented 3 years ago

Yes, correctly guessed, I was running a "plain" Python script from within VS Code. And yes, the suggested guard did the trick; both lines are necessary, the `if __name__ ...` guard as well as the `with ...` block.

Thanks a lot!

Perhaps we leave this issue open until I have been able to run the full example?

sk1p commented 3 years ago

> Perhaps we leave this issue open until I have been able to run the full example?

Sure, sounds good! Maybe we should include a pointer to the documentation in our notebooks, too.

w-markus commented 3 years ago

On Friday, 30.04.2021 at 07:11 -0700, Alexander Clausen wrote:

> Sure, sounds good! Maybe we should include a pointer to the documentation in our notebooks, too.

Oh yes, that's something for the documentation. :-)

Do these two lines cause problems when included in a notebook run script?

sk1p commented 3 years ago

> Do these two lines cause problems when included in a notebook run script?

Yes, because you can't really have a with-statement that wraps all the notebook cells. The with statement could be replaced with a ctx.close() at the end, but that is also inconvenient for users that just "run all cells" and want to keep experimenting in the notebook afterwards.
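
For illustration, that alternative might look roughly like this inside a notebook (a sketch only; the load call is a placeholder, as above):

import libertem.api as lt

# first cell: create the context without a guard (fine inside a notebook kernel)
ctx = lt.Context()

# following cells: load data and run the analyses
ds = ctx.load("...")  # placeholder, as above

# final cell: release the workers explicitly instead of using a with-block
ctx.close()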

Wrapping all cells into an `if __name__ == "__main__"` block has similar problems. I think it would be best to include a non-executing code snippet in a markdown cell, like the one in my comment above.

w-markus commented 3 years ago

On Friday, 30.04.2021 at 07:27 -0700, Alexander Clausen wrote:

> Yes, because you can't really have a with-statement that wraps all the notebook cells. [...] I think it would be best to include a non-executing code snippet in a markdown cell, like the one in my comment above.

Oh, yes of course! This reminds me of (off topic): https://docs.google.com/presentation/d/1n2RlMdmv1p25Xy5thJUhkKGvjtV-dkAIsUXP-AL4ffI/edit#slide=id.g362da58057_0_1 ;-)

uellue commented 3 years ago

@w-markus yes, using notebooks requires some discipline. The slides show some nice "don'ts". Before sharing a notebook I always restart the kernel, run all cells, save, and shut down. That pretty much avoids the problem.

In particular for large data analysis they are pretty great; they are my preferred prototyping method. Importing everything, starting up a cluster, warming up the workers etc. takes its time. The full notebook also often has a few "number crunching" steps, for example first a sum analysis, then a COM analysis, then trotter generation, then ptychography. If I want to quickly benchmark some code changes in the ptycho routine, it is just great to do

%autoreload
udf = SSB_UDF(...)

%time res = ctx.run_udf(...)

or change a bit of code in the UDF definition in a cell above and just run it, without going through the entire code that leads up to it. The same goes for a quick %lprun ... to see where that code spends its time etc. And we get our examples with figures and all embedded in our documentation, and they are at the same time runnable!
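
For illustration, such a notebook might be structured roughly like this; the dataset path is a placeholder, and the `SSB_UDF` import path and parameters are assumptions rather than values from the actual example:

import libertem.api as lt
from libertem.udf.sum import SumUDF
from ptychography40.reconstruction.ssb import SSB_UDF  # assumed import path

# setup cells: start the local cluster and open the dataset once
ctx = lt.Context()
ds = ctx.load("auto", path="/path/to/scan_data")  # placeholder path

# earlier "number crunching" cell, e.g. a sum over all frames
sum_res = ctx.run_udf(dataset=ds, udf=SumUDF())

# later cell: iterate on the ptychography step only, re-running just this cell
# (assumes the autoreload extension was loaded earlier)
%autoreload
udf = SSB_UDF(...)  # parameters elided, as above
%time res = ctx.run_udf(dataset=ds, udf=udf)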

TL;DR

Problem solved!