aelzenaar / bella

New computational package for small-rank matrix groups
BSD 3-Clause "New" or "Revised" License

Error running atom.py #19

Closed ariymarkowitz closed 10 months ago

ariymarkowitz commented 10 months ago
Traceback (most recent call last):
  File "/Users/amar630/anaconda3/envs/py3-11/lib/python3.11/site-packages/dask/backends.py", line 136, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/amar630/anaconda3/envs/py3-11/lib/python3.11/site-packages/dask/dataframe/io/csv.py", line 760, in read
    return read_pandas(
           ^^^^^^^^^^^^
  File "/Users/amar630/anaconda3/envs/py3-11/lib/python3.11/site-packages/dask/dataframe/io/csv.py", line 533, in read_pandas
    raise OSError(f"{urlpath} resolved to no files")
OSError: atom/*.csv resolved to no files

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/amar630/Downloads/bella-main/examples/atom.py", line 70, in <module>
    df = dd.read_csv("atom/*.csv")
         ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/amar630/anaconda3/envs/py3-11/lib/python3.11/site-packages/dask/backends.py", line 138, in wrapper
    raise type(e)(
OSError: An error occurred while calling the read_csv method registered to the pandas backend.

I tried uncommenting the commented code, but then I get an error that G is not defined.

I am running Python 3.11.

aelzenaar commented 10 months ago

Nothing should be commented out; the code in the repo accidentally had some lines commented out. I just pushed a fix.

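The flow is the following: first, the script computes the limit points and writes them out as CSV files (the atom/*.csv files). Then, it reads those CSV files back in and renders the picture.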

So in theory you can run the first step on a mainframe or any other machine that doesn't support the rendering, and then do the (comparatively easy) rendering on a local PC. The rendering uses datashader now, so there is no massive memory usage, and using dask instead of pandas means the CSV files are loaded lazily, one at a time, rather than all at once.

I commented the first step out since I already have 2000 CSV files and just wanted to render; I didn't mean to commit in that state.
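If the render step is run before any CSV files exist, dask fails with the "resolved to no files" OSError from the first traceback. A minimal sketch of a guard that could sit in front of `dd.read_csv`, using only the standard library; the helper name `find_csv_inputs` is hypothetical and is not part of bella or dask:

```python
import glob
import os


def find_csv_inputs(pattern):
    """Return the sorted files matching `pattern`, or raise with a hint if none exist."""
    paths = sorted(glob.glob(pattern))
    if not paths:
        # Say which step is missing instead of failing deep inside dask.
        raise FileNotFoundError(
            f"{pattern} matched no files; run the generation step first "
            f"(current directory: {os.getcwd()})"
        )
    return paths


if __name__ == "__main__":
    try:
        files = find_csv_inputs("atom/*.csv")
        print(f"found {len(files)} CSV files, safe to render")
    except FileNotFoundError as err:
        print(err)
```

The relative pattern is resolved against the current working directory, so running the script from a different folder than the one holding `atom/` would hit the same failure.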

aelzenaar commented 10 months ago

In general none of this work is necessary, but since we have 10000 generators rather than 2 or 3, we need to compute several orders of magnitude more limit points, so this example is massively hungrier than anything else.

ariymarkowitz commented 10 months ago

I'm still getting the following error:

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/Users/amar630/anaconda3/envs/py3-11/lib/python3.11/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
                    ^^^^^^^^^^^^^^^^^^^
  File "/Users/amar630/anaconda3/envs/py3-11/lib/python3.11/multiprocessing/pool.py", line 51, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/amar630/Downloads/bella-main/examples/atom.py", line 50, in one_limit_set
    df = G.coloured_limit_set_fast(points_per_walk, seed=seed)
         ^
NameError: name 'G' is not defined
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/amar630/Downloads/bella-main/examples/atom.py", line 69, in <module>
    _ = pool.starmap(one_limit_set, [[n+1, number_of_walks, points_per_walk, seed] for n in range(number_of_walks)], chunksize=1 )
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/amar630/anaconda3/envs/py3-11/lib/python3.11/multiprocessing/pool.py", line 375, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/amar630/anaconda3/envs/py3-11/lib/python3.11/multiprocessing/pool.py", line 774, in get
    raise self._value
NameError: name 'G' is not defined

aelzenaar commented 10 months ago

G should be defined on line 65, with something like:

G = AtomGroup(generators, 1.1)

aelzenaar commented 10 months ago

Oh I see, G is not being passed into one_limit_set. It works on my machine and I have no idea why; let me think for a minute.

aelzenaar commented 10 months ago

What is your OS? Is it Linux or something else?

aelzenaar commented 10 months ago

On Linux, multiprocessing uses fork by default, so the child process gets every variable of the parent process (it's an exact copy). On Windows and (I think) macOS, it uses spawn by default, so the child is a fresh interpreter that re-imports the script and never runs the code that defines G. I think this is why my subprocess sees G and yours does not.
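To check which start method a given machine actually uses, the standard library can report it directly. A small sketch; the function name `describe_start_methods` is made up for illustration:

```python
import multiprocessing as mp
import platform


def describe_start_methods():
    """Return the OS name, the default start method, and all supported ones."""
    return {
        "system": platform.system(),
        # Note: calling this fixes the default for the rest of the process.
        "default": mp.get_start_method(),
        "available": mp.get_all_start_methods(),
    }


if __name__ == "__main__":
    print(describe_start_methods())
```

The result depends on both the operating system and the Python version, so it is worth running this on each machine rather than assuming.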

aelzenaar commented 10 months ago

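(Usually I set it to spawn explicitly, but in this particular case I did not. Soon, I think in Python 3.14, fork will stop being the default on Linux anyway; as far as I know the new default there is forkserver rather than spawn, but either way the child no longer inherits the parent's variables.)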

aelzenaar commented 10 months ago

Even after adding multiprocessing.set_start_method('spawn') I can't reproduce it, but the commit that will show up in two minutes should fix the problem anyway by passing G explicitly. No idea why it works on my machine, since the spawn method doesn't seem to be the problem.
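A minimal sketch of what "passing G explicitly" can look like, independent of the start method. `ToyGroup`, `one_walk` and `run_walks` are hypothetical stand-ins, not bella's `AtomGroup` or the real `one_limit_set`; the only assumption about the real object is that it can be pickled:

```python
import multiprocessing as mp
import random


class ToyGroup:
    """Hypothetical stand-in for the group object; it must be picklable."""

    def __init__(self, scale):
        self.scale = scale

    def sample(self, n_points, seed):
        rng = random.Random(seed)
        return [self.scale * rng.random() for _ in range(n_points)]


def one_walk(group, walk_index, n_points, seed):
    # The group arrives as an argument, so nothing depends on module globals.
    points = group.sample(n_points, seed + walk_index)
    return walk_index, len(points)


def run_walks(group, number_of_walks, n_points, seed, start_method="spawn"):
    # Forcing "spawn" reproduces the stricter behaviour on any OS.
    ctx = mp.get_context(start_method)
    tasks = [(group, n, n_points, seed) for n in range(number_of_walks)]
    with ctx.Pool(processes=2) as pool:
        return pool.starmap(one_walk, tasks, chunksize=1)


if __name__ == "__main__":
    G = ToyGroup(1.1)
    print(run_walks(G, number_of_walks=4, n_points=10, seed=0))
```

Here the group is pickled once per task, which is fine for a small object. For a large one, building it once per worker through the `initializer` argument of `Pool` would likely be cheaper.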

aelzenaar commented 10 months ago

@ariymarkowitz I was missing a "global" directive. I am not sure whether forking is available on macOS (it is UNIX in theory, so it might be), but if it is, then this fixes an error there too. There is also a more sophisticated algorithm for producing the generators, which now gives reflections in tangent circles rather than circles that are only almost tangent. Anyway, I just pushed these changes.

ariymarkowitz commented 10 months ago

Looks like it's working now!