cbyrohl / scida

scida is an out-of-the-box analysis tool for large scientific datasets. It primarily supports the astrophysics community, focusing on cosmological and galaxy formation simulations using particles or unstructured meshes, as well as large observational datasets. This tool uses dask, allowing analysis to scale.
https://scida.io
MIT License
26 stars 4 forks source link

Encountering Error in Halo Radial Profile Code - 'numpy.ndarray' object has no attribute '_meta' #62

Closed astronerdF closed 1 year ago

astronerdF commented 1 year ago

Hi! I tried to run the code for getting the Radial profile for each halo on the wesbite https://cbyrohl.github.io/scida/halocatalogs/, and ran into some errors. Could you please check? Thanks

import numpy as np
from scipy.stats import binned_statistic

basePath = "/virgotng/universe/IllustrisTNG/TNG100-3/output"

ds = load(basePath+"/snapdir_099", units=True)

gas = ds.data["PartType0"]
vol = gas["Masses"] / gas["Density"]
grp = ds.data["Group"]
pos3 = gas["Coordinates"] - grp["GroupPos"][gas["GroupID"]]
dist = da.sqrt(da.sum((pos3)**2, axis=1)) 

def customfunc(dist, density, volume):
    a = binned_statistic(dist, density, statistic="sum", bins=np.linspace(0, 200, 10))[0]
    b = binned_statistic(dist, volume, statistic="sum", bins=np.linspace(0, 200, 10))[0]
    return a/b

g = ds.grouped(dict(dist=dist, Density=gas["Density"],
                    Volume=vol))
s = g.apply(customfunc).evaluate()

The error that I am getting is:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[11], line 11
      9 vol = gas["Masses"] [/](https://vscode-remote+ssh-002dremote-002bvera2.vscode-resource.vscode-cdn.net/) gas["Density"]
     10 grp = ds.data["Group"]
---> 11 pos3 = gas["Coordinates"] - grp["GroupPos"][gas["GroupID"]]
     12 dist = da.sqrt(da.sum((pos3)**2, axis=1)) 
     14 def customfunc(dist, density, volume):

File [~/conda-envs/myenv/lib/python3.10/site-packages/pint/facets/numpy/quantity.py:238](https://vscode-remote+ssh-002dremote-002bvera2.vscode-resource.vscode-cdn.net/u/fahma/Test/Metallicity/~/conda-envs/myenv/lib/python3.10/site-packages/pint/facets/numpy/quantity.py:238), in NumpyQuantity.__getitem__(self, key)
    236 def __getitem__(self, key):
    237     try:
--> 238         return type(self)(self._magnitude[key], self._units)
    239     except PintTypeError:
    240         raise

File [~/conda-envs/myenv/lib/python3.10/site-packages/dask/array/core.py:1993](https://vscode-remote+ssh-002dremote-002bvera2.vscode-resource.vscode-cdn.net/u/fahma/Test/Metallicity/~/conda-envs/myenv/lib/python3.10/site-packages/dask/array/core.py:1993), in Array.__getitem__(self, index)
   1990     return self
   1992 out = "getitem-" + tokenize(self, index2)
-> 1993 dsk, chunks = slice_array(out, self.name, self.chunks, index2, self.itemsize)
   1995 graph = HighLevelGraph.from_collections(out, dsk, dependencies=[self])
   1997 meta = meta_from_array(self._meta, ndim=len(chunks))

File [~/conda-envs/myenv/lib/python3.10/site-packages/dask/array/slicing.py:176](https://vscode-remote+ssh-002dremote-002bvera2.vscode-resource.vscode-cdn.net/u/fahma/Test/Metallicity/~/conda-envs/myenv/lib/python3.10/site-packages/dask/array/slicing.py:176), in slice_array(out_name, in_name, blockdims, index, itemsize)
    173 index += (slice(None, None, None),) * missing
...
    825     adjust_chunks={0: 1},  # one row for each block in a
    826 )
    828 # add offsets to take account of the position of each block within the array a

AttributeError: 'numpy.ndarray' object has no attribute '_meta'
cbyrohl commented 1 year ago

Sorry about that. Let me fix that.

By the way, both #59 and this issue (as potentially many others) have to do with some issues in the unit support. As a temporary workaround, you can work without units (load(basePath+"/snapdir_099", units=False)).

astronerdF commented 1 year ago

Ah, okay. Sorry, the next issue I put up also has to do with the units. I'll keep this in mind. Thank you

astronerdF commented 1 year ago
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[7], line 23
     19     return a[/](https://vscode-remote+ssh-002dremote-002bvera2.vscode-resource.vscode-cdn.net/)b
     21 g = ds.grouped(dict(dist=dist, Density=gas["Density"],
     22                     Volume=vol))
---> 23 s = g.apply(customfunc).evaluate()

File [/vera/u/fahma/scida/src/scida/customs/arepo/dataset.py:576](https://vscode-remote+ssh-002dremote-002bvera2.vscode-resource.vscode-cdn.net/vera/u/fahma/scida/src/scida/customs/arepo/dataset.py:576), in GroupAwareOperation.evaluate(self, compute)
    574 res = map_halo_operation(func, self.lengths, self.arrs, fieldnames=fieldnames)
    575 if compute:
--> 576     res = res.compute()
    577 return res

File [~/conda-envs/myenv/lib/python3.10/site-packages/dask/threaded.py:89](https://vscode-remote+ssh-002dremote-002bvera2.vscode-resource.vscode-cdn.net/u/fahma/Test/Metallicity/~/conda-envs/myenv/lib/python3.10/site-packages/dask/threaded.py:89), in get(dsk, keys, cache, num_workers, pool, **kwargs)
     86     elif isinstance(pool, multiprocessing.pool.Pool):
     87         pool = MultiprocessingPoolExecutor(pool)
---> 89 results = get_async(
     90     pool.submit,
     91     pool._max_workers,
     92     dsk,
     93     keys,
     94     cache=cache,
     95     get_id=_thread_get_id,
     96     pack_exception=pack_exception,
...
    603     arrchunks = [arr[o : offsets[i + 1]] for arr in arrs]
    604     res.append(func(*arrchunks))
--> 605 return np.array(res)

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (63529,) + inhomogeneous part.

Getting this error now.

cbyrohl commented 1 year ago

Getting this error now.

Could you clarify what you have changed? The following code snippet (changing units=True => units=False) works for me:

import numpy as np
from scipy.stats import binned_statistic
from scida import load
import dask.array as da

basePath = "/virgotng/universe/IllustrisTNG/TNG100-3/output"

ds = load(basePath+"/snapdir_099", units=False)

gas = ds.data["PartType0"]
vol = gas["Masses"] / gas["Density"]
grp = ds.data["Group"]
pos3 = gas["Coordinates"] - grp["GroupPos"][gas["GroupID"]]
dist = da.sqrt(da.sum((pos3)**2, axis=1)) 

def customfunc(dist, density, volume):
    a = binned_statistic(dist, density, statistic="sum", bins=np.linspace(0, 200, 10))[0]
    b = binned_statistic(dist, volume, statistic="sum", bins=np.linspace(0, 200, 10))[0]
    return a/b

g = ds.grouped(dict(dist=dist, Density=gas["Density"],
                    Volume=vol))
s = g.apply(customfunc).evaluate()

Nevermind. Your error should indeed be raised for newer numpy versions. Let me fix that.

cbyrohl commented 1 year ago

The initial example (reducing the computed halo count with nmax=20):

import numpy as np
from scipy.stats import binned_statistic
from scida import load
import dask.array as da

basePath = "/virgotng/universe/IllustrisTNG/TNG100-3/output"

ds = load(basePath+"/snapdir_099", units=True)

gas = ds.data["PartType0"]
vol = gas["Masses"] / gas["Density"]
grp = ds.data["Group"]
pos3 = gas["Coordinates"] - grp["GroupPos"][gas["GroupID"]]
dist = da.sqrt(da.sum((pos3)**2, axis=1)) 

def customfunc(dist, density, volume):
    a = binned_statistic(dist, density, statistic="sum", bins=np.linspace(0, 200, 10))[0]
    b = binned_statistic(dist, volume, statistic="sum", bins=np.linspace(0, 200, 10))[0]
    return a/b

g = ds.grouped(dict(dist=dist, Density=gas["Density"],
                    Volume=vol))
s = g.apply(customfunc).evaluate(nmax=20)

now works (with and without units).

However, we get the following warnings with units:

  warnings.warn(
/u/byrohlc/.cache/pypoetry/virtualenvs/scida-J9ngM3zU-py3.9/lib/python3.9/site-packages/numpy/core/shape_base.py:121: UnitStrippedWarning: The unit of the quantity is stripped when downcasting to ndarray.
  ary = asanyarray(ary)
/u/byrohlc/.cache/pypoetry/virtualenvs/scida-J9ngM3zU-py3.9/lib/python3.9/site-packages/scipy/stats/_binned_statistic.py:550: UnitStrippedWarning: The unit of the quantity is stripped when downcasting to ndarray.
  values = np.asarray(values)
/u/byrohlc/.cache/pypoetry/virtualenvs/scida-J9ngM3zU-py3.9/lib/python3.9/site-packages/numpy/core/shape_base.py:121: UnitStrippedWarning: The unit of the quantity is stripped when downcasting to ndarray.
  ary = asanyarray(ary)
cbyrohl commented 1 year ago

Above warnings are caused by the example function, not scida.