bowman-lab / enspara

Modeling molecular ensembles with scalable data structures and parallel computing
https://enspara.readthedocs.io
GNU General Public License v3.0

Issue with ra.py module #216

Closed BJWiley233 closed 2 years ago

BJWiley233 commented 2 years ago

Hi,

I am running into an issue with the RaggedArray module. I have attached the HDF5 file from one of my FAST swarms. The issue is described by the NumPy error below. The deprecation warning was always there, but I think this is the first time I have read a file for which ra.load actually returned a RaggedArray instance.

import numpy as np
from enspara.util import array as ra

assigns = ra.load('assignments.h5')
unique_states = np.unique(assigns)

/opt/docs/enspara/lib/python3.7/site-packages/numpy/lib/arraysetops.py:270: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  ar = np.asanyarray(ar)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<__array_function__ internals>", line 6, in unique
  File "/opt/docs/enspara/lib/python3.7/site-packages/numpy/lib/arraysetops.py", line 272, in unique
    ret = _unique1d(ar, return_index, return_inverse, return_counts)
  File "/opt/docs/enspara/lib/python3.7/site-packages/numpy/lib/arraysetops.py", line 333, in _unique1d
    ar.sort()
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Link to file: https://drive.google.com/file/d/1u6Jc1Dq8sIWF0PRV4pyyzLSVUNGv_k0W/view?usp=sharing

lgsmith commented 2 years ago

Just looking at this, I believe that if you want the unique states you should flatten the ragged array first, i.e. unique_states = np.unique(assigns.flatten()).

This is happening now because your earlier files did not load as ragged arrays, and NumPy knows how to flatten ordinary ndarrays so that it can apply unique to them. For many operations that do not preserve dimensions (like unique), the way to work with a ragged array is through flatten.
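To illustrate the flatten-then-unique idea without needing the attached file, here is a minimal sketch in plain NumPy. The `rows` list is a hypothetical stand-in for a ragged assignments array (it is not the enspara RaggedArray class), and `np.concatenate` plays the role that flatten plays in the suggestion above.

```python
import numpy as np

# Hypothetical stand-in for ragged assignments: trajectories of unequal length.
rows = [
    np.array([0, 1, 1, 2]),
    np.array([2, 3]),
    np.array([3, 3, 4, 0, 1]),
]

# Flatten by joining all rows into a single 1-D array.
flat = np.concatenate(rows)

# np.unique now sees an ordinary 1-D integer array and works as expected.
unique_states = np.unique(flat)
print(unique_states)
```

With an actual RaggedArray, the equivalent call suggested in this thread is np.unique(assigns.flatten()).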

BJWiley233 commented 2 years ago

OK, just to follow up: the point of this in FAST and in enspara is to get the unique states as a single 1-dimensional array, and not a unique set of 1-dimensional arrays? Let me know if this question makes sense. Thanks.

justinrporter commented 2 years ago

Yes!

It would be cool if np.unique(ra) produced a ragged array of the results as though you had done ra.RaggedArray([np.unique(row) for row in ra]) but unfortunately we don't have any control over what happens in np.unique...
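For comparison, the per-row behaviour described above can be sketched in plain NumPy. The `rows` list below is again a hypothetical stand-in for a ragged array; wrapping the result back into a RaggedArray, as in the expression above, is left out so the sketch runs without enspara installed.

```python
import numpy as np

# Hypothetical stand-in for ragged assignments: trajectories of unequal length.
rows = [
    np.array([0, 1, 1, 2]),
    np.array([2, 3]),
    np.array([3, 3, 4, 0, 1]),
]

# One np.unique call per row, so each trajectory keeps its own set of states.
per_row_unique = [np.unique(row) for row in rows]

for i, states in enumerate(per_row_unique):
    print(i, states)
```

The result is itself ragged (rows of length 3, 2, and 4 here), which is why it cannot come back as a regular ndarray.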

BJWiley233 commented 2 years ago

Wow can't believe I didn't think of that. Thanks again.