desihub / desitarget

DESI Targeting
BSD 3-Clause "New" or "Revised" License

ragged array deprecation warning with numpy 1.20.3 #779

Closed sbailey closed 2 years ago

sbailey commented 2 years ago

Fuji housecleaning after updating to latest versions of external dependencies:

desitarget.geomask.circle_boundaries generates a deprecation warning with numpy 1.20.3 when nloc is different per target:

/global/common/software/desi/users/sjbailey/desitarget/py/desitarget/geomask.py:764: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  offdec = np.array([rad*np.sin(np.arange(ns)*2*np.pi/ns)

For example, the unit tests use r=[1.001, 20.02, 10.01], nloc=[2, 21, 11]. The list comprehension in

    offdec = np.array([rad*np.sin(np.arange(ns)*2*np.pi/ns)
                       for ns, rad in zip(nloc, radius)]).transpose()

produces 3 arrays of lengths 2, 21, and 11, and numpy is unhappy about stuffing those into a single 2D array. I think the downstream code is doing the right thing when numpy falls back to dtype=object, which ends up returning ra, dec arrays of length 34 = 2 + 21 + 11.
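To make the length-34 arithmetic concrete, here is a minimal sketch using the r and nloc values from the unit test quoted above. It is not the code in geomask.py: it keeps the per-target offsets in a plain Python list and flattens them with np.concatenate, which is an assumption about one way to avoid the ragged np.array call altogether. The names offdec_per_target and offdec_flat are hypothetical.

```python
import numpy as np

# Values quoted from the unit test in this issue.
radius = [1.001, 20.02, 10.01]
nloc = [2, 21, 11]

# One 1-D array of declination offsets per target; lengths differ (2, 21, 11).
# Keeping them in a list avoids asking numpy to build a ragged ndarray.
offdec_per_target = [rad * np.sin(np.arange(ns) * 2 * np.pi / ns)
                     for ns, rad in zip(nloc, radius)]

# Flatten to a single 1-D float array with one entry per circle point.
offdec_flat = np.concatenate(offdec_per_target)

print([len(o) for o in offdec_per_target])
print(offdec_flat.shape, offdec_flat.dtype)
```

Because the flattened array has a float dtype rather than dtype=object, ufuncs such as np.radians apply to it directly.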

@geordie666 please check and confirm that dtype=object is the intended behavior for this case where each ra,dec gets a different number of circles around them.

geordie666 commented 2 years ago

I'm fixing this now. dtype=object is almost correct, but not quite, for interesting reasons. If an array is ragged, dtype=object makes the choice I'd expect for a ragged-array construction:

np.array([np.arange(i) for i in [1, 2, 3]], dtype=object)
Out[]: array([array([0]), array([0, 1]), array([0, 1, 2])], dtype=object)
np.shape(_)
Out[]: (3,)

This result would be the same for i in [2, 4, 7] or for i in [2, 21, 11], etc.

But, for arrays that aren't ragged, dtype=object makes a different choice:

np.array([np.arange(i) for i in [2, 2, 2]], dtype=object)
Out[]: 
array([[0, 1],
       [0, 1],
       [0, 1]], dtype=object)
np.shape(_)
Out[]: (3, 2)

Fortunately, the unit tests caught this discrepancy and failed on the second (non-ragged) case.

This is really a danger of ragged arrays in general rather than anything specific to dtype=object, but I thought I'd record this corner case here in case we find other instances of ragged arrays in my (sloppy!) coding.
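One way to get the same shape whether or not the inputs happen to be ragged is to preallocate a 1-D object array and fill it element by element. This is only a sketch of that idea, not what desitarget does; the helper name as_object_array is hypothetical.

```python
import numpy as np

def as_object_array(arrays):
    """Pack a list of 1-D arrays into a 1-D object array, ragged or not."""
    out = np.empty(len(arrays), dtype=object)
    # Assign one element at a time so numpy stores each array as an object
    # instead of trying to stack equal-length inputs into a 2-D array.
    for i, arr in enumerate(arrays):
        out[i] = arr
    return out

ragged = as_object_array([np.arange(i) for i in [1, 2, 3]])
uniform = as_object_array([np.arange(i) for i in [2, 2, 2]])

# For comparison: the direct construction from the non-ragged case above.
direct = np.array([np.arange(i) for i in [2, 2, 2]], dtype=object)

print(ragged.shape, uniform.shape, direct.shape)
```

With this approach, code that iterates over the outer axis sees one array per target in both cases, so no transposes are needed to special-case equal-length inputs.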

geordie666 commented 2 years ago

I take it back. My code had a bunch of transposes in it to catch the case that I described above. The unit test is actually failing on:

np.radians(np.array([1, 2, 3]))
Out[]: array([0.01745329, 0.03490659, 0.05235988])

Versus:

np.radians(np.array([1, 2, 3], dtype=object))
TypeError: loop of ufunc does not support argument 0 of type int which has no callable radians method

So, that's interesting. Apparently dtype=object has some hidden dangers.
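A minimal sketch of the failure and one possible workaround, assuming the object array really holds plain numbers: cast it to a float dtype before calling the ufunc. This illustrates the behavior described above and is not necessarily the fix adopted in the code.

```python
import numpy as np

values = np.array([1, 2, 3], dtype=object)

# np.radians raises TypeError on an object array of Python ints.
try:
    np.radians(values)
    raised = False
except TypeError:
    raised = True

# Casting to float first gives the ufunc a numeric array to work on.
in_radians = np.radians(values.astype(float))

print(raised)
print(in_radians)
```

For a genuinely ragged object array, astype(float) is not available on the outer array, so the ufunc would have to be applied to each inner array in turn.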

geordie666 commented 2 years ago

Addressed in #781.