ArtifactDB / dolomite-base

Save Bioconductor objects in Python.
https://artifactdb.github.io/dolomite-base/
MIT License
2 stars 0 forks source link

might be an issue with numpy #16

Closed jkanche closed 10 months ago

jkanche commented 10 months ago

Anywhere this notation is used: - https://github.com/ArtifactDB/dolomite-base/blob/master/src/dolomite_base/_utils_vector.py#L44

⋊> ~/D/p/a/dolomite-ranges on migration ⨯ python
Python 3.9.18 (main, Sep 11 2023, 13:41:44)
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> a = [1, None, 100]
>>> mask = [0, 1, 0]
>>> placeholder = 100
>>> import numpy as np
>>> arr = np.array(a)
>>> arr
array([1, None, 100], dtype=object)

###### replace the placeholder value
>>> arr[mask] = placeholder
>>> arr
array([100, 100, 100], dtype=object)

### convert mask into numpy array
>>> arr[np.array(mask)] = placeholder
>>> arr
array([100, 100, 100], dtype=object)
LTLA commented 10 months ago

_list_to_numpy_with_mask should have sanitized x to a properly typed Numpy array, there shouldn't be a dtype=object anymore.

jkanche commented 10 months ago

that's true, i'm more concerned about how the assignment seems to work:

>>> import numpy as np
>>> a = np.array([1,0,100], dtype=np.int32)
>>> m = np.array([0,1,0], np.int8)
>>> a
array([  1,   0, 100], dtype=int32)
>>> m
array([0, 1, 0], dtype=int8)
>>> a[m] = 2000
>>> a
array([2000, 2000,  100], dtype=int32)

its considering the mask as indices 0 and 1 and replacing the value in those two positions.

I guess we should convert it into a boolean array

>>> a = np.array([1,0,100], dtype=np.int32)
>>> a[m.astype(np.bool_)] = 4000
>>> a
array([   1, 4000,  100], dtype=int32)
LTLA commented 10 months ago

Yes, it's a little confusing. I'll rework this entire set of utilities to make them easier to use.