DC-analysis / dclab

Python library for the post-measurement analysis of real-time deformability cytometry (RT-DC) data sets
https://dclab.readthedocs.io
Other

Feature "image" is accessed differently if dataset is DCOR resource #181

Closed: B-Hartmann closed this issue 2 years ago

B-Hartmann commented 2 years ago

System

Windows 10, Python 3.9, dclab 0.43.1

Minimal working example:

>>> import dclab
>>> ds_online = dclab.new_dataset("e4d59480-fa5b-c34e-0001-46a944afc8ea")
>>> ds_offline = dclab.new_dataset("path/to/file.rtdc")

>>> ds_online["image"].shape
(4544, 80, 250)

>>> ds_offline["image"].shape
(1561, 80, 250)

>>> ds_offline["image"][0:2,:,:]
array([[...]])

>>> ds_online["image"][0:2,:,:]
Traceback (most recent call last):
  File "C:\Users\bhartma\.environments\env\lib\site-packages\IPython\core\interactiveshell.py", line 3457, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-10-e2fa7e36cfcd>", line 1, in <module>
    ds_online["image"][0:2,:,:]
  File "C:\Users\bhartma\.environments\env\lib\site-packages\dclab\rtdc_dataset\fmt_dcor\features.py", line 26, in __getitem__
    indices = np.arange(len(self))[event]
IndexError: too many indices for array: array is 1-dimensional, but 3 were indexed

What I expected

So both datasets report the same kind of shape, basically a 3-dimensional array of (events, 80, 250), right? But slicing with [0:2, :, :] to access the first few images fails when the dataset is accessed on DCOR.
I would have expected this to work regardless of whether the dataset resource is online or offline.
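The traceback shows where this goes wrong: in `dclab/rtdc_dataset/fmt_dcor/features.py`, `__getitem__` runs `indices = np.arange(len(self))[event]`, so the whole index is applied to a 1-D array of event indices rather than to an image stack. The NumPy-only sketch below reproduces just that one line (it does not call dclab or DCOR); the event count 4544 is borrowed from the example above.

```python
import numpy as np

# Mirrors the failing line from the traceback: the index is applied to a
# 1-D array of event indices, not to a 3-D image array.
n_events = 4544
event_indices = np.arange(n_events)

# A single slice along the event axis works.
print(event_indices[0:2])  # [0 1]

# A 3-tuple index on a 1-D array raises the same kind of IndexError.
try:
    event_indices[0:2, :, :]
except IndexError as exc:
    print(f"IndexError: {exc}")
```

This is why [0:2] alone succeeds while [0:2, :, :] fails, even though `.shape` reports three dimensions.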

paulmueller commented 2 years ago

Slicing non-scalar features from DCOR is not supported right now, simply because it is so inefficient: the server reads the image data and converts it to JSON, and dclab then converts it back to an array. If you want to work with non-scalar features, you should download the entire dataset.

I don't see a particular use case where slicing image data justifies not downloading the entire dataset. But I am open to discussing it.
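To get a rough feel for the overhead of that JSON round trip, the sketch below encodes a single 80 x 250 `uint8` image as a JSON list of pixel values and decodes it again. It is an illustration under assumptions: the random pixel values are made up, and the encoding here (plain `json.dumps` of nested lists) is not necessarily the exact format the DCOR server uses.

```python
import json

import numpy as np

# Hypothetical stand-in for one RT-DC image (80 x 250 pixels, uint8).
# Random values, for illustration only.
rng = np.random.default_rng(seed=0)
image = rng.integers(0, 256, size=(80, 250)).astype(np.uint8)

# Size of the image as raw binary data (1 byte per pixel).
raw_size = image.nbytes

# "Server side" (illustrative): serialize the pixel values as JSON text.
payload = json.dumps(image.tolist())

# "Client side" (illustrative): parse the JSON and rebuild the array.
restored = np.asarray(json.loads(payload), dtype=np.uint8)

assert np.array_equal(restored, image)
print(f"raw bytes:  {raw_size}")
print(f"JSON chars: {len(payload)}")
```

Each pixel becomes one to three digits plus a separator in the JSON text, so the payload is several times larger than the raw bytes before any parsing cost is counted, and this repeats for every event requested.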

B-Hartmann commented 2 years ago

Okay. Using [0:10] instead of [0:10, :, :] works for me, so that is fine.

The performance concern is reasonable! I don't have a better solution at the moment, so I will close the issue for now.
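The [0:10] workaround can be generalized on the client side: pass only the first (event) part of the index to the feature, then apply the remaining axes to the NumPy array that comes back. The sketch below uses a hypothetical helper `slice_events` and a mock class `EventOnlyFeature` that copies the indexing line from the traceback; neither is part of dclab. Note that it still transfers full images for every selected event, so it does not address the performance concern raised above.

```python
import numpy as np


class EventOnlyFeature:
    """Mock feature (hypothetical): like the DCOR feature in the
    traceback, it only accepts an index along the event axis."""

    def __init__(self, data):
        self._data = data

    def __len__(self):
        return len(self._data)

    def __getitem__(self, event):
        indices = np.arange(len(self))[event]  # fails for 3-tuple keys
        return self._data[indices]


def slice_events(feature, key):
    """Hypothetical helper: forward only the event index to ``feature``
    and apply any remaining axes to the returned array."""
    if not isinstance(key, tuple):
        return feature[key]
    events = np.asarray(feature[key[0]])
    rest = key[1:]
    if not rest:
        return events
    if isinstance(key[0], (int, np.integer)):
        # An integer event index already dropped the event axis.
        return events[rest]
    # Keep the event axis and index the remaining (image) axes.
    return events[(slice(None),) + rest]


stack = np.arange(5 * 4 * 3).reshape(5, 4, 3)
feat = EventOnlyFeature(stack)
print(slice_events(feat, (slice(0, 2), slice(None), slice(None))).shape)  # (2, 4, 3)
print(slice_events(feat, (slice(0, 2), 0, slice(1, 3))).shape)  # (2, 2)
```

Checking for an integer event index separately matters because an integer removes the event axis, so the remaining index must not be shifted by one position.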

paulmueller commented 2 years ago

OK, thanks!