catalystneuro / lazy_ops

Lazy transposing and slicing of h5py and Zarr Datasets
BSD 3-Clause "New" or "Revised" License

DatasetView object shape is incorrect #19

Closed Saksham20 closed 3 years ago

Saksham20 commented 4 years ago

h5py dataset:
h5py_dataset.shape # output = 2,3,3
After lazy_transpose I get:
DatasetView(d1).lazy_transpose([2,0,1]).shape # output = 2,3,3 / should be 3,2,3
Doing this gives the correct dim:
DatasetView(d1).lazy_transpose([2,0,1])[:].shape # output = 3,2,3

Also, if I slice the whole dataset and then call shape, will that load the whole dataset into memory?
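For reference, the shape expected in the report above follows from the axis order alone, so it can be checked without reading any data. The sketch below is plain Python and assumes NumPy-style transpose semantics (output axis i takes its length from input axis axes[i]); transposed_shape is a hypothetical helper written for illustration, not part of lazy_ops.

```python
def transposed_shape(shape, axes):
    """Return the shape after permuting axes, NumPy-style: out[i] = shape[axes[i]].

    Hypothetical helper for illustration only; not a lazy_ops function.
    """
    if sorted(axes) != list(range(len(shape))):
        raise ValueError("axes must be a permutation of range(len(shape))")
    return tuple(shape[a] for a in axes)


original_shape = (2, 3, 3)
expected = transposed_shape(original_shape, [2, 0, 1])
print(expected)  # (3, 2, 3)
```

This agrees with the 3,2,3 the report says lazy_transpose([2,0,1]) should give.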

bendichter commented 4 years ago

Yes, DatasetView(d1).lazy_transpose([2,0,1])[:].shape would read the whole dataset into memory, because [:] performs the read before .shape is evaluated.

d-sot commented 4 years ago

Can you share more about your setup? I'm getting the correct shape for
d1 = f.create_dataset(data=np.random.rand(2,3,3),name="dataset")
print(DatasetView(d1).lazy_transpose([2,0,1]).shape) # output is 3,2,3

If you lazy_slice the whole dataset and then call shape, it should not load the dataset. This matches the behavior of h5py.Dataset. Apart from the operations that exist to read data, mainly [] without lazy_slice and dsetread(), DatasetView should generally access attributes without loading the data, as h5py.Dataset does.
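The distinction described above, metadata access versus an actual read, can be illustrated with a toy view. This is a minimal sketch and not the lazy_ops implementation: CountingArray and LazyTransposeView are hypothetical classes, the backing store is an in-memory NumPy array standing in for an on-disk dataset, and only a full read via [:] or ... is supported.

```python
import numpy as np


class CountingArray:
    """Hypothetical stand-in for an on-disk dataset that counts reads."""

    def __init__(self, data):
        self._data = np.asarray(data)
        self.shape = self._data.shape  # metadata, available without a read
        self.reads = 0

    def __getitem__(self, key):
        self.reads += 1
        return self._data[key]


class LazyTransposeView:
    """Toy view: records an axis order and reads only when indexed."""

    def __init__(self, dataset, axes=None):
        self._dataset = dataset
        ndim = len(dataset.shape)
        self._axes = tuple(axes) if axes is not None else tuple(range(ndim))

    def lazy_transpose(self, axes):
        # Compose the new permutation with the stored one; no data is touched.
        return LazyTransposeView(self._dataset, [self._axes[a] for a in axes])

    @property
    def shape(self):
        # Derived purely from the source shape and the recorded axis order.
        return tuple(self._dataset.shape[a] for a in self._axes)

    def __getitem__(self, key):
        # Simplification: only full reads ([:] or ...) are supported here.
        if key != slice(None) and key is not Ellipsis:
            raise NotImplementedError("toy example supports only full reads")
        return np.transpose(self._dataset[...], self._axes)


source = CountingArray(np.random.rand(2, 3, 3))
view = LazyTransposeView(source).lazy_transpose([2, 0, 1])
print(view.shape, source.reads)     # (3, 2, 3) 0
print(view[:].shape, source.reads)  # (3, 2, 3) 1
```

In this sketch, .shape on the view leaves the read counter at zero, while [:] followed by .shape triggers one full read first.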

bendichter commented 4 years ago

I think by "slicing the whole dataset" @Saksham20 means using [:] before the .shape. Is that right, Saksham?

bendichter commented 4 years ago

@d-sot is that true of the latest release, or the master branch?

d-sot commented 4 years ago

It is true of the master branch. The release package has now been updated as well.

d-sot commented 4 years ago

@Saksham20 upgrading to the new release should fix this issue.

Saksham20 commented 4 years ago

> I think by "slicing the whole dataset" @Saksham20 means using [:] before the .shape. Is that right, Saksham?

Yes

> @Saksham20 upgrading to the new release should fix this issue.

I'm actually getting an error loading numcodecs, which is a requirement of zarr, which in turn is a requirement of lazy_ops. I tried pip installing it separately but get the same error.

The error is too long to post here, but pip is unable to build the package from the numcodecs .tar.gz file. Building the wheel fails, then pip tries to build the package from the setup file and fails again.

The workaround is that I downloaded a .whl file from here and pip installed it. Having done this, it works as you say. (I have Python 3.7.)

@bendichter this is similar to the workaround for the sima installation error I had earlier. Maybe it's a Python version issue again.

bendichter commented 4 years ago

@Saksham20 hmm, it looks like numcodecs is compatible with the latest versions of Python in their tests, but I don't know if they have released this. @d-sot, can you restructure this so that zarr is not required, similar to the way SpikeInterface does this?
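One common way to make a dependency such as zarr optional is to guard its import and enable the related features only when the import succeeds. The sketch below illustrates that general pattern under stated assumptions: optional_import, HAVE_ZARR, and supported_dataset_types are hypothetical names, and this is not the code used in lazy_ops or SpikeInterface.

```python
import importlib


def optional_import(name):
    """Return the named module if it can be imported, otherwise None.

    Hypothetical helper: a failed import (for example zarr failing because
    numcodecs is missing) is reported as None instead of raising.
    """
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


zarr = optional_import("zarr")
HAVE_ZARR = zarr is not None


def supported_dataset_types():
    """Collect the dataset classes available in this environment."""
    types = []
    h5py = optional_import("h5py")
    if h5py is not None:
        types.append(h5py.Dataset)
    if HAVE_ZARR:
        types.append(zarr.Array)
    return tuple(types)


print(HAVE_ZARR, supported_dataset_types())
```

With this arrangement, a missing or broken zarr install only disables Zarr support rather than preventing the package from being imported at all.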

d-sot commented 3 years ago

The zarr dependency is no longer required. See PR #20.