HDFGroup / hsds

Cloud-native, service based access to HDF data
https://www.hdfgroup.org/solutions/hdf-kita/
Apache License 2.0

Value with integers/indices in select param does not reduce dimensions #88

Closed loichuder closed 3 years ago

loichuder commented 3 years ago

When fetching a slice through the select param using integers (indices), I would expect the number of dimensions to be reduced.

Example: I want to fetch the first row 0,: from a 2D array of dimensions [2, 3]. Currently, the fetch returns a 2D array of dimensions [1, 3], while I would expect a 1D array of dimensions [3].

Looking at my fix at https://github.com/HDFGroup/hsds/pull/85, I understand the behaviour: integers are converted into slices of one element.

Do you think it would be possible to skip this conversion and treat indices separately?
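For reference, the difference between indexing with an integer and with a one-element slice can be reproduced in plain NumPy. This is only a local illustration of the expected behaviour, not HSDS code; the array contents are arbitrary.

```python
import numpy as np

# Stand-in for the 2D dataset of dimensions [2, 3] from the example.
data = np.arange(6).reshape(2, 3)

# Integer index: NumPy drops the indexed axis.
row_by_index = data[0, :]

# One-element slice: the axis is kept with length 1, which matches
# what the fetch currently returns.
row_by_slice = data[0:1, :]

print(row_by_index.shape)  # (3,)
print(row_by_slice.shape)  # (1, 3)
```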

jreadey commented 3 years ago

Sure, it's actually pretty easy to do from the server side. Take a look at this commit: https://github.com/HDFGroup/hsds/commit/71f342fd463aef96b2628fa90e6b168fea3ecd06.

I didn't want to break any existing clients, so to get the reduce dim effect, you need to add a parameter: "reduce_dim" in the request.
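A rough sketch of how a client might build the query string with the extra parameter. Only the parameter name "reduce_dim" comes from the comment above. The bracketed select syntax, the value "1", and the helper name `value_query` are assumptions made for illustration and should be checked against the HSDS documentation.

```python
from urllib.parse import urlencode


def value_query(select: str, reduce_dim: bool = False) -> str:
    """Build a query string for a value request (hypothetical helper)."""
    params = {"select": select}
    if reduce_dim:
        # Assumption: any truthy value turns the option on.
        params["reduce_dim"] = "1"
    # Keep the selection characters readable instead of percent-encoding them.
    return urlencode(params, safe="[]:,")


query = value_query("[0,:]", reduce_dim=True)
print(query)  # select=[0,:]&reduce_dim=1
```

Leaving the parameter out keeps the existing behaviour, which is the point of making it opt-in.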

jreadey commented 3 years ago

I've merged this change into master.

jreadey commented 3 years ago

closing issue - please reopen if you run into any problems.

loichuder commented 3 years ago

I tried it and it works, but the behaviour is different from what I would expect (which would be NumPy-like).

For example, let's consider the dataset of dimensions (2, 3) and say I want the slice :, 0. The dimensions of the resulting slice will be:

All is well here.

Now imagine the dimensions of the original dataset are (1, 3) and I want the slice :,0. In this case, I will get:

In this case, the result with reduce_dim is not consistent with NumPy. The same problem occurs if I want to get the original dataset:
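The NumPy side of the comparison for a (1, 3) dataset can be checked locally. The `np.squeeze` calls below are only a guess at what a "drop every length-1 axis" rule would produce. The thread does not confirm that this is how reduce_dim is implemented, so treat those lines as an assumption.

```python
import numpy as np

# Stand-in for a dataset of dimensions (1, 3).
data = np.arange(3).reshape(1, 3)

# NumPy: only the integer-indexed axis is dropped.
column = data[:, 0]
whole = data[:, :]
print(column.shape)  # (1,)
print(whole.shape)   # (1, 3)

# Assumed model: squeezing away every axis of length 1 also removes
# axes that were selected with a slice.
squeezed_column = np.squeeze(data[:, 0:1])
squeezed_whole = np.squeeze(whole)
print(squeezed_column.shape)  # ()
print(squeezed_whole.shape)   # (3,)
```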

As for #87, we worked around this in h5web. We do not make use of reduce_dim and instead flatten the slice according to the original dataset shape when we receive the response.
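A minimal sketch of that kind of client-side workaround, written in Python rather than taken from h5web. It assumes the response keeps one axis per dataset axis (so no reduce_dim) and drops only the axes that were selected with an integer. The helper name `numpy_like_shape` is hypothetical.

```python
import numpy as np


def numpy_like_shape(selection, selected_shape):
    """Keep only the axes that were selected with a slice, not an integer."""
    return tuple(
        length
        for sel, length in zip(selection, selected_shape)
        if not isinstance(sel, int)
    )


# Selection ":, 0" on a (1, 3) dataset; the response has shape (1, 1).
selection = (slice(None), 0)
response = np.array([[7]])

result = response.reshape(numpy_like_shape(selection, response.shape))
print(result.shape)  # (1,)
```

Because the decision is driven by the selection rather than by the lengths of the returned axes, an axis of length 1 that was selected with a slice is preserved, as in NumPy.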