Unidata / netcdf4-python

netcdf4-python: python/numpy interface to the netCDF C library
http://unidata.github.io/netcdf4-python
MIT License

Reshaping variable for indexing #473

Open fsteinmetz opened 9 years ago

fsteinmetz commented 9 years ago

Because integer array indexing behaves differently in netcdf4-python than in numpy, I am missing the ability to index axes dependently (pointwise), where, for example, data[arange(3), arange(3)] returns an array of shape (3,). This could be worked around by using numpy's ravel_multi_index and reshaping the variable to a single dimension. So, is it possible to reshape a variable without copying it?
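For readers following along, here is a minimal sketch of the workaround described above, applied to an in-memory numpy array rather than a netCDF variable. The array `data` and the index arrays `rows` and `cols` are made up for illustration. It only shows what ravel_multi_index plus a flat view would achieve in numpy.

```python
import numpy as np

# Stand-in for a 2-D array already in memory (not a netCDF variable).
data = np.arange(12).reshape(3, 4)
rows = np.array([0, 1, 2])
cols = np.array([0, 1, 2])

# Convert the (row, col) pairs into offsets into the flattened array.
flat_idx = np.ravel_multi_index((rows, cols), data.shape)

# For a contiguous numpy array, reshape(-1) returns a view, so nothing is copied.
flat = data.reshape(-1)
points = flat[flat_idx]

print(points.shape)  # (3,)
```

In numpy this gives the same result as data[rows, cols]; the question is whether a netCDF variable can be flattened in the same copy-free way.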

jswhit commented 9 years ago

There is no way to change the dimensions of a netCDF Variable without copying it.

Don't know if this helps, but your example

data[arange(3),arange(3)]

translates to

diag(data[arange(3),arange(3)])

if data is a netCDF variable instead of a numpy array.
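A small sketch of the relationship described above, using only numpy. The orthogonal (outer-product) result is emulated here with np.ix_, on the assumption that it matches what indexing a netCDF variable with two integer arrays returns. The array `data` is made up for illustration.

```python
import numpy as np

data = np.arange(16).reshape(4, 4)
idx = np.arange(3)

# numpy pairs the index arrays element by element: shape (3,).
pointwise = data[idx, idx]

# Orthogonal indexing selects every row/column combination: shape (3, 3).
# np.ix_ emulates that behaviour for an in-memory array.
orthogonal = data[np.ix_(idx, idx)]

# The pointwise result is the diagonal of the orthogonal result.
same = np.array_equal(np.diag(orthogonal), pointwise)
print(pointwise.shape, orthogonal.shape, same)
```

Note that taking the diagonal this way reads the full (3, 3) block first, so the cost grows with the square of the number of points.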

shoyer commented 9 years ago

The netCDF4 data model (HDF5) can't easily support some of these fancy indexing tricks. But if you're using netCDF3 files, try scipy.io.netcdf, which supports the full generality of NumPy's indexing capabilities.

fsteinmetz commented 9 years ago

Thanks for the explanation! It's too bad that netCDF4 and HDF5 don't support that; I'll have a look at netCDF3. The diagonal elements in my example were only an illustration. The actual purpose was to index multiple coordinates independently.

shoyer commented 9 years ago

To clarify: we certainly could imagine adding a way to do this sort of indexing, but it would be no better than a loop in pure Python. It's just not something we could do efficiently. This is actually pretty similar to netcdf4-python's support for array-based "orthogonal indexing", which uses a similar loop.
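As a rough illustration of what such a loop could look like, here is a sketch of a hypothetical helper, read_points, that fetches one element per (row, col) pair. It is not part of netcdf4-python. It assumes only that the object supports scalar indexing with two integers, and it is demonstrated on a numpy array standing in for a variable.

```python
import numpy as np

def read_points(var, rows, cols):
    """Hypothetical helper: read var[r, c] for each (r, c) pair, one at a time."""
    rows = np.asarray(rows)
    cols = np.asarray(cols)
    if rows.shape != cols.shape:
        raise ValueError("rows and cols must have the same shape")
    # One scalar read per point; on a file-backed variable each read is a separate request.
    values = [var[int(r), int(c)] for r, c in zip(rows.ravel(), cols.ravel())]
    return np.array(values).reshape(rows.shape)

var = np.arange(20).reshape(4, 5)  # stand-in for a 2-D variable
result = read_points(var, [0, 2, 3], [1, 1, 4])
print(result.shape)  # (3,)
```

When the points are close together, reading their bounding box with one slice and indexing the resulting numpy array in memory may well be faster than looping, at the cost of reading extra data.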

jswhit commented 9 years ago

scipy.io.netcdf works by memory-mapping the data in the netcdf-3 file directly to a numpy array using numpy.memmap. It can then use the numpy indexing code directly. This works with the very simple netcdf-3 format, but not with the more complicated HDF5 format.
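The memory-mapping idea can be sketched with numpy.memmap alone. The file below is a made-up raw binary file, not a real netCDF-3 file. Its name, shape and dtype are assumptions for illustration (big-endian integers, since the classic netCDF format stores data big-endian). A real reader would also have to parse the header to find the data offset.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical raw file holding a 3x4 block of big-endian 32-bit integers.
    path = os.path.join(tmp, "raw.bin")
    np.arange(12, dtype=">i4").reshape(3, 4).tofile(path)

    # Map the file into memory; nothing is read until elements are accessed.
    mm = np.memmap(path, dtype=">i4", mode="r", shape=(3, 4))

    # The memmap is a numpy array subclass, so pointwise fancy indexing works directly.
    picked = np.array(mm[[0, 1, 2], [0, 1, 2]])

    # Release the mapping before the temporary directory is removed.
    del mm

print(picked.shape)  # (3,)
```

This works because the data sits in the file as one contiguous, uncompressed block, which is the property that HDF5-based files do not generally share.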