cas3ymau3 / netcdf4-python

Automatically exported from code.google.com/p/netcdf4-python

netCDF4.Variable should define a __array__ method #216

Closed: GoogleCodeExporter closed this issue 8 years ago

GoogleCodeExporter commented 8 years ago
This would allow numpy to quickly load data from a netCDF4.Variable object 
when/if it is passed directly to a numpy ufunc (or to anything that calls 
np.array on its input).

Right now, passing a Variable object to a ufunc works, but accessing the data 
is very slow. Presumably this is because numpy falls back on a generic routine 
that treats the Variable as an arbitrary sequence and reads it one element at 
a time. For example:

    >>> ncvar
    <netCDF4.Variable at 0x112e76ef0>
    >>> ncvar.shape
    (1051,)
    >>> %timeit np.sin(ncvar)
    1 loops, best of 3: 491 ms per loop
    >>> %timeit np.sin(ncvar[:])
    10000 loops, best of 3: 179 µs per loop
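To see why the fallback is slow, here is a minimal sketch using a hypothetical in-memory stand-in (`SlowVariable`, not part of netCDF4) that, like a Variable without `__array__`, only offers `len()` and integer indexing. Counting `__getitem__` calls shows numpy fetching the data one element at a time; presumably the real Variable is read the same way, with each element costing a separate read from the file.

```python
import numpy as np


class SlowVariable:
    """Hypothetical stand-in for a Variable without __array__.

    It supports only len() and indexing, so numpy has to treat it as a
    generic Python sequence.
    """

    def __init__(self, data):
        self._data = np.asarray(data)
        self.item_reads = 0  # number of __getitem__ calls made by numpy

    def __len__(self):
        return len(self._data)

    def __getitem__(self, index):
        self.item_reads += 1
        return self._data[index]


var = SlowVariable(np.linspace(0.0, 1.0, 1051))
result = np.sin(var)
print(result.shape)                 # (1051,)
print(var.item_reads >= len(var))   # True: at least one read per element
```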

To fix this, you need to define an __array__ method so that numpy knows how to 
quickly convert Variables into numpy arrays. This is as simple as:

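    def __array__(self, dtype=None, copy=None):
        # Read the whole variable into memory; numpy may also pass dtype/copy.
        data = self[...]
        return data if dtype is None else data.astype(dtype)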
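For context, here is the method inside a complete, runnable class. `FastVariable` is a hypothetical stand-in that keeps its data in memory; indexing with `...` plays the role of reading the whole variable from the file. numpy now gets the data through `__array__` in one bulk read instead of one `__getitem__` call per element. The `dtype` and `copy` parameters are accepted because numpy may pass them to `__array__` (for example from `np.asarray(var, dtype=...)`); this sketch does not act on `copy`.

```python
import numpy as np


class FastVariable:
    """Hypothetical stand-in for a Variable that defines __array__.

    Reading with var[...] stands in for loading the whole variable from disk.
    """

    def __init__(self, data):
        self._data = np.asarray(data)
        self.reads = 0  # number of __getitem__ calls

    def __getitem__(self, index):
        self.reads += 1
        return self._data[index]

    def __array__(self, dtype=None, copy=None):
        # One bulk read, then an optional cast if numpy requests a dtype.
        # copy is accepted for NumPy 2 compatibility but not acted on here.
        data = self[...]
        return data if dtype is None else data.astype(dtype)


var = FastVariable(np.arange(1051, dtype=np.float32))
result = np.sin(var)                        # numpy converts var via __array__
as_f64 = np.asarray(var, dtype=np.float64)  # works whether or not numpy passes dtype
print(result.dtype, as_f64.dtype)           # float32 float64
print(var.reads)                            # small: bulk reads only, not one per element
```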

Hopefully you think this is worth adding in the next release!

Cheers,
Stephan

Original issue reported on code.google.com by sho...@climate.com on 19 Feb 2014 at 8:04

GoogleCodeExporter commented 8 years ago
That's a great idea!  Could you provide a short script that I could use to test 
the speed difference before and after adding the __array__ method?
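A minimal benchmark along these lines might look like the following. It does not open a netCDF file; instead it uses hypothetical in-memory stand-ins (`PlainVariable` for the behaviour before the change, `ArrayVariable` for after), so it measures only numpy's conversion overhead, not disk I/O, and absolute timings will differ from the ones above.

```python
import timeit

import numpy as np


class PlainVariable:
    """Hypothetical stand-in for a Variable before the change: numpy can
    only read it element by element via __len__/__getitem__."""

    def __init__(self, data):
        self._data = np.asarray(data)

    def __len__(self):
        return len(self._data)

    def __getitem__(self, index):
        return self._data[index]


class ArrayVariable(PlainVariable):
    """The same stand-in after the change: __array__ hands over all data at once."""

    def __array__(self, dtype=None, copy=None):
        data = self[...]
        return data if dtype is None else data.astype(dtype)


data = np.random.default_rng(0).random(1051)  # same length as in the report
before, after = PlainVariable(data), ArrayVariable(data)

# Both paths must give identical results; only the speed should differ.
assert np.array_equal(np.sin(before), np.sin(after))

# Best of 3 repeats, 20 calls each, reported per call.
t_before = min(timeit.repeat(lambda: np.sin(before), number=20, repeat=3)) / 20
t_after = min(timeit.repeat(lambda: np.sin(after), number=20, repeat=3)) / 20
print(f"without __array__: {t_before * 1e6:.1f} µs per call")
print(f"with __array__:    {t_after * 1e6:.1f} µs per call")
```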

Original comment by whitaker.jeffrey@gmail.com on 19 Feb 2014 at 12:30

GoogleCodeExporter commented 8 years ago
Never mind, I did this myself and verified that it is indeed a lot faster. 
The __array__ method has been added in svn trunk. Note: use caution with large 
netCDF variables, since __array__ reads the entire variable into a numpy array 
and could chew up a lot of memory.
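If a variable is too large to read at once, one option is to work through it in slabs along the first dimension instead of handing the whole Variable to numpy. The helper below (`iter_chunks`, a hypothetical name, not a netCDF4 function) is a sketch that assumes the variable has at least one dimension and supports slicing, as netCDF4.Variable does; the example uses a plain numpy array as a stand-in.

```python
import numpy as np


def iter_chunks(var, chunk_size=1024):
    """Yield successive slabs of at most chunk_size entries along axis 0.

    Assumes var has at least one dimension (var.shape[0]) and supports
    slicing, as netCDF4.Variable and numpy arrays both do. Only the current
    slab needs to be in memory at a time.
    """
    n = var.shape[0]
    for start in range(0, n, chunk_size):
        stop = min(start + chunk_size, n)
        # For a netCDF4.Variable this slice reads just these rows from the file.
        yield var[start:stop]


# Plain numpy array standing in for a large netCDF variable.
var = np.linspace(0.0, 1.0, 10_000)
total = sum(float(np.sin(chunk).sum()) for chunk in iter_chunks(var, 1000))
print(np.isclose(total, np.sin(var).sum()))  # True
```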

Original comment by whitaker.jeffrey@gmail.com on 19 Feb 2014 at 1:03

GoogleCodeExporter commented 8 years ago

Original comment by whitaker.jeffrey@gmail.com on 26 Feb 2014 at 2:04