fhs / NPZ.jl

A Julia package that provides support for reading and writing Numpy .npy and .npz files

Enhancement: Load as column-major array #34

Closed jakobnissen closed 4 years ago

jakobnissen commented 4 years ago

Often, I write an NxF NumPy array (N observations, F features) in row-major order so that each observation is contiguous in memory. When loading it in Julia, I would prefer to get an FxN Matrix, so that observations remain contiguous. Currently, this is achievable only by loading the data (which internally transposes it) and then transposing it back, which is inefficient. I propose adding a keyword such as "keep_contiguity" (or something less terribly named) that loads the array as if it were Fortran-contiguous even when it is not; i.e., given an NxF NumPy array in C-contiguous order, it returns an FxN Matrix. I can put together a PR if you like.

fhs commented 4 years ago

There is a fortran_order flag built into the NumPy data format for column-major order, and NPZ.jl respects it. When saving the data, you just need to make sure the array is in Fortran order (if that's what you want), and NPZ.jl will load it in column-major order. Here is how to save a Fortran-order array:

In [1]: x = np.asfortranarray(np.random.randint(10, size=(3,4)))

In [2]: x.flags['F_CONTIGUOUS'], x.flags['C_CONTIGUOUS']
Out[2]: (True, False)

In [3]: np.save("/tmp/x.npy", x)

In [4]: !head -1 /tmp/x.npy
�NUMPYv{'descr': '<i8', 'fortran_order': True, 'shape': (3, 4), }
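The same header can also be inspected without piping binary data to the terminal. The following is a minimal sketch using the helpers in numpy.lib.format; the temporary file path and the array contents are chosen only for this illustration, and the sketch assumes the file is written in format version 1.0, which is what np.save normally picks for a small header.

```python
import os
import tempfile

import numpy as np

# Small Fortran-ordered array, mirroring the session above
# (deterministic values instead of random ones).
x = np.asfortranarray(np.arange(12).reshape(3, 4))

# Temporary location used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "x.npy")
np.save(path, x)

with open(path, "rb") as fp:
    # Reads the magic string and returns the format version as a tuple.
    version = np.lib.format.read_magic(fp)
    if version != (1, 0):
        raise RuntimeError(f"sketch only handles format 1.0, got {version}")
    # Parses the header dictionary that follows the magic string.
    shape, fortran_order, dtype = np.lib.format.read_array_header_1_0(fp)

print("version:", version)
print("shape:", shape)
print("fortran_order:", fortran_order)
print("dtype:", dtype)
```

The fortran_order value read back here is the flag that, per the comment above, NPZ.jl respects when loading.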

Note that numpy.load has no special keyword argument to override the fortran_order flag in the data, so I'm not inclined to add it in NPZ.jl.
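For the use case in the original request, one option on the Python side is to save the transpose of the C-contiguous NxF array. The sketch below is an illustration under stated assumptions, not something proposed in this thread: the sizes N and F and the helper `saved` are hypothetical. It checks that the transpose is a view without a copy, that NumPy writes it with fortran_order set to True and shape (F, N), and that the data bytes are identical to those of the original row-major array. How NPZ.jl then loads the file is taken from the statement above that it respects the flag; the Julia side is not shown.

```python
import io

import numpy as np

N, F = 5, 3  # hypothetical sizes: N observations, F features
x = np.arange(N * F, dtype=np.float64).reshape(N, F)  # C-contiguous, N x F

# Transposing returns a view: same memory, now seen as F x N in Fortran order.
xt = x.T
assert np.shares_memory(x, xt)
assert xt.flags["F_CONTIGUOUS"] and not xt.flags["C_CONTIGUOUS"]


def saved(arr):
    """Hypothetical helper: save to memory, return (shape, fortran_order, payload)."""
    buf = io.BytesIO()
    np.save(buf, arr)
    buf.seek(0)
    np.lib.format.read_magic(buf)  # skip magic string and version
    shape, fortran_order, _ = np.lib.format.read_array_header_1_0(buf)
    return shape, fortran_order, buf.read()  # remaining bytes are the data


shape_c, order_c, payload_c = saved(x)
shape_f, order_f, payload_f = saved(xt)

print(shape_c, order_c)  # header of the row-major N x F array
print(shape_f, order_f)  # header of the transposed view
print(payload_c == payload_f)  # whether the raw data bytes match
```

Because only the header differs, each observation stays contiguous in the file, and a reader that honors fortran_order can present the data as an FxN column-major array without a further transpose.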