eddelbuettel / rcppcnpy

Rcpp bindings for NumPy files
GNU General Public License v2.0

Please add support for .npz files created by numpy.savez_compressed #1

Closed: mike-lawrence closed this issue 9 years ago

mike-lawrence commented 9 years ago

At present, attempting to read these files with npyLoad yields the error "header ended improperly".

eddelbuettel commented 9 years ago

Can you cook up a short example in Python so that I can replicate at my end?

But come to think of it, I think I only ever supported "external" gzip compression. What you want may be harder if we're mixing Python and R compression. Do you by chance need it badly enough that I could convince you to work on a pull request?

mike-lawrence commented 9 years ago

Quick example, starting with python code to generate the npz:

```python
import numpy as np
a = np.array([1, 2])
np.savez_compressed('a.npz', a)
```

Now in R:

```r
library(RcppCNPy)
a = npyLoad('a.npz')
```

This yields the error "Error in npyLoad("a.npz") : header ended improperly". Possibly of note: I'm on a Mac, running Python v2.7.8, NumPy v1.9.0 and R v3.1.1, and building RcppCNPy from the latest GitHub source.
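A plausible reason for that particular message: a file written by numpy.savez_compressed is a ZIP archive with one .npy member per array, so it begins with the ZIP signature rather than the magic string that every .npy file starts with. The sketch below checks the leading bytes of both formats. The helper name file_kind is hypothetical, and the files are written to a temporary directory rather than reusing the a.npz above.

```python
import os
import tempfile

import numpy as np

NPY_MAGIC = b"\x93NUMPY"   # every .npy file begins with these six bytes
ZIP_MAGIC = b"PK\x03\x04"  # ZIP local-file-header signature


def file_kind(path):
    """Hypothetical helper: classify a file by its leading bytes."""
    with open(path, "rb") as f:
        head = f.read(6)
    if head.startswith(NPY_MAGIC):
        return "npy"
    if head.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"


a = np.array([1, 2])
tmp = tempfile.mkdtemp()
npy_path = os.path.join(tmp, "a.npy")
npz_path = os.path.join(tmp, "a.npz")
np.save(npy_path, a)              # plain .npy: header + raw data
np.savez_compressed(npz_path, a)  # ZIP archive containing arr_0.npy

print(file_kind(npy_path), file_kind(npz_path))
```

A reader that expects the .npy layout therefore fails while parsing the header of an .npz file, before it ever reaches array data.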

eddelbuettel commented 9 years ago

That is simply outside the scope of what I have carried over from the CNPy library. We would have to write (and test !!) new functions. I am not sure I have time (or a reason) for this.

I'll leave the ticket open, and am amenable to making this clearer in the documentation.
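Until native .npz support exists, one possible workaround on the Python side is to unpack the archive into individual .npy files, which is the format npyLoad already reads. This is a sketch, not part of RcppCNPy; the helper npz_to_npy is hypothetical and relies only on np.load returning an archive object whose .files lists the member names.

```python
import os
import tempfile

import numpy as np


def npz_to_npy(npz_path, out_dir):
    """Hypothetical helper: write each array in an .npz archive to its own .npy file."""
    written = []
    with np.load(npz_path) as archive:  # NpzFile can be used as a context manager
        for name in archive.files:      # 'arr_0', 'arr_1', ... for positional arguments
            out_path = os.path.join(out_dir, name + ".npy")
            np.save(out_path, archive[name])
            written.append(out_path)
    return written


a = np.array([1, 2])
tmp = tempfile.mkdtemp()
npz_path = os.path.join(tmp, "a.npz")
np.savez_compressed(npz_path, a)
paths = npz_to_npy(npz_path, tmp)
print(paths)
```

Because arrays passed positionally to savez_compressed are stored as arr_0, arr_1, and so on, the output file names follow that pattern.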

mike-lawrence commented 9 years ago

After some exploration I realized that, at least for my purposes, I don't need npz support after all. If all you're dealing with are single arrays, then saving to .npy via numpy.save() and running gzip on the result yields files that are nearly the same size as the .npz produced by numpy.savez_compressed(). Sure, numpy.savez_compressed() might be faster since you're not writing to disk twice, but for my file sizes (a few hundred MB) it hardly matters. Sorry to bother!
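The save-then-gzip route described above can be sketched in Python as follows. Route 1 mirrors the comment (write the .npy, then compress it); Route 2 is an assumption-level variant that avoids the second write by handing np.save a gzip file object. Both should produce the same decompressed bytes. The .npy.gz naming and the helper load_npy_gz are illustrative choices, not taken from RcppCNPy.

```python
import gzip
import io
import os
import shutil
import tempfile

import numpy as np

a = np.array([1, 2])
tmp = tempfile.mkdtemp()

# Route 1: write a plain .npy file, then gzip it (two writes to disk).
npy_path = os.path.join(tmp, "a.npy")
np.save(npy_path, a)
with open(npy_path, "rb") as src, gzip.open(npy_path + ".gz", "wb") as dst:
    shutil.copyfileobj(src, dst)

# Route 2: stream the .npy bytes straight through gzip (one write to disk).
stream_path = os.path.join(tmp, "a_stream.npy.gz")
with gzip.open(stream_path, "wb") as f:
    np.save(f, a)


def load_npy_gz(path):
    """Hypothetical helper: decompress a gzipped .npy into memory and parse it."""
    with gzip.open(path, "rb") as f:
        return np.load(io.BytesIO(f.read()))


print(load_npy_gz(npy_path + ".gz"), load_npy_gz(stream_path))
```

Route 2 relies only on np.save accepting any writable file object, so the uncompressed .npy never touches the disk.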

eddelbuettel commented 9 years ago

No worries. And that is pretty much exactly what the legacy Python code I dealt with at work did: write the numbers, then compress via gzip. Which is why that is what I implemented...

(Do you by chance have a Solaris box nearby? I have an open issue but no access :-/)