eddelbuettel / rcppcnpy

Rcpp bindings for NumPy files
GNU General Public License v2.0

Proof of concept: Add support for 1 byte arrays #13

Closed jmarshallnz closed 6 years ago

jmarshallnz commented 7 years ago

Currently rcppcnpy assumes array elements are 8 bytes wide (double or int64_t). However, cnpy already knows the actual element size from the file, so we should use it.
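For readers unfamiliar with the file layout, here is a minimal sketch of where that information lives. It uses only the Python standard library so the bytes are visible, handles only version 1.0 files with simple numeric types, and the helper names `build_npy_v1` and `read_npy_header` are made up for illustration (they are not part of cnpy, NumPy, or rcppcnpy).

```python
import ast
import struct

MAGIC = b"\x93NUMPY"

def build_npy_v1(descr, shape, payload):
    """Assemble a version 1.0 .npy byte string (C order), for illustration only."""
    header = "{'descr': %r, 'fortran_order': False, 'shape': %r, }" % (descr, shape)
    # Pad with spaces so that magic (6) + version (2) + length field (2) + header
    # is a multiple of 64 bytes; the header always ends with a newline.
    pad = -(10 + len(header) + 1) % 64
    header = header + " " * pad + "\n"
    return MAGIC + b"\x01\x00" + struct.pack("<H", len(header)) + header.encode("latin1") + payload

def read_npy_header(buf):
    """Return (descr, itemsize, shape, data_offset) for a version 1.0 buffer."""
    if buf[:6] != MAGIC:
        raise ValueError("not an npy file")
    if buf[6] != 1:
        raise ValueError("this sketch only handles format version 1.x")
    (header_len,) = struct.unpack("<H", buf[8:10])
    meta = ast.literal_eval(buf[10:10 + header_len].decode("latin1").strip())
    descr = meta["descr"]
    # Simple numeric types only: '<f8' -> 8 bytes, '|u1' -> 1 byte.
    itemsize = int(descr[2:])
    return descr, itemsize, meta["shape"], 10 + header_len

buf = build_npy_v1("|u1", (2, 784), bytes(2 * 784))
descr, itemsize, shape, offset = read_npy_header(buf)
print(descr, itemsize, shape, offset)
```

The point is only that the element width travels with the file, so a reader never needs to assume 8 bytes.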

This is just a proof of concept as to a potential solution for the matrix case. If you'd like me to continue working on it, I'm happy to generalise.

Example .npy files would be the ones from Google's Quick, Draw! dataset:

https://console.cloud.google.com/storage/browser/quickdraw_dataset/full/numpy_bitmap?pli=1

They're all 1-byte arrays (size n*784). My guess is they're supposed to be unsigned; at the moment I'm assuming signed, which is probably wrong. Again, I'm happy to look into it if you feel this is something worth having: the data type information is in the npy file.
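To make the signed versus unsigned concern concrete, a small sketch (the sample byte values are made up for illustration): the same four bytes decode differently depending on which 1-byte type is assumed, and any value above 127 goes negative under the signed reading.

```python
import struct

raw = bytes([0, 127, 128, 255])  # four pixel-like bytes

as_signed = struct.unpack("4b", raw)    # int8 interpretation
as_unsigned = struct.unpack("4B", raw)  # uint8 interpretation

print(as_signed)    # (0, 127, -128, -1)
print(as_unsigned)  # (0, 127, 128, 255)
```

In an npy header the two cases are distinguished by the `descr` field (`|i1` for signed, `|u1` for unsigned), so the reader does not have to guess.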

Notice that I'm ignoring the type parameter. I'd propose that this is always ignored as we know the type from the npy file, and should just generate the appropriate R type to suit.
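As a rough illustration of what "the appropriate R type" could mean, here is a hypothetical lookup from npy type codes to R storage modes. The table and the function name `r_type_for` are assumptions for discussion, not what this PR or rcppcnpy implements; the only facts relied on are that R integers are 32-bit signed and R doubles are 64-bit floats.

```python
# Hypothetical mapping from npy `descr` type codes to R storage modes.
R_TYPE_FOR = {
    "b1": "logical",
    "i1": "integer", "u1": "integer",
    "i2": "integer", "u2": "integer",
    "i4": "integer",
    "u4": "double",  # can exceed R's 32-bit signed integer range
    "i8": "double",  # exact only for magnitudes up to 2**53
    "f4": "double", "f8": "double",
}

def r_type_for(descr):
    """Pick an R storage mode for a simple npy descr such as '<f8' or '|u1'."""
    code = descr[1:]  # drop the byte-order character
    if code not in R_TYPE_FOR:
        raise ValueError("unsupported npy type: %r" % descr)
    return R_TYPE_FOR[code]

print(r_type_for("|u1"), r_type_for("<f8"))
```

Under a scheme like this the caller-supplied type argument becomes redundant, which is the argument being made above for ignoring it.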

(We could also probably get rid of the dotranspose parameter, though I can see the argument for leaving it in for efficiency)

eddelbuettel commented 7 years ago

Could you augment it with a bit of Python to generate some data and write it to file, and maybe also read it back from file so that we can ensure R didn't "change" it? Ideally we want a flow where Python writes, R reads and writes and Python reads it back -- without accidental losses.
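A minimal sketch of the Python side of such a flow, assuming NumPy is available. The function names, file name, and array shape are placeholders, and the R step is not shown: here the check is run against the file Python just wrote, standing in for the file R would write back.

```python
import os
import tempfile

import numpy as np

def write_fixture(path, rows=3, cols=784, seed=0):
    """Write a reproducible uint8 matrix to `path` and return it."""
    rng = np.random.default_rng(seed)
    arr = rng.integers(0, 256, size=(rows, cols), dtype=np.uint8)
    np.save(path, arr)
    return arr

def check_roundtrip(original, path):
    """True if the file at `path` has the same dtype, shape, and values."""
    back = np.load(path)
    return (
        back.dtype == original.dtype
        and back.shape == original.shape
        and np.array_equal(back, original)
    )

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "uint8mat.npy")
    expected = write_fixture(src)
    # In the real flow, R would read `src` and write a second file;
    # that second file is what check_roundtrip should be pointed at.
    ok = check_roundtrip(expected, src)

print(ok)
```

Comparing dtype as well as values matters here: a reader that silently widens uint8 to double would pass a values-only comparison.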

jmarshallnz commented 7 years ago

Good idea, I'll look at modifying createFiles.py to suit.


eddelbuettel commented 6 years ago

Closing for lack of follow-up.