eddelbuettel / rcppcnpy

Rcpp bindings for NumPy files
GNU General Public License v2.0
26 stars, 16 forks

Data is improperly read from binary int32 array. #26

Closed: bbogart closed this issue 4 years ago

bbogart commented 4 years ago

Here is my NumPy array, as saved to the .npy file:

array([[0],
       [0],
       [0],
       [1],
       [1],
       [1],
       [0],
       [1],
       [0],
       [1]], dtype=int32)

Written to disk with:

numpy.save("array_int32.npy", arrayInt)
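The dtype and shape that numpy.save writes are recorded in a small text header at the start of the file, and that header is what any reader has to honour. A minimal sketch, assuming the NPY layout documented by NumPy (magic string, two version bytes, a header length, then a Python-literal dict), that writes the same array to a temporary file and reads the header back without np.load; read_npy_header is a hypothetical helper, not part of NumPy or RcppCNPy:

```python
import ast
import os
import struct
import tempfile

import numpy as np


def read_npy_header(path):
    """Hypothetical helper: return the header dict of a .npy file."""
    with open(path, "rb") as f:
        if f.read(6) != b"\x93NUMPY":
            raise ValueError("not a .npy file")
        major, minor = f.read(2)  # format version bytes
        # Version 1.x stores the header length as a little-endian uint16;
        # versions 2.x and 3.x use a little-endian uint32.
        if major == 1:
            (header_len,) = struct.unpack("<H", f.read(2))
        else:
            (header_len,) = struct.unpack("<I", f.read(4))
        encoding = "utf8" if major >= 3 else "latin1"
        header = f.read(header_len).decode(encoding)
    # The header is a Python literal along the lines of
    # {'descr': '<i4', 'fortran_order': False, 'shape': (10, 1), }
    return ast.literal_eval(header)


arrayInt = np.array([[0], [0], [0], [1], [1], [1], [0], [1], [0], [1]], dtype=np.int32)
path = os.path.join(tempfile.mkdtemp(), "array_int32.npy")
np.save(path, arrayInt)

header = read_npy_header(path)
print(header["descr"], header["shape"])  # on a little-endian machine: <i4 (10, 1)
```

The 'descr' entry '<i4' means little-endian, 4-byte signed integers, so a reader that assumes 8-byte elements will misinterpret this file.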

Loading in R gives me very different values:

> library(RcppCNPy)
> numpyArray <- npyLoad("array_int32.npy")
> numpyArray
               [,1]
 [1,]  0.000000e+00
 [2,] 2.121996e-314
 [3,] 2.121996e-314
 [4,] 2.121996e-314
 [5,] 2.121996e-314
 [6,] 2.929809e-321
 [7,] 1.865035e-314
 [8,] 4.688588e-310
 [9,] 4.688589e-310
[10,] 4.688589e-310
> numpyArrayInt <- npyLoad("array_int32.npy","integer")
> numpyArrayInt
            [,1]
 [1,]          0
 [2,]          0
 [3,]          1
 [4,]          0
 [5,]          0
 [6,]        593
 [7,] -520093683
 [8,]  781958720
 [9,]  807221072
[10,]  807238832

I'm using the CRAN release of RcppCNPy, installed today, with R 3.6.3 and NumPy 1.13.3 on Ubuntu 18.04.4 LTS.

eddelbuettel commented 4 years ago

That seems ... weird and unfortunate. Look, e.g., at the file tests/createFiles.py, which has been there since 2012 (!!). Reading (basic) 32-bit integers and 64-bit doubles is practically all we have done since day one, so I am not sure what is different about your file. But I am also less experienced with NumPy, so I can't tell whether you write the file differently, or whether NumPy changed something, or ...

Can you do me a favour and check whether the basic tests in the package work on your system? (They should: I am on Ubuntu too, and CRAN tests these things as well...)

eddelbuettel commented 4 years ago

Found it, possibly, at the bottom of #25. We apparently only take 64-bit integers from Python (likely due to a restriction in the CNPy library).
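The odd values in the report are consistent with that explanation. A small NumPy sketch (an illustration, not RcppCNPy's actual code) reinterprets the 40 bytes of int32 data from this issue as 8-byte words. Only five such words exist, and they reproduce rows 1 to 5 of both R outputs: as doubles, and as the low 32 bits of int64 values. Rows 6 to 10 would then presumably come from whatever memory follows the real data.

```python
import numpy as np

# The ten int32 values from the issue: 4 bytes each, 40 bytes in total.
data = np.array([0, 0, 0, 1, 1, 1, 0, 1, 0, 1], dtype="<i4")
raw = data.tobytes()

# Read the same bytes as 8-byte elements; only 5 fit into 40 bytes.
as_f8 = np.frombuffer(raw, dtype="<f8")
as_i8 = np.frombuffer(raw, dtype="<i8")

# Each 8-byte word packs two int32 values (low word first), e.g. (0, 1) -> 2**32.
print(as_i8.tolist())  # [0, 4294967296, 4294967297, 4294967296, 4294967296]

# The bit pattern of 2**32 read as a double is the subnormal 2**-1042,
# about 2.121996e-314, matching rows 2 to 5 of the first R output.
print(as_f8[1] == 2.0**-1042)  # True

# Keeping only the low 32 bits gives rows 1 to 5 of the "integer" output.
print((as_i8 & 0xFFFFFFFF).tolist())  # [0, 0, 1, 0, 0]
```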

Can you try with 64-bit integers?
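A minimal sketch of that workaround on the Python side: widen the array with astype(np.int64) before saving. This is lossless here because every int32 value fits in 64 bits. The file name array_int64.npy is hypothetical.

```python
import os
import tempfile

import numpy as np

arrayInt = np.array([[0], [0], [0], [1], [1], [1], [0], [1], [0], [1]], dtype=np.int32)

# Widen to 64-bit integers before saving; the values themselves are unchanged.
arrayInt64 = arrayInt.astype(np.int64)

path = os.path.join(tempfile.mkdtemp(), "array_int64.npy")  # hypothetical file name
np.save(path, arrayInt64)

reloaded = np.load(path)
print(reloaded.dtype, reloaded.shape)  # int64 (10, 1)
```

On the R side the file would then be read as in the original report, e.g. npyLoad("array_int64.npy", "integer"). Whether that round-trips correctly is exactly what the maintainer is asking to have tested.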

eddelbuettel commented 4 years ago

Here is an alternative for you: use the reticulate package, as described in the second vignette.

Very briefly:

R> library(reticulate)
R> np <- import("numpy")
R> np$load("array_int32.npy")
      [,1]
 [1,]    0
 [2,]    0
 [3,]    0
 [4,]    1
 [5,]    1
 [6,]    1
 [7,]    0
 [8,]    1
 [9,]    0
[10,]    1
R> 

where array_int32.npy is your file.

CNPy has always been limited to 64-bit integers; reticulate does more of the type mapping from NumPy and is probably a good bet for you.