eddelbuettel / rcppcnpy

Rcpp bindings for NumPy files
GNU General Public License v2.0

dimension mixup #12

Closed jokedurnez closed 6 years ago

jokedurnez commented 7 years ago

Encountered a problem with this file: global_signals.npy.zip

It was written in numpy as a [415x518] array.

python

import numpy as np

gs = np.load("global_signals.npy")
print(gs.shape)
# (415, 518)

However, when I read it in R, it is a [518x415] matrix and, most importantly, the values are filled in the wrong order of dimensions. (You can clearly see it, because most of the values at index 415 on the second dimension should be 0.)

I had to apply the following patch to obtain the same matrix:

R

library(RcppCNPy)
gs <- npyLoad('data/global_signals.npy')
gsv <- as.vector(t(gs))
gs <- array(gsv,dim=dim(gs))
eddelbuettel commented 7 years ago

Hm, that would not be good, and I also think we at least need to check against this with unit tests.

But when I load your file, I get a 415x518 as in Python:

> library(RcppCNPy)
> gs <- npyLoad('global_signals.npy')
> dim(gs)
[1] 415 518
> 

Could you try some of the files from the tests/ directory to see if those pan out for you?

eddelbuettel commented 7 years ago

But the values are indeed mixed up:

>>> gs[0:3,0:3]  
array([[  629.58422852,   624.5168457 ,   607.39233398],
       [ 2630.12426758,  2624.10498047,  2610.34570312],
       [ 2624.42895508,  2624.76464844,  2610.9453125 ]])
>>> 

versus

> gs[1:3,1:3]
          [,1]     [,2]     [,3]
[1,]  629.5842 2630.124 2624.429
[2,] 2617.0134 2619.037 2616.405
[3,] 2618.6907 2618.334 2620.924
> 

Are there any other "tells" in that data structure?

The basic things we do still work:

edd@brad:~/git/rcppcnpy/tests(master)$ r loadFiles.R
     [,1] [,2] [,3] [,4]
[1,]  0.0  1.1  2.2  3.3
[2,]  4.4  5.5  6.6  7.7
[3,]  8.8  9.9 11.0 12.1
     [,1] [,2] [,3] [,4]
[1,]  0.0  1.1  2.2  3.3
[2,]  4.4  5.5  6.6  7.7
[3,]  8.8  9.9 11.0 12.1
     [,1] [,2] [,3] [,4]
[1,]    0    1    2    3
[2,]    4    5    6    7
[3,]    8    9   10   11
[1] 0.0 1.1 2.2 3.3 4.4
[1] 0 1 2 3 4
edd@brad:~/git/rcppcnpy/tests(master)$ python loadFiles.py 
[[  0.    1.1   2.2   3.3]
 [  4.4   5.5   6.6   7.7]
 [  8.8   9.9  11.   12.1]]
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
[ 0.   1.1  2.2  3.3  4.4]
[0 1 2 3 4]
edd@brad:~/git/rcppcnpy/tests(master)$ 
eddelbuettel commented 7 years ago

There is a tell: it has fortran_order set to True, our example files do not.
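For readers who want to check that flag themselves, here is a minimal Python sketch that reads the header of a `.npy` file with the helpers in `numpy.lib.format`. The function names `read_fortran_order` and `header_of` are made up for this illustration, and the small arrays are stand-ins, not the file from this report.

```python
import io

import numpy as np
from numpy.lib import format as npy_format


def read_fortran_order(fp):
    """Return (shape, fortran_order) from an open .npy stream (hypothetical helper)."""
    version = npy_format.read_magic(fp)
    if version == (1, 0):
        shape, fortran_order, _dtype = npy_format.read_array_header_1_0(fp)
    else:
        # Assumption: newer format versions are readable with the 2.0 header reader.
        shape, fortran_order, _dtype = npy_format.read_array_header_2_0(fp)
    return shape, fortran_order


def header_of(arr):
    """Save arr to an in-memory .npy stream and read its header back."""
    buf = io.BytesIO()
    np.save(buf, arr)
    buf.seek(0)
    return read_fortran_order(buf)


c_arr = np.arange(12, dtype=np.float64).reshape(3, 4)  # row-major (C order)
f_arr = np.asfortranarray(c_arr)                       # same values, column-major

print(header_of(c_arr))
print(header_of(f_arr))
```

For a file on disk, the same helper can be called as `read_fortran_order(open("global_signals.npy", "rb"))`.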

eddelbuettel commented 7 years ago

All is well if you tell R and RcppCNPy not to transpose:

> library(RcppCNPy)
> gs <- npyLoad('global_signals.npy', dotranspose=FALSE)
> dim(gs)
[1] 415 518
> gs[1:3,1:3]
          [,1]      [,2]      [,3]
[1,]  629.5842  624.5168  607.3923
[2,] 2630.1243 2624.1050 2610.3457
[3,] 2624.4290 2624.7646 2610.9453
> 
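The pattern in the two outputs above can be reproduced with a small NumPy simulation. This is only a sketch under the assumption that the reader fills a column-major matrix with swapped dimensions and then transposes it (which is right for row-major data); it is not RcppCNPy's actual code, and the 3x4 array stands in for the real file.

```python
import numpy as np

# What Python sees: a 3x4 array (stand-in for the 415x518 one).
expected = np.arange(12, dtype=np.float64).reshape(3, 4)
nrow, ncol = expected.shape

# Raw element stream as stored when fortran_order is True (column-major).
raw = expected.flatten(order="F")

# Assumed default path: fill an (ncol, nrow) column-major matrix, then transpose.
# Correct for row-major data, wrong for this column-major stream.
with_transpose = raw.reshape((ncol, nrow), order="F").T

# Assumed dotranspose=FALSE path: fill an (nrow, ncol) column-major matrix directly.
without_transpose = raw.reshape((nrow, ncol), order="F")

print(with_transpose.shape, np.array_equal(with_transpose, expected))
print(without_transpose.shape, np.array_equal(without_transpose, expected))
```

Under this assumption the transposed result has the right dimensions but its first row holds the first column of the Python array, which matches the mixed-up values shown earlier in the thread.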
jokedurnez commented 7 years ago

Oh ok I see, thanks !

eddelbuettel commented 7 years ago

I have to think about whether I should maybe warn if the toggle is set in the header. Your report is definitely very valid; I am just not sure how often NumPy files are written with fortran_order=True. I am not much of a Python or NumPy user so I would not know...

jokedurnez commented 7 years ago

That was just my basic python / numpy installation. I'm pretty sure I never touched the fortran_order-option :-)
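One way this can happen without touching any option: `np.save` records `fortran_order: True` whenever the array it is given is column-major in memory, and a plain transpose of an ordinary array is exactly that. Whether this is what happened with the file in this report is not known; the sketch below only illustrates the mechanism, and `saved_fortran_order` is a hypothetical helper that assumes the small test arrays are written in format version 1.0.

```python
import io

import numpy as np
from numpy.lib import format as npy_format


def saved_fortran_order(arr):
    """Save arr in memory and return the fortran_order flag from its header."""
    buf = io.BytesIO()
    np.save(buf, arr)
    buf.seek(0)
    npy_format.read_magic(buf)  # skip magic string and version bytes
    _shape, fortran_order, _dtype = npy_format.read_array_header_1_0(buf)
    return fortran_order


a = np.arange(6.0).reshape(2, 3)  # ordinary row-major array
t = a.T                           # transpose: a column-major view, no copy made

print(saved_fortran_order(a))
print(saved_fortran_order(t))

# Forcing a row-major copy before saving avoids the flag on the writing side.
print(saved_fortran_order(np.ascontiguousarray(t)))
```

On the Python side, `np.save(path, np.ascontiguousarray(arr))` is therefore one possible workaround when the reader expects row-major data.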

mcallaghan commented 7 years ago

Same thing happened to me, I didn't change any Numpy or Python defaults, was confused for a while... Now it works!

eddelbuettel commented 6 years ago

Thanks to reticulate, we now also have an alternative. I just wrote a simple one-page vignette.

For this example, it just works (using the stored copy of the file I had):

R> library(reticulate)
R> np <- import("numpy")
R> gs <- np$load("global_signals.npy")
R> dim(gs)
[1] 415 518
R> gs[1:3,1:3]
         [,1]     [,2]     [,3]
[1,]  629.584  624.517  607.392
[2,] 2630.124 2624.105 2610.346
[3,] 2624.429 2624.765 2610.945
R>

Of course, this does not take away from the fact that RcppCNPy seems to require the flipping of the transpose toggle as discussed above. We simply have an alternate route now.

eddelbuettel commented 6 years ago

We can close this too I presume as reticulate offers a second way.

JasonAHendry commented 4 years ago

Just wanted to note I encountered a similar problem.

Had a numpy array of (21475, 2640) I generated in python. Using npyLoad() I recover an array of dimensions (21475, 2640) in R. However, when I inspect the data, the rows and columns have been inverted; the first row in python becomes the first column in R.

dotranspose=F fixed the issue, but the behaviour caught me out.