npyLoad yields wrong values

muellerflorian01 commented 5 years ago

Dear Dirk Eddelbuettel,

While using the npyLoad function in R, I encountered the following issue: when I load an .npy file (saved from Python), the matrix values are completely wrong.

To illustrate, in Python I have the following numpy array:

Shape: (10, 4)
Dtype: float32
[[  1.  24.  59. 465.]
 [  1.  26. 166. 466.]
 [  1.  25.  35. 458.]
 [  1.  28.  36. 465.]]

I then export via "np.save('output.npy', )".

However, when I load the file in R via "npyLoad(filename = 'output.npy')", I get the following:

              [,1]          [,2]           [,3]          [,4]
[1,]  5.368710e+08  1.412329e+19   1.073742e+09  1.441152e+19
[2,]  8.053065e+08  1.210568e+19   2.147484e+09  1.412329e+19
[3,] 5.928788e-323 5.928788e-323  4.940656e-324 8.889260e+247
[4,] 8.573468e-315 1.946940e-308 -2.681562e+154 1.375846e-315

In contrast, exporting the same array via "np.savetxt" and importing it in R via "read.csv" yields the expected (i.e., correct) data.

It would be great if you could help me out with this issue.

Best,

Florian Müller.

eddelbuettel commented 5 years ago

About to head to work so can't look at it now -- see maybe the unit tests. We do check int and double; maybe the problem is that it is float which R does not have so try casting up to double.

More later.
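The float-versus-double explanation is consistent with the numbers in the report. As a minimal sketch (an illustration of the byte-level effect, not a statement about how cnpy is implemented), the snippet below packs the first row of the float32 array into 16 bytes and then misreads those bytes as two 64-bit doubles. It uses only Python's standard `struct` module and assumes little-endian byte order.

```python
import struct

# First row of the float32 array from the report.
row = [1.0, 24.0, 59.0, 465.0]

# Serialize as four little-endian 32-bit floats (16 bytes in total).
raw = struct.pack("<4f", *row)

# Misread the same 16 bytes as two little-endian 64-bit doubles.
misread = struct.unpack("<2d", raw)

# Format the way R printed the matrix entries.
formatted = ["%.6e" % value for value in misread]
print(formatted)
```

The two values print as 5.368710e+08 and 1.412329e+19, which match entries [1,1] and [1,2] of the R output in the report.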

muellerflorian01 commented 5 years ago

Great! Thanks for the hint, got it solved already. Using:

np.save('output.npy', .astype(np.float64))

to export from Python did the trick.
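For reference, a self-contained sketch of this workaround might look as follows. The array name `arr`, its two sample rows, and the temporary output path are assumptions made for illustration; only `np.save`, `np.load`, and `astype` are taken as given.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for the float32 array from the report.
arr = np.array([[1, 24, 59, 465],
                [1, 26, 166, 466]], dtype=np.float32)

# Widen to 64-bit floats before saving; float32 -> float64 loses nothing.
arr64 = arr.astype(np.float64)

# Write to a temporary directory so the sketch does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "output.npy")
np.save(path, arr64)

# Sanity check: the file now holds float64 data with unchanged values.
reloaded = np.load(path)
print(reloaded.dtype)
```

Checking `reloaded.dtype` on the Python side before switching to R is a cheap way to confirm that the file really contains 64-bit doubles.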

Best,

Florian Müller.

In reply to Dirk Eddelbuettel's message of 15 November 2018, 14:42.

eddelbuettel commented 5 years ago

I poked around on the train in, and we're not as clear about "only use 64 bit double as R does" as we could be. If you can think of a place where the documentation should say that ...

Otherwise, a sneaky (but more involved) workaround may be to use reticulate, as I do in the other vignette. As I recall, it has more converters from NumPy baked in, whereas we rely on cnpy and the plain 64-bit double conversion.

eddelbuettel commented 5 years ago

If you're satisfied with that approach feel free to close this.

muellerflorian01 commented 5 years ago

Thanks for the help. It might be worth adding a comment on this to the "Details" section of the R documentation, so that others know how to avoid the problem.

eddelbuettel commented 5 years ago

Agreed. I just added a paragraph in 8029bbd. In the 'raw' source it says

  Note that R uses only one \code{integer} type (which uses 32 bits) and
  one \code{double} floating point type (which uses 64 bits). If Python
  data of either type with a different bitsize is to be shared with R,
  it has to be cast to the corresponding width used by R first.
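On the Python side, the rule in this paragraph could be applied with a small helper along these lines. This is only a sketch: the function name `to_r_compatible` is hypothetical and not part of RcppCNPy or NumPy, and the range check for integers is an added precaution rather than something the documentation prescribes.

```python
import numpy as np


def to_r_compatible(a):
    """Cast an array to the widths R uses: int32 or float64 (hypothetical helper)."""
    a = np.asarray(a)
    if np.issubdtype(a.dtype, np.floating):
        # float16 and float32 widen to float64 without loss.
        return a.astype(np.float64)
    if np.issubdtype(a.dtype, np.integer):
        # Refuse to narrow values that do not fit into 32 bits.
        info = np.iinfo(np.int32)
        if a.size and (int(a.min()) < info.min or int(a.max()) > info.max):
            raise OverflowError("integer values do not fit into 32 bits")
        return a.astype(np.int32)
    raise TypeError("unsupported dtype: %s" % a.dtype)


# Example: both arrays end up with the widths R expects.
floats = to_r_compatible(np.array([1.0, 24.0], dtype=np.float32))
ints = to_r_compatible(np.array([1, 2, 3], dtype=np.int64))
print(floats.dtype, ints.dtype)
```

The result can then be passed to `np.save` as usual. Raising on out-of-range integers is a design choice here; silently wrapping values would reintroduce the kind of wrong data this issue is about.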

muellerflorian01 commented 5 years ago

Perfect, that should make it clear to users.

Thanks, Florian.

In reply to Dirk Eddelbuettel's message of 16 November 2018, 13:36.

eddelbuettel / rcppcnpy

npyLoad yields wrong values #21