fhs / NPZ.jl

A Julia package that provides support for reading and writing Numpy .npy and .npz files

Missing support for numpy.record dtype #17

Open JonathanAnderson opened 6 years ago

JonathanAnderson commented 6 years ago

I have an .npy file that I cannot read. Every attempt fails with the error: parsing header failed: expected character ''', found '['.

The file can be created in Python as follows:

import numpy as np
import pandas as pd

df = pd.DataFrame({"i": [1, 2, 3], "j": [4, 5, 6], "k": [3.14, 2.72, 1.62]})
arr = np.ascontiguousarray(df.to_records(index=False)).view()
np.save("/tmp/temp.npy", arr)

Alternatively, write the bytes of the file directly from Julia:

bytes = UInt8[0x93, 0x4e, 0x55, 0x4d, 0x50, 0x59, 0x01, 0x00, 0x66, 0x00, 0x7b, 0x27, 0x64, 0x65, 0x73, 0x63, 0x72, 0x27, 0x3a, 0x20, 0x5b, 0x28, 0x27, 0x69, 0x27, 0x2c, 0x20, 0x27, 0x3c, 0x69, 0x38, 0x27, 0x29, 0x2c, 0x20, 0x28, 0x27, 0x6a, 0x27, 0x2c, 0x20, 0x27, 0x3c, 0x69, 0x38, 0x27, 0x29, 0x2c, 0x20, 0x28, 0x27, 0x6b, 0x27, 0x2c, 0x20, 0x27, 0x3c, 0x66, 0x38, 0x27, 0x29, 0x5d, 0x2c, 0x20, 0x27, 0x66, 0x6f, 0x72, 0x74, 0x72, 0x61, 0x6e, 0x5f, 0x6f, 0x72, 0x64, 0x65, 0x72, 0x27, 0x3a, 0x20, 0x46, 0x61, 0x6c, 0x73, 0x65, 0x2c, 0x20, 0x27, 0x73, 0x68, 0x61, 0x70, 0x65, 0x27, 0x3a, 0x20, 0x28, 0x33, 0x2c, 0x29, 0x2c, 0x20, 0x7d, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x0a, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x1f, 0x85, 0xeb, 0x51, 0xb8, 0x1e, 0x09, 0x40, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x05, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xc3, 0xf5, 0x28, 0x5c, 0x8f, 0xc2, 0x05, 0x40, 0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x06, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xec, 0x51, 0xb8, 0x1e, 0x85, 0xeb, 0xf9, 0x3f]
open("/tmp/temp.npy", "w") do f
    write(f, bytes)
end
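For anyone inspecting the dump by hand, the data section after the header can be decoded with the Python standard library alone. This is a minimal sketch, assuming the records are packed without padding in the order the header declares ('<i8', '<i8', '<f8'). The names RECORD_FORMAT and DATA_HEX are illustrative, and the hex string is copied from the tail of the byte array above.

```python
import struct

# Data section copied from the byte dump above: three 24-byte records.
DATA_HEX = (
    "0100000000000000" "0400000000000000" "1f85eb51b81e0940"
    "0200000000000000" "0500000000000000" "c3f5285c8fc20540"
    "0300000000000000" "0600000000000000" "ec51b81e85ebf93f"
)

# Assumed layout: little-endian int64, int64, float64 with no padding,
# matching the ('i', '<i8'), ('j', '<i8'), ('k', '<f8') fields in the header.
RECORD_FORMAT = "<qqd"

data = bytes.fromhex(DATA_HEX)
records = list(struct.iter_unpack(RECORD_FORMAT, data))

for i, j, k in records:
    print(i, j, k)
```

Each tuple corresponds to one row of the original DataFrame, which shows that the payload itself is simple. The difficulty is only in how the header describes it.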

Then I read the file the usual way:

using NPZ
npzread("/tmp/temp.npy")

This is the resulting error and stack trace:

parsing header failed: expected character ''', found '['

Stacktrace:
 [1] parsechar(::String, ::Char) at /home/janders/.julia/v0.6/NPZ/src/NPZ.jl:70
 [2] parsestring(::String) at /home/janders/.julia/v0.6/NPZ/src/NPZ.jl:76
 [3] parsedtype(::String) at /home/janders/.julia/v0.6/NPZ/src/NPZ.jl:123
 [4] parseheader(::String) at /home/janders/.julia/v0.6/NPZ/src/NPZ.jl:157
 [5] npzreadarray(::IOStream) at /home/janders/.julia/v0.6/NPZ/src/NPZ.jl:191
 [6] npzread(::String) at /home/janders/.julia/v0.6/NPZ/src/NPZ.jl:218
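The trace ends in parsestring, called from parsedtype, which suggests the parser expects the 'descr' entry to be a quoted string such as '<f8'. For a record array, 'descr' is instead a list of (name, dtype) tuples, so the first character after the key is '['. The following is a rough sketch, using only the Python standard library, that rebuilds the preamble from the byte dump and parses it. The function name parse_npy_header is hypothetical, and the sketch handles NPY format version 1.0 only.

```python
import ast
import struct

# Header text as it appears in the byte dump: the dict literal,
# seven padding spaces, and a trailing newline.
HEADER = (
    b"{'descr': [('i', '<i8'), ('j', '<i8'), ('k', '<f8')], "
    b"'fortran_order': False, 'shape': (3,), }" + b" " * 7 + b"\n"
)

def parse_npy_header(raw: bytes) -> dict:
    """Parse the preamble of an NPY version 1.0 file and return the header dict."""
    if raw[:6] != b"\x93NUMPY":
        raise ValueError("not an NPY file")
    if (raw[6], raw[7]) != (1, 0):
        raise ValueError("this sketch only handles format version 1.0")
    # Version 1.0 stores the header length as a little-endian uint16.
    (header_len,) = struct.unpack("<H", raw[8:10])
    header_text = raw[10:10 + header_len].decode("latin1")
    # The header is a Python dict literal, so literal_eval parses it safely.
    return ast.literal_eval(header_text)

preamble = b"\x93NUMPY\x01\x00" + struct.pack("<H", len(HEADER)) + HEADER
header = parse_npy_header(preamble)

# For a record array, 'descr' is a list of (name, dtype) tuples, not a string.
print(type(header["descr"]).__name__, header["descr"])
```

A reader that supports record dtypes would need to branch on whether 'descr' is a string or a list before decoding the data.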

versioninfo():

Julia Version 0.6.1-pre.0
Commit dcf39a1 (2017-06-19 13:06 UTC)
Platform Info:
  OS: Linux (x86_64-redhat-linux)
  CPU: Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz
  WORD_SIZE: 64
  BLAS: libmkl_rt
  LAPACK: libmkl_rt
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, haswell)

And my NPZ package has head at

* 60c4bc9 2017-02-01 (HEAD, tag: v0.2.0, origin/master, cache/heads/master, master) REQUIRE: downgrade Compat because we don't use Compat.view [Fazlul Shahriar]

fhs commented 6 years ago

The numpy.record dtype is not yet supported. As a workaround, you can convert the data to a normal array before saving it:

import numpy as np
import pandas as pd

df = pd.DataFrame({"i": [1, 2, 3], "j": [4, 5, 6], "k": [3.14, 2.72, 1.62]})
np.save("/tmp/temp.npy", df.values)
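One thing to keep in mind with this workaround: df.values returns a single homogeneous 2-D array, so the integer columns here are upcast to float64. If the per-column dtypes matter, another option is to save each field as its own plain array in an .npz archive. The sketch below assumes NumPy is available and builds the record array directly rather than through pandas. The output path and variable names are illustrative, and it has not been checked against NPZ.jl itself.

```python
import os
import tempfile

import numpy as np

# Same data as in the issue, built as a structured (record-style) array.
arr = np.array(
    [(1, 4, 3.14), (2, 5, 2.72), (3, 6, 1.62)],
    dtype=[("i", "<i8"), ("j", "<i8"), ("k", "<f8")],
)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "temp.npz")  # illustrative location

    # Store each field as a separate contiguous array with a simple dtype,
    # so no entry in the archive carries a record dtype in its header.
    np.savez(path, **{name: np.ascontiguousarray(arr[name]) for name in arr.dtype.names})

    # Read the archive back to confirm the per-column dtypes survived.
    with np.load(path) as loaded:
        columns = {name: loaded[name] for name in loaded.files}

for name, column in columns.items():
    print(name, column.dtype, column.tolist())
```

Each entry in the archive is then an ordinary 1-D array, at the cost of losing the row-wise record layout.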