fhs / NPZ.jl

A Julia package that provides support for reading and writing Numpy .npy and .npz files

Incorrect handling of Unicode keys when creating npz files #49

Open cerisola opened 2 years ago

cerisola commented 2 years ago

Hi, I am running into an issue when using NPZ to create an npz file whose keys are non-ASCII Unicode strings.

To be clear, everything works when the file is created with NumPy and read with NPZ. For example, this works in Python:

>>> import numpy as np

>>> np.savez("file.npz", α=1)

>>> D = np.load("file.npz")

>>> print(D["α"])
1

and reading the file in Julia using NPZ also works as expected:

julia> using NPZ

julia> D = npzread("file.npz")
Dict{String, Int64} with 1 entry:
  "α" => 1

julia> D["α"]
1

However, when I create the file with NPZ, NPZ can read it back as expected, but NumPy cannot read it properly. From the NPZ side:

julia> npzwrite("file.npz", Dict("α" => 1))

julia> D = npzread("file.npz")
Dict{String, Int64} with 1 entry:
  "α" => 1

julia> D["α"]
1

everything works fine. However, when I open the file with NumPy, it loads, but the keys are not what I expect:

>>> D = np.load("file.npz")

>>> print(D["α"])
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-17-7d756a0b03cf> in <module>
----> 1 print(D["α"])

/usr/lib/python3.9/site-packages/numpy/lib/npyio.py in __getitem__(self, key)
    258                 return self.zip.read(key)
    259         else:
--> 260             raise KeyError("%s is not a file in the archive" % key)
    261 
    262 

KeyError: 'α is not a file in the archive'

Indeed, if I print the keys of the loaded file, I get a different Unicode string:

>>> list(D.keys())
['╬▒']
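The garbled key looks like what you get when the UTF-8 bytes of "α" are decoded as CP437, the legacy code page used for ZIP entry names. Below is a minimal sketch in plain Python (no NPZ or NumPy involved) that reproduces the string. That this is what actually happens when the file is loaded is an assumption on my part, and the variable names are only for illustration.

```python
# Sketch: reproduce the garbled key by decoding UTF-8 bytes as CP437.
# Assumption: the reader treats the stored name bytes as CP437.

original_key = "α"

# "α" is stored as two bytes in UTF-8: 0xCE 0xB1.
utf8_bytes = original_key.encode("utf-8")

# Interpreting those two bytes as CP437 yields two box-drawing/shade glyphs.
garbled = utf8_bytes.decode("cp437")

# The mapping is reversible, so the original key can be recovered.
recovered = garbled.encode("cp437").decode("utf-8")

print(repr(garbled), repr(recovered))
```

Since the round trip is lossless, the data in the file is intact; only the interpretation of the entry name differs between writer and reader.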

cerisola commented 2 years ago

After digging into the source of the library to find the cause of this issue, I am now fairly sure the problem lies in the ZipFile.jl library that NPZ.jl uses to create the zip file. I have opened an issue for the ZipFile.jl project (see fhs/ZipFile.jl#84) to address this problem.
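For anyone who needs to read such files from Python in the meantime, here is a hedged sketch of a check and a workaround. Python's `zipfile` decodes an entry name as UTF-8 only when bit 11 (`0x800`) of the entry's general-purpose flags is set, and falls back to CP437 otherwise. The assumption here, not confirmed in this thread, is that the files written via ZipFile.jl store UTF-8 names without that flag. The helpers `utf8_name_flags` and `repair_keys` are hypothetical names for this example, not part of NumPy or NPZ.

```python
import zipfile

UTF8_NAME_FLAG = 0x800  # general-purpose bit 11: entry name is UTF-8


def utf8_name_flags(file):
    """Map each entry name (as decoded by zipfile) to whether bit 11 is set."""
    with zipfile.ZipFile(file) as zf:
        return {
            info.filename: bool(info.flag_bits & UTF8_NAME_FLAG)
            for info in zf.infolist()
        }


def repair_keys(mapping):
    """Return a dict whose keys are re-decoded from CP437 back to UTF-8.

    Keys that cannot be round-tripped this way are kept unchanged.
    """
    repaired = {}
    for key, value in mapping.items():
        try:
            fixed = key.encode("cp437").decode("utf-8")
        except UnicodeError:
            fixed = key  # not a CP437-decoded UTF-8 string; leave as is
        repaired[fixed] = value
    return repaired
```

Usage would be along the lines of `repair_keys({k: D[k] for k in D.keys()})` on the object returned by `np.load`, and `utf8_name_flags("file.npz")` to see whether the flag is missing for the affected entries. Pure-ASCII keys pass through unchanged, because ASCII bytes are identical in CP437 and UTF-8.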