divideconcept / fastnumpyio

Fast Numpy I/O : Fast replacement for numpy.load and numpy.save
MIT License

Load function not working #4

Open weidinger-c opened 2 months ago

weidinger-c commented 2 months ago

Hi, I wanted to try out your save and load functions, but got this error with your load function:

save("/media/w/numpy_array_fast.npy", numpy_array)
load("/media/w/numpy_array_fast.npy")
Exception has occurred: ValueError
invalid literal for int() with base 10: '                                             '
  File "/workspaces/tile_db/tiledb/tiledb.py", line 29, in <genexpr>
    shape = tuple(int(num) for num in str(header[60:120], 'utf-8').replace(', }', '').replace('(', '').replace(')', '').split(','))
  File "/workspaces/tile_db/tiledb/tiledb.py", line 29, in load
    shape = tuple(int(num) for num in str(header[60:120], 'utf-8').replace(', }', '').replace('(', '').replace(')', '').split(','))
  File "/workspaces/tile_db/tiledb/tiledb.py", line 332, in import_las_file
    load("/media/w/numpy_array_fast.npy")
  File "/workspaces/tile_db/samples/create_db.py", line 25, in <module>
    db_accessor.import_las_file(las_filepath)
ValueError: invalid literal for int() with base 10: '      

I am using numpy 2.0.0, if that helps.

Thanks.

mahynski commented 2 months ago

I just submitted a PR that fixed this error for me. If you don't want to wait, just replace line 26 with:

shape = tuple(int(num) for num in str(header[60:120], 'utf-8').strip().replace(', }', '').replace('(', '').replace(')', '').split(',') if num != '')
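A small sketch of why that one-line change helps, assuming the 60-byte slice holds the shape tuple followed by space padding. The `field_1d` and `field_2d` strings are hypothetical stand-ins for `header[60:120]`, not bytes read from a real file, and the two helper names are made up for illustration:

```python
def parse_shape_original(field: str) -> tuple:
    # Parsing as in the traceback above: no strip(), no empty-token filter.
    return tuple(
        int(num)
        for num in field.replace(', }', '').replace('(', '').replace(')', '').split(',')
    )


def parse_shape_patched(field: str) -> tuple:
    # Parsing with the suggested fix: strip the padding first, then skip
    # the empty token left behind by the trailing comma of a 1-D shape.
    return tuple(
        int(num)
        for num in field.strip().replace(', }', '').replace('(', '').replace(')', '').split(',')
        if num != ''
    )


# Hypothetical header slices: shape text padded with spaces to 60 characters.
field_1d = "(100,), }".ljust(60)
field_2d = "(3, 4), }".ljust(60)

# A 2-D shape parses either way, because int() ignores surrounding whitespace.
print(parse_shape_original(field_2d))
print(parse_shape_patched(field_2d))

# A 1-D shape ends in a comma, so the original split yields a token made only
# of spaces, and int() rejects it with the ValueError from the report.
try:
    parse_shape_original(field_1d)
except ValueError as exc:
    print("original parser failed:", exc)

print(parse_shape_patched(field_1d))
```

If this assumption about the padding holds, the error would only show up for 1-D arrays, which would match the structured array used in the report.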

weidinger-c commented 1 month ago

Thanks, I tried your code, but unfortunately the loaded array is not the same as the original one. Could this be due to using "structured arrays" with different data types?

Here is a short snippet showing how I create my random test data: [image]

mahynski commented 1 month ago

Yes, that appears to be the case. I tried this out, and the issue comes from this line in the load() function:

descr = str(header[19:25], "utf-8").replace("'", "").replace(" ", "")

This logic extracts the array's type only when it is something simple, so it doesn't work in your case. A more robust approach is needed there, akin to "numpy.lib.format.read_array_header_2_0". Also, the total size of the data is calculated later with:

datasize = np.lib.format.descr_to_dtype(descr).itemsize

which doesn't seem to work when you have a structured array, since it cannot easily parse out the number and size of the different types in your array. I don't see a simple fix, but one should be possible. For now, I think the code should work if you create 4 separate arrays and save them individually. Not as elegant, unfortunately.
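One possible shape for that more robust approach, as a sketch rather than a patch for fastnumpyio: let `numpy.lib.format` parse the header and then read the raw bytes in one call. `load_with_numpy_header` is a hypothetical name, and the sketch assumes the file was written by `numpy.save` (not by fastnumpyio's own `save`), uses format version 1.0 or 2.0, and contains no object fields:

```python
import io
import math

import numpy as np
from numpy.lib import format as npy_format


def load_with_numpy_header(fp):
    """Read an .npy stream, letting NumPy parse the header (hypothetical helper)."""
    version = npy_format.read_magic(fp)
    if version == (1, 0):
        shape, fortran_order, dtype = npy_format.read_array_header_1_0(fp)
    elif version == (2, 0):
        shape, fortran_order, dtype = npy_format.read_array_header_2_0(fp)
    else:
        raise ValueError(f"unsupported .npy version: {version}")
    if dtype.hasobject:
        raise ValueError("object arrays need pickle and are not handled here")

    # dtype.itemsize already covers every field of a structured dtype.
    count = math.prod(shape)
    data = fp.read(count * dtype.itemsize)
    array = np.frombuffer(data, dtype=dtype, count=count)
    return array.reshape(shape, order="F" if fortran_order else "C")


# Example with a structured array; the field names are made up for illustration.
points = np.zeros(5, dtype=[("x", "<f8"), ("y", "<f8"), ("id", "<i4"), ("flag", "u1")])
points["x"] = np.arange(5)
points["id"] = np.arange(5) * 10

buffer = io.BytesIO()
np.save(buffer, points)
buffer.seek(0)

loaded = load_with_numpy_header(buffer)
print(loaded.dtype == points.dtype, loaded.tobytes() == points.tobytes())
```

The array returned by `np.frombuffer` is read-only because it views the bytes object; calling `.copy()` on it gives a writable array at the cost of one extra copy.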

weidinger-c commented 1 month ago

Thanks for the reply. I guess I'll create a new issue for support of structured arrays. But it seems from your reply that this is not as simple as one would guess...