KxSystems / hdf5

A library for converting between kdb+ and HDF5 data
https://code.kx.com/q/interfaces
Apache License 2.0

Cannot open tables with pandas read_hdf #55

Open antipisa opened 1 month ago

antipisa commented 1 month ago

pandas read_hdf cannot read tables that were written to HDF5 from kdb+, even though both sides use the HDF5 format.

pandas version: 2.2.1
kdb version: 4.0 2023.01.20
python: 3.11.8

// Create a kdb+ table and write this to a dataset in the appropriate group
q)N:10000
q)5#test_data:([]tstamp:asc N?0p;voltage:N?1f;volume:N?100;class:N?10h;on_off:N?0b)
tstamp                        voltage   volume class on_off
-----------------------------------------------------------
2000.01.01D01:03:27.925513386 0.1458085 85     0     0     
2000.01.01D05:46:27.469021975 0.4981235 99     9     0     
2000.01.01D06:14:44.435858577 0.1976848 75     8     0     
2000.01.01D09:38:22.896863222 0.2246491 63     4     0     
2000.01.01D11:13:14.535212963 0.82371   45     1     0     

// Naming of the file and kdb table to be written to HDF5
\d .hdf5
fname:"test.h5";
createFile[fname];
write_name:"data";
writeData[fname;write_name;test_data];

Then from python

import pandas as pd
pd.read_hdf("test.h5", "data")

raises
TypeError: cannot create a storer if the object is not existing nor a value are passed

antipisa commented 1 month ago

?

cmccarthy1 commented 1 month ago

Hi @antipisa,

This isn't an HDF5 interface issue on the KX side specifically. A very similar issue is highlighted in articles such as https://foongminwong.medium.com/load-visualize-hdf5-in-python-1ad5dcaa20a9. It appears that h5py, rather than pandas, is the suggested way to read generic HDF5 files, with the result then converted to pandas.

The specific metadata needed to support pandas loading could be added to the repository, but that requirement seems specific to pandas and its use of HDF5 rather than to the format itself.
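
To see what is actually in the file, and which metadata is or isn't present, a minimal h5py sketch along the following lines may help. It only lists the groups, datasets and attribute names it finds; it assumes nothing about how this interface lays out a table. The helper name describe_hdf5 is made up for illustration. For context, pandas read_hdf looks for PyTables-style node attributes such as pandas_type, and the TypeError above is what it raises when that attribute is missing.

```python
import h5py


def describe_hdf5(path):
    """Return one line per object in the file: its path, kind,
    shape/dtype (datasets only) and the names of its attributes."""
    lines = []

    def visit(name, obj):
        # visititems calls this once for every group and dataset below the root
        kind = "dataset" if isinstance(obj, h5py.Dataset) else "group"
        detail = f" shape={obj.shape} dtype={obj.dtype}" if kind == "dataset" else ""
        attrs = sorted(obj.attrs.keys())
        lines.append(f"/{name} [{kind}]{detail} attrs={attrs}")

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return lines


# Usage against the file from this issue (not run here):
# for line in describe_hdf5("test.h5"):
#     print(line)
```

If no pandas_type attribute shows up on the "data" node, that would be consistent with read_hdf refusing the file even though it is valid HDF5.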

All the best,

Conor

antipisa commented 1 month ago

@cmccarthy1 Could you show, using this example, how to read the table written to HDF5 from kdb+ into pandas via h5py? I have not had success with this either.

cmccarthy1 commented 1 month ago

I'll look at doing that, or have one of the team do so. I have been trying to reproduce this, but it has been a few months since I last built the repository, so I expect it may be later in the week before I get back to it.
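
In the meantime, here is a rough sketch of the h5py route, not verified against a file written by this interface. It assumes the table is stored as a group ("data") holding one 1-D dataset per column, and that timestamps are stored as raw int64 nanoseconds since the kdb+ epoch (2000.01.01). Both are assumptions that should be checked against the actual file first. The function names and the columns parameter are hypothetical.

```python
import h5py
import numpy as np
import pandas as pd


def group_to_dataframe(path, group_name, columns=None):
    """Read every 1-D dataset directly under `group_name` into a DataFrame.

    Assumes one dataset per column (an assumption about the file layout).
    h5py iterates members by name, so pass `columns` to restore an order.
    """
    data = {}
    with h5py.File(path, "r") as f:
        for name, obj in f[group_name].items():
            if isinstance(obj, h5py.Dataset) and obj.ndim == 1:
                data[name] = obj[()]  # copy the whole column into memory
    df = pd.DataFrame(data)
    if columns is not None:
        df = df[list(columns)]
    return df


def kdb_timestamps_to_datetime(values):
    """Convert int64 nanoseconds since 2000-01-01 (the kdb+ epoch) to datetimes.

    Only applies if the column really holds raw kdb+ timestamp integers.
    """
    return pd.Timestamp("2000-01-01") + pd.to_timedelta(np.asarray(values), unit="ns")


# Usage against the file from this issue (not run here):
# df = group_to_dataframe("test.h5", "data",
#                         columns=["tstamp", "voltage", "volume", "class", "on_off"])
# df["tstamp"] = kdb_timestamps_to_datetime(df["tstamp"].to_numpy())
```

If the file turns out to use a different layout, for example a single compound dataset, the reading step would need to change accordingly.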