chuongmep opened this issue 5 months ago.
Thanks for your issue report. Maybe pandas does not support the newest HDF5 file layout. I will check this evening :-)
Apart from that, I think pandas uses a different HDF5 dataset layout (i.e. a multidimensional dataset to represent a DataFrame). Maybe you will have more luck with h5py.
Thank you for your help. So the problem is a different HDF5 version? I'm just confused because I'm exporting the data from C# and reading it in Python.
To provide more information: with h5py it works well; there is just a small problem with string encoding.
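```python
# Read the HDF5 file written by PureHDF
import h5py
import numpy as np
import pandas as pd

file_path = r"file3.h5"
with h5py.File(file_path, "r") as f:
    print(list(f.keys()))       # top-level groups, e.g. ['Category']
    group = f["Category"]       # "Category" is a group
    print(list(group.keys()))   # members of the group, e.g. ['table']
    member = group["table"]     # "table" is the compound dataset
    arr = np.array(member)      # read it into a structured NumPy array
print(arr.dtype)

# Build a DataFrame; the string columns still hold raw bytes at this point
df = pd.DataFrame(arr)
df  # displayed by Jupyter
```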
Id | Name | Address |
---|---|---|
1 | b'Hoang' | b'Hanoi' |
2 | b'Chuong' | b'Hanoi' |
3 | b'Huy' | b'Hanoi' |
4 | b'Hieu' | b'Hanoi' |
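The `b'...'` values above are raw bytes: h5py returns variable-length strings read from datasets as `bytes`. A minimal pandas-only sketch for decoding them after the DataFrame has been built, assuming the strings are UTF-8 (the helper name `decode_bytes_columns` is hypothetical):

```python
import pandas as pd


def decode_bytes_columns(df: pd.DataFrame, encoding: str = "utf-8") -> pd.DataFrame:
    """Return a copy of df in which every all-bytes object column is decoded to str."""
    out = df.copy()
    for col in out.columns:
        column = out[col]
        # Only touch object columns whose values are all bytes (e.g. b'Hoang')
        if column.dtype == object and column.map(lambda v: isinstance(v, bytes)).all():
            out[col] = column.str.decode(encoding)
    return out


# Usage with the DataFrame built from the h5py array:
# df = decode_bytes_columns(pd.DataFrame(arr))
```

Numeric columns such as `Id` are left untouched, and the original DataFrame is not modified.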
C# file:

```csharp
using System.Data;
using System.Linq;
using NUnit.Framework;
using PureHDF;

public class HdfTests
{
    [Test]
    public void TestSaveDatatableToHdf()
    {
        string group = "Category";

        DataTable dataTable = new DataTable();
        dataTable.Columns.Add("Id", typeof(int));
        dataTable.Columns.Add("Name", typeof(string));
        dataTable.Columns.Add("Address", typeof(string));
        dataTable.Rows.Add(1, "Hoang", "Hanoi");
        dataTable.Rows.Add(2, "Chuong", "Hanoi");
        dataTable.Rows.Add(3, "Huy", "Hanoi");
        dataTable.Rows.Add(4, "Hieu", "Hanoi");

        // Convert DataTable to array of TableRow
        TableRow[] array = dataTable.AsEnumerable()
            .Select(row => new TableRow
            {
                Id = row.Field<int>("Id"),
                Name = row.Field<string>("Name"),
                Address = row.Field<string>("Address")
            })
            .ToArray();

        // Add to HDF using the compound data type
        var file = new H5File()
        {
            [group] = new H5Group()
            {
                ["table"] = array,
            }
        };

        file.Write("file3.h5");
    }
}

struct TableRow
{
    public int Id;
    public string Name;
    public string Address;
}
```
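To reproduce a comparable file without C# (for example, to test the Python side), here is an h5py sketch that writes `/Category/table` with the same rows. It assumes PureHDF maps `int` to a 32-bit integer and `string` to a variable-length UTF-8 string inside the compound type; that mapping is a guess, not confirmed here. `write_category_table` is a hypothetical helper name.

```python
import h5py
import numpy as np

# Assumption: PureHDF writes TableRow as a compound type with an int32 field
# and two variable-length UTF-8 string fields. This mirrors that guess in h5py.
TABLE_ROW = np.dtype([
    ("Id", "<i4"),
    ("Name", h5py.string_dtype("utf-8")),
    ("Address", h5py.string_dtype("utf-8")),
])


def write_category_table(path: str) -> None:
    """Write /Category/table with the same rows as the C# test (hypothetical helper)."""
    rows = np.array(
        [(1, "Hoang", "Hanoi"), (2, "Chuong", "Hanoi"),
         (3, "Huy", "Hanoi"), (4, "Hieu", "Hanoi")],
        dtype=TABLE_ROW,
    )
    with h5py.File(path, "w") as f:
        # Group "Category" containing the compound dataset "table"
        f.create_group("Category").create_dataset("table", data=rows)


# Usage: write_category_table("file3_h5py.h5")
```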
This is the line where Pandas calls into pytables:
I debugged up to that line of code and could see that pytables found the group named `my-group`.
And here pandas (in `pandas/io/pytables.py`) filters out all groups that do not have the `pandas_type` attribute set:
https://github.com/pandas-dev/pandas/blob/84aca21d06574b72c5c1da976dd76f7024336e20/pandas/io/pytables.py#L1505
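A simplified sketch of that check, using h5py instead of PyTables: it lists every group that carries a `pandas_type` attribute. The real pandas logic has additional conditions, so treat this only as a quick diagnostic; `groups_with_pandas_type` is a hypothetical name.

```python
import h5py


def groups_with_pandas_type(path: str) -> list[str]:
    """Return the paths of all groups that carry a `pandas_type` attribute.

    Simplified diagnostic only: pandas applies further conditions when it
    decides which groups belong to an HDFStore.
    """
    found: list[str] = []

    def visit(name: str, obj) -> None:
        if isinstance(obj, h5py.Group) and "pandas_type" in obj.attrs:
            found.append(name)

    with h5py.File(path, "r") as f:
        f.visititems(visit)  # walks every object below the root group
    return sorted(found)


# Usage: groups_with_pandas_type("file3.h5")  # an empty list means pandas sees no frames
```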
When you create a very simple pandas HDF5 file like this:

```python
import numpy as np
import pandas as pd

# Requires the PyTables package ("tables"), which pandas uses for HDF5 I/O
with pd.HDFStore("hdf_file.h5") as hdf:
    df = pd.DataFrame(np.random.rand(5, 3))
    hdf.put("test", df)
```
and then open that file in, e.g., HDFView, you will see how pandas stores the data internally (including the `pandas_type` attribute mentioned above).
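The same inspection can be done from Python without HDFView. This sketch (hypothetical helper `describe_hdf5`) walks the file with h5py's `visititems` and lists every group and dataset together with its attributes. Run on `hdf_file.h5` it should reveal the `pandas_type` attribute; run on the PureHDF file it should show that the attribute is absent. An attribute type that h5py cannot read would raise an error here.

```python
import h5py


def describe_hdf5(path: str) -> list[str]:
    """List every object below the root together with its attributes."""
    lines: list[str] = []

    def visit(name: str, obj) -> None:
        if isinstance(obj, h5py.Group):
            kind = "group"
        elif isinstance(obj, h5py.Dataset):
            kind = "dataset"
        else:
            kind = "datatype"  # committed (named) datatype
        lines.append(f"/{name} ({kind})")
        for key in obj.attrs:
            lines.append(f"    @{key} = {obj.attrs[key]!r}")

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return lines


# Usage: print("\n".join(describe_hdf5("hdf_file.h5")))
```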
So I think you need to create an HDF5 file with the group/attribute/dataset structure that pandas expects.
@Apollo3zehn, do you have a C# example that could help with that?
@chuongmep I do not have any examples of creating pandas-compatible HDF5 files with PureHDF. But as shown in my previous post, you can easily create a test HDF5 file using pandas and look at what pandas expects to be present in that file. It does not look too difficult to mimic that format with PureHDF.
Alternatively, you could read the data produced by PureHDF into Python via h5py and then convert that dataset to a pandas DataFrame (I think this is what you did here: https://github.com/Apollo3zehn/PureHDF/issues/53#issuecomment-1893850100).
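A minimal sketch of that h5py route, assuming the data is a one-dimensional compound dataset like `Category/table` from the C# test and that its strings are UTF-8 (`read_compound_dataframe` is a hypothetical helper):

```python
import h5py
import pandas as pd


def read_compound_dataframe(path: str, dataset: str, encoding: str = "utf-8") -> pd.DataFrame:
    """Read a compound HDF5 dataset into a DataFrame, decoding byte strings."""
    with h5py.File(path, "r") as f:
        arr = f[dataset][()]  # structured NumPy array, one field per column
    df = pd.DataFrame(arr)
    for col in df.columns:
        if df[col].dtype == object:
            # h5py returns variable-length strings from datasets as bytes
            df[col] = df[col].map(
                lambda v: v.decode(encoding) if isinstance(v, bytes) else v
            )
    return df


# Usage: df = read_compound_dataframe("file3.h5", "Category/table")
```

This keeps pandas out of the HDF5 layer entirely, so the missing `pandas_type` attribute no longer matters.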
So to summarize: No, unfortunately, there is no "Pandas compatible" mode yet, but I will add it to my todo list.
Thank you for your help, that will be useful for me!
When I try to read the file with pandas in Python, it returns nothing. Could this be related to the HDF5 schema version?
Thank you
This is the test in C#: