dmnfarrell / pandastable

Table analysis in Tkinter using pandas DataFrames.

Support for dataframe import from HDF5 files #178

Closed fkromer closed 4 years ago

fkromer commented 4 years ago

I could not get any hits when searching for "hdf5" on https://pandastable.readthedocs.io. Do you already support importing pandas DataFrames from HDF5 files? In pandastable.core the only import-related functions are importCSV(), importExcel(), importURL(), load(filename, filetype=None) with .mpk or .pkl as filetype, load_msgpack() and load_pickle().

A bit about the motivation for this request: HDF5 is a format supported by lots of data science tools (Matlab, Keras, ...). Both HDF5 and pickle persist type information (the types of DataFrame row indices, header labels, data entries and the like). Pickle, by comparison, is limited to the Python language domain. CSV does not persist type information. As far as I know, MessagePack does not persist type information either.
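To illustrate the point about type information, here is a minimal sketch that uses only the Python standard library rather than pandas. The sample row is made up for illustration. It round-trips the same values through CSV and through pickle and compares the types that come back.

```python
import csv
import io
import pickle
from datetime import date

# Hypothetical sample row: an int, a float, and a date.
row = [42, 3.5, date(2020, 7, 16)]

# CSV round trip: every field comes back as a plain string.
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
buffer.seek(0)
csv_row = next(csv.reader(buffer))

# Pickle round trip: the original Python types are restored.
pickle_row = pickle.loads(pickle.dumps(row))

print([type(v).__name__ for v in csv_row])     # types after the CSV round trip
print([type(v).__name__ for v in pickle_row])  # types after the pickle round trip
```

After the CSV round trip the reader has to guess or re-declare the types, whereas the pickle round trip returns the original objects, at the cost of being readable only from Python.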

dmnfarrell commented 4 years ago

Yes, I'm familiar with HDF5. I used to use MessagePack for persisting project files, but pandas dropped support for it. I should be able to add HDF5 support quite easily. Are you talking about dataexplore or the pandastable library?

fkromer commented 4 years ago

Because I need a visualization solution for my coworkers, I'm talking about dataexplore. I'd be able to submit a PR as well. However, as I won't be able to implement it and open a PR in the near future, it would be great if you could add support for it.

dmnfarrell commented 4 years ago

Ok I'll take a look.

fkromer commented 4 years ago

Great, thanks a lot!

dmnfarrell commented 4 years ago

I have added a basic import to the File menu. See if it works. You'll need the latest version from the repository, though.

fkromer commented 4 years ago

I gave dataexplore a short try on Ubuntu running Python 3.8.2 (sudo apt-get install python3-tk, ~/github/pandastable$ pip3 install --user -e ., ~/github/pandastable$ dataexplore) with probably the simplest HDF5 file possible:

$ python3
Python 3.8.2 (default, Jul 16 2020, 14:00:26) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import h5py
>>> import numpy as np
>>> with h5py.File("mytestfile.hdf5", "w") as f:
...     dset = f.create_dataset("mydataset", (100,), dtype='i')
...

dataexplore did not complain :+1: