Open ct-clmsn opened 7 years ago
Very nice, thanks! @hapoo this might be something for you!
@hkaiser @hapoo glad to hear it will work as a starting point for future conversations.
From Phylanx project mailing list (edited, condensed):
From: @ct-clmsn To: Phylanx Project Subject: Data Formats
There are a couple of additional formats the team may want to consider supporting.
- LMDB (Caffe)
- TensorFlow records
- Dataframes
On the topic of dataframes: this would be some type or version of a dataframe. It would not need all of the usual capabilities, just a minimal set of features (column select, filter, join). Some of these capabilities could be abstracted and applied to LMDB and TF records as well. This could also motivate work on HPX algorithms.
From: @hkaiser
Thanks @ct-clmsn! @hapoo, do you copy?
From: @ct-clmsn
Also this library might be useful for dataframe baseline functionality.
It is a GPU dataframe package.
From: @hapoo
Roger that. I looked at Caffe's LMDB; it seems very interesting and has capabilities similar to, and in some respects better than, HDF5 for deep learning. I will look at the rest as well.
From: @ct-clmsn
Thanks! Would a 'table' abstraction between these formats simplify processing?
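To make the 'table' question a bit more concrete, here is a minimal sketch of what such an abstraction might look like, assuming each format reader can yield its records as plain dictionaries. The `Table` class and its `from_records`, `select`, `filter`, and `join` methods are hypothetical names chosen for illustration; they are not part of Phylanx, HPX, or any of the libraries mentioned in this thread.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]


@dataclass
class Table:
    """Hypothetical format-agnostic table: a list of dict rows."""
    rows: List[Row]

    @classmethod
    def from_records(cls, records: Iterable[Row]) -> "Table":
        # Any reader (LMDB, TF records, ...) would only need to
        # produce an iterable of dict-like records (assumption).
        return cls([dict(r) for r in records])

    def select(self, *columns: str) -> "Table":
        # Column select: keep only the named columns.
        return Table([{c: r[c] for c in columns} for r in self.rows])

    def filter(self, predicate: Callable[[Row], bool]) -> "Table":
        # Row filter: keep rows for which the predicate is true.
        return Table([r for r in self.rows if predicate(r)])

    def join(self, other: "Table", on: str) -> "Table":
        # Inner hash join on a single key column.
        index: Dict[Any, List[Row]] = {}
        for r in other.rows:
            index.setdefault(r[on], []).append(r)
        joined: List[Row] = []
        for left in self.rows:
            for right in index.get(left[on], []):
                # On a column-name clash, the left table's value wins.
                joined.append({**right, **left})
        return Table(joined)


samples = Table.from_records([{"id": 1, "label": "cat"},
                              {"id": 2, "label": "dog"}])
scores = Table.from_records([{"id": 1, "score": 0.9},
                             {"id": 2, "score": 0.4}])

result = (samples.join(scores, on="id")
                 .filter(lambda r: r["score"] > 0.5)
                 .select("label", "score"))
print(result.rows)
```

With an interface along these lines, the three operations only need to be written once, and each format would contribute just a reader. Whether a row-oriented or column-oriented layout is the better fit is left open here.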
Provide customized file readers for the following formats to assist with algorithm debugging. This is an arbitrary ordering.
Review these formats and assess the level of effort and potential prioritization. Note that some might not be worth the effort of implementing. We may need to contact some of the projects to assess partnership opportunities.