STEllAR-GROUP / phylanx

An Asynchronous Distributed C++ Array Processing Toolkit
Boost Software License 1.0

Parse Popular Dataset File Formats #68

Open ct-clmsn opened 7 years ago

ct-clmsn commented 7 years ago

Provide customized file readers for the following formats to assist with algorithm debugging. This is an arbitrary ordering.

Review these formats and assess the level of effort and potential prioritization. Note that some might not be worth the effort of implementing. We may need to contact some of the projects to assess partnership opportunities.

hkaiser commented 7 years ago

Very nice, thanks! @hapoo this might be something for you!

ct-clmsn commented 7 years ago

@hkaiser @hapoo glad to hear it will work as a starting point for future conversations.

parsa commented 6 years ago

From Phylanx project mailing list (edited, condensed):

From: @ct-clmsn To: Phylanx Project Subject: Data Formats

There are a couple additional formats the team may want to consider or look into supporting.

On the topic of dataframes... This would be some type or version of a dataframe: not all of the capabilities, but perhaps a minimal set of features (column select, filter, join). Some of these capabilities could be abstracted and applied to LMDB and TFRecords. This could also motivate work on HPX algorithms.
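To make the "minimal set of features" concrete, here is a rough sketch of column select, filter, and inner join over an in-memory table. Everything in it is an assumption made for illustration: the `Table` struct, the free functions `select`, `filter`, and `join`, and the restriction to `double` columns are hypothetical and are not part of Phylanx or HPX.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical minimal table: named columns of doubles, all of equal length.
struct Table {
    std::vector<std::string> names;
    std::vector<std::vector<double>> columns;

    std::size_t rows() const { return columns.empty() ? 0 : columns.front().size(); }

    // Position of a column by name; throws if the column does not exist.
    std::size_t index_of(const std::string& name) const {
        for (std::size_t i = 0; i < names.size(); ++i)
            if (names[i] == name) return i;
        throw std::out_of_range("unknown column: " + name);
    }
};

// Column select: keep only the requested columns, in the requested order.
Table select(const Table& t, const std::vector<std::string>& wanted) {
    Table out;
    for (const auto& name : wanted) {
        out.names.push_back(name);
        out.columns.push_back(t.columns[t.index_of(name)]);
    }
    return out;
}

// Filter: keep the rows whose value in `column` satisfies `pred`.
Table filter(const Table& t, const std::string& column,
             const std::function<bool(double)>& pred) {
    const auto& key = t.columns[t.index_of(column)];
    Table out;
    out.names = t.names;
    out.columns.resize(t.columns.size());
    for (std::size_t r = 0; r < t.rows(); ++r) {
        if (!pred(key[r])) continue;
        for (std::size_t c = 0; c < t.columns.size(); ++c)
            out.columns[c].push_back(t.columns[c][r]);
    }
    return out;
}

// Inner join on a key column that has the same name in both tables.
// Keys are compared with exact double equality, which is only adequate
// for integer-like ids in a sketch such as this one.
Table join(const Table& left, const Table& right, const std::string& key) {
    const std::size_t lk = left.index_of(key);
    const std::size_t rk = right.index_of(key);
    assert(lk < left.columns.size() && rk < right.columns.size());

    // Build a hash index from key value to row number on the right side.
    std::unordered_multimap<double, std::size_t> index;
    for (std::size_t r = 0; r < right.rows(); ++r)
        index.emplace(right.columns[rk][r], r);

    Table out;
    out.names = left.names;
    for (std::size_t c = 0; c < right.names.size(); ++c)
        if (c != rk) out.names.push_back(right.names[c]);
    out.columns.resize(out.names.size());

    // Probe the index with each left row and emit one output row per match.
    for (std::size_t l = 0; l < left.rows(); ++l) {
        const auto range = index.equal_range(left.columns[lk][l]);
        for (auto it = range.first; it != range.second; ++it) {
            std::size_t o = 0;
            for (std::size_t c = 0; c < left.columns.size(); ++c)
                out.columns[o++].push_back(left.columns[c][l]);
            for (std::size_t c = 0; c < right.columns.size(); ++c)
                if (c != rk) out.columns[o++].push_back(right.columns[c][it->second]);
        }
    }
    return out;
}
```

The per-row loops in `filter` and the probe phase of `join` are the kind of work that could plausibly be expressed with parallel algorithms, which is one way this might tie into the HPX algorithms mentioned above.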


From: @hkaiser

Thanks @ct-clmsn! @hapoo, do you copy?


From: @ct-clmsn

Also, this library might be useful as a baseline for dataframe functionality.

It is a GPU dataframe package.


From: @hapoo

Roger that. I looked at Caffe's LMDB; it seems very interesting, and its capabilities look similar to, and in some ways better than, HDF5 for deep learning. I will look at the rest as well.


From: @ct-clmsn

Thanks! Would a 'table' abstraction between these formats simplify processing?
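One possible shape for such a 'table' abstraction is sketched below, purely as an illustration: each format gets a reader that produces the same in-memory `Table`, so downstream code never sees the file format. The names `Table`, `TableReader`, `DelimitedReader`, and `column_mean` are hypothetical, and only a simple delimited-text reader is shown; readers for LMDB, TFRecords, or HDF5 would need the corresponding libraries and are not attempted here.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <numeric>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical common representation: named columns of doubles.
struct Table {
    std::vector<std::string> names;
    std::vector<std::vector<double>> columns;
};

// Format-independent interface: every reader turns a stream into a Table.
class TableReader {
public:
    virtual ~TableReader() = default;
    virtual Table read(std::istream& in) const = 0;
};

// One concrete reader: header line followed by numeric rows,
// with fields separated by a single delimiter character.
class DelimitedReader : public TableReader {
public:
    explicit DelimitedReader(char delimiter = ',') : delimiter_(delimiter) {}

    Table read(std::istream& in) const override {
        Table t;
        std::string line;
        if (!std::getline(in, line)) return t;  // empty input gives an empty table
        t.names = split(line);
        t.columns.resize(t.names.size());
        while (std::getline(in, line)) {
            if (line.empty()) continue;  // skip blank lines
            const auto fields = split(line);
            if (fields.size() != t.names.size())
                throw std::runtime_error("row width does not match header");
            for (std::size_t c = 0; c < fields.size(); ++c)
                t.columns[c].push_back(std::stod(fields[c]));
        }
        return t;
    }

private:
    std::vector<std::string> split(const std::string& line) const {
        std::vector<std::string> fields;
        std::stringstream ss(line);
        std::string field;
        while (std::getline(ss, field, delimiter_)) fields.push_back(field);
        return fields;
    }

    char delimiter_;
};

// Downstream code depends only on TableReader, not on any file format.
double column_mean(const TableReader& reader, std::istream& in,
                   const std::string& name) {
    const Table t = reader.read(in);
    assert(t.names.size() == t.columns.size());  // invariant any reader must keep
    for (std::size_t c = 0; c < t.names.size(); ++c) {
        if (t.names[c] == name && !t.columns[c].empty()) {
            const auto& col = t.columns[c];
            return std::accumulate(col.begin(), col.end(), 0.0) / col.size();
        }
    }
    throw std::invalid_argument("missing or empty column: " + name);
}
```

With this layout, supporting another format would mean adding one more `TableReader` subclass, while functions such as `column_mean` stay unchanged.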