getml / reflect-cpp

A C++20 library for fast serialization, deserialization and validation using reflection. Supports JSON, BSON, CBOR, flexbuffers, msgpack, TOML, XML, YAML / msgpack.org[C++20]
https://getml.github.io/reflect-cpp/
MIT License

Interface for tabular formats #28

Open liuzicheng1987 opened 8 months ago

liuzicheng1987 commented 8 months ago

In addition to the more complex formats, we would also like to support tabular formats like Parquet or CSV. But currently, we don't even have an interface and concepts for that. These will have to be simplified versions of our current parsing module.

SChakravorti21 commented 5 months ago

It would be very valuable to support reading Apache Arrow in-memory tables using reflect-cpp. Could you share an idea of what kind of work this would require?

As the Arrow library already has high-quality Parquet and CSV readers, we could get those for free, too.

liuzicheng1987 commented 5 months ago

Hi @SChakravorti21, I'm sorry, I didn't see your comment until now.

Yes, I am familiar with Apache Arrow. Basically, you would have to check the Arrow Table schema against the schema of our C++ structs. Then you would have to go through the columns and read the fields into the structs.

The challenge here is that the way reflect-cpp works on structs implies "row-major order", in other words a vector of structs. But we want column-major order.
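
To make the two steps and the row-major vs. column-major mismatch concrete, here is a minimal sketch in plain C++20. It uses neither Arrow nor reflect-cpp: `Table`, `Column`, `ColumnData`, `get_column` and `to_rows` are hypothetical names made up for illustration, and the field names of `Person` are spelled out by hand where a reflection-based implementation would presumably derive them from the struct.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <variant>
#include <vector>

// Hypothetical column-major table: one typed vector per named column.
// This stands in for an Arrow table; it is NOT the Arrow API.
using ColumnData =
    std::variant<std::vector<std::string>, std::vector<std::int64_t>>;

struct Column {
  std::string name;
  ColumnData data;
};

using Table = std::vector<Column>;

// The row type we want to end up with (row-major: one struct per row).
struct Person {
  std::string name;
  std::int64_t age;
};

// Step 1 (schema check): find a column by name and verify its element type.
template <class T>
const std::vector<T>& get_column(const Table& table, const std::string& name) {
  for (const auto& col : table) {
    if (col.name == name) {
      if (const auto* values = std::get_if<std::vector<T>>(&col.data)) {
        return *values;
      }
      throw std::runtime_error("Column '" + name + "' has the wrong type.");
    }
  }
  throw std::runtime_error("Column '" + name + "' not found.");
}

// Step 2 (transpose): walk the columns in lockstep and fill one struct per row.
std::vector<Person> to_rows(const Table& table) {
  const auto& names = get_column<std::string>(table, "name");
  const auto& ages = get_column<std::int64_t>(table, "age");
  if (names.size() != ages.size()) {
    throw std::runtime_error("Columns differ in length.");
  }
  std::vector<Person> rows;
  rows.reserve(names.size());
  for (std::size_t i = 0; i < names.size(); ++i) {
    rows.push_back(Person{names[i], ages[i]});
  }
  return rows;
}
```

Calling `to_rows` on a table with a string column `name` and an integer column `age` yields a `std::vector<Person>`. The per-row copy in step 2 is exactly the cost of going from column-major to row-major storage.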