apache / incubator-graphar

An open source, standard data file format for graph data storage and retrieval.
https://graphar.apache.org/
Apache License 2.0

[Feat][C++] Add a `WriterOption` to allow users to configure writer options such as compression #108

Open acezen opened 1 year ago

acezen commented 1 year ago

Is your feature request related to a problem? Please describe.

Currently the GraphAr C++ library supports writing chunks in different file formats (CSV, Parquet, and ORC) through Arrow's built-in file-format support. Arrow provides writer options for each file format to configure settings such as the compression type, but GraphAr always writes with the default options:

CSV: https://github.com/alibaba/GraphAr/blob/ad30121070c9dc115ac916ef620de29e2097af77/src/filesystem.cc#L205-L210
Parquet: https://github.com/alibaba/GraphAr/blob/ad30121070c9dc115ac916ef620de29e2097af77/src/filesystem.cc#L216-L220
ORC: https://github.com/alibaba/GraphAr/blob/ad30121070c9dc115ac916ef620de29e2097af77/src/filesystem.cc#L224-L225

Consider adding a GraphAr WriterOption so that users can configure the writer options.

Describe the solution you'd like

Implement a WriterOption like:

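class WriterOption {
 public:
  class builder {
   public:
    // Sets the compression codec and returns the builder.
    inline builder* compression(CompressionType type);
    // Creates the WriterOption from the configured settings.
    inline std::shared_ptr<WriterOption> build();
  };
};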

and when writing chunks with GraphAr, use it like this:

WriterOption::builder builder;
builder.compression(CompressionType::ZSTD);
auto writer_option = builder.build();
auto writer = VertexChunkWriter(vertex_info, prefix, writer_option);

As a first step, we only need to support the compression setting.
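To make the proposal concrete, here is a minimal, self-contained sketch of how such a builder could look. It is only an illustration under assumptions: the `CompressionType` enum and its values are hypothetical stand-ins (a real implementation would more likely reuse or map to Arrow's own compression type), and nothing here is existing GraphAr API. The sketch also returns a reference from `compression()` instead of the pointer shown above, purely so that calls can be chained.

```cpp
#include <memory>

// Hypothetical enum for illustration; not part of the current GraphAr API.
enum class CompressionType { UNCOMPRESSED, SNAPPY, GZIP, ZSTD };

class WriterOption {
 public:
  class builder {
   public:
    // Records the requested codec and returns the builder for chaining.
    builder& compression(CompressionType type) {
      compression_ = type;
      return *this;
    }

    // Creates an immutable WriterOption from the settings collected so far.
    std::shared_ptr<WriterOption> build() const {
      return std::shared_ptr<WriterOption>(new WriterOption(compression_));
    }

   private:
    // Assumed default: no compression unless the user asks for one.
    CompressionType compression_ = CompressionType::UNCOMPRESSED;
  };

  // Read-only accessor that a chunk writer could consult before writing.
  CompressionType compression() const { return compression_; }

 private:
  // Only the builder constructs options, so they stay immutable afterwards.
  explicit WriterOption(CompressionType type) : compression_(type) {}

  CompressionType compression_;
};
```

With this shape, `WriterOption::builder().compression(CompressionType::ZSTD).build()` yields an option object that a writer can translate into the format-specific Arrow writer settings, and new settings can be added to the builder later without changing the writer constructors.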

Additional context

75

acezen commented 1 year ago

cc/ @lixueclaire