apache / incubator-graphar

An open source, standard data file format for graph data storage and retrieval.
https://graphar.apache.org/
Apache License 2.0
190 stars, 40 forks

OSPP24 Idea: Implement ETL CLI Tools for GraphAr #463

Open acezen opened 2 months ago

acezen commented 2 months ago

Describe the enhancement requested

Description

GraphAr is designed as a unified storage format for graph data. It aims to provide a standardized format that makes graph data easy to import and export, as well as to exchange and share. Beyond the foundational format design, GraphAr currently also offers libraries in C++, Java, Python, and Scala so that users can work with GraphAr-formatted data across different programming environments.

To facilitate the use of GraphAr-formatted data, we aim to provide a command-line tool based on these libraries. This tool will convert data from various sources into GraphAr-formatted data and, in the other direction, transform GraphAr-formatted data into other formats.
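To make the two conversion directions concrete, here is a minimal sketch of what the command surface of such a tool might look like. Everything in it is an assumption for illustration: the program name `graphar-cli`, the `import`/`export` subcommands, and all flag names are hypothetical and not part of any existing GraphAr tool, and the list of formats is only a guess at what could be supported. The actual conversion is left out; it would delegate to one of the GraphAr libraries.

```python
import argparse

# Hypothetical format list, assumed for illustration only.
FORMATS = ["csv", "parquet", "orc"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one subcommand per conversion direction."""
    parser = argparse.ArgumentParser(
        prog="graphar-cli",  # hypothetical name
        description="Convert data to and from the GraphAr format (sketch).",
    )
    sub = parser.add_subparsers(dest="command", required=True)

    # Direction 1: external source -> GraphAr
    imp = sub.add_parser("import", help="convert source data into GraphAr")
    imp.add_argument("--source", required=True, help="path to the input data")
    imp.add_argument("--source-format", choices=FORMATS, default="csv")
    imp.add_argument("--output", required=True, help="output directory")

    # Direction 2: GraphAr -> other format
    exp = sub.add_parser("export", help="convert GraphAr data to another format")
    exp.add_argument("--graph-info", required=True, help="path to the graph info file")
    exp.add_argument("--target-format", choices=FORMATS, default="csv")
    exp.add_argument("--output", required=True, help="output directory")

    return parser


def main(argv=None) -> argparse.Namespace:
    """Parse arguments; the conversion step itself is intentionally omitted."""
    args = build_parser().parse_args(argv)
    print(f"would run '{args.command}' and write to {args.output}")
    return args
```

Under these assumptions, calling `main(["import", "--source", "people.csv", "--output", "out"])` would select the import direction with the default source format. Splitting the two directions into subcommands keeps each one's options separate, which is similar in spirit to how parquet-cli groups its operations.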

This command-line tool needs to support the following features:

Deliverables

  1. A CLI tool that meets the above requirements
  2. Detailed design and usage documentation

Component(s)

Other

Reference

acezen commented 1 month ago

parquet-cli is a good reference for the CLI design: https://github.com/apache/parquet-mr/tree/master/parquet-cli

ywh555hhh commented 1 month ago

@acezen Hi, could you please check and reply to my email about OSPP when you have time?

I sent it to qiaozi.zwb@alibaba-inc.com.