apache / age

Graph database optimized for fast analysis and real-time data processing. It is provided as an extension to PostgreSQL.
https://age.apache.org
Apache License 2.0

[Bitnine Tech] - Change edge file format #375

Open Hyundong-Seo opened 1 year ago

Hyundong-Seo commented 1 year ago

To load a CSV file of edges into AGE, I have to use an AGE-specific format like this:

start_id,start_vertex_type,end_id,end_vertex_type,properties

I think this format stores redundant data and isn't reasonable, because I have to modify every file that was already written. The label columns exist only for AGE, which makes the data files hard to reuse elsewhere.

I suggest changing only the header and leaving the other rows as they are. Each label name would be written in the header after another delimiter such as : (the colon).

[present]

commentid,comment,personid,person
1236950581249,comment,10995116284808,person
1236950581250,comment,4139,person
2061584302085,comment,4139,person
2061584302086,comment,4139,person
2061584302087,comment,4139,person
2061584302088,comment,10995116284808,person
2061584302089,comment,32985348838375,person
2061584302090,comment,10995116284808,person
2061584302091,comment,6597069777240,person
...

[suggestion]

commentid:comment,personid:person
1236950581249,10995116284808
1236950581250,4139
2061584302085,4139
2061584302086,4139
2061584302087,4139
2061584302088,10995116284808
2061584302089,32985348838375
2061584302090,10995116284808
2061584302091,6597069777240
...
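
Until something like this is supported, a file in the suggested format could be expanded into the present format before loading. The following is only a minimal sketch under assumptions: `expand_edge_csv` is a hypothetical helper, not part of AGE, and it assumes the first two header cells look like `<column>:<label>` while any further columns are properties that pass through unchanged.

```python
import csv
import io


def expand_edge_csv(compact_text):
    """Expand an 'id:label,id:label' header into per-row label columns.

    Hypothetical helper for illustration; not part of AGE.
    """
    reader = csv.reader(io.StringIO(compact_text))
    header = next(reader)
    # Assumption: the first two header cells are "<column>:<label>".
    (_, start_label), (_, end_label) = (
        cell.split(":", 1) for cell in header[:2]
    )
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # Column names taken from the format quoted at the top of this issue.
    writer.writerow(
        ["start_id", "start_vertex_type", "end_id", "end_vertex_type"] + header[2:]
    )
    for row in reader:
        if not row:
            continue  # skip blank lines
        writer.writerow([row[0], start_label, row[1], end_label] + row[2:])
    return out.getvalue()


if __name__ == "__main__":
    sample = "commentid:comment,personid:person\n1236950581249,10995116284808\n"
    print(expand_edge_csv(sample), end="")
```

This keeps the source files compact and reusable, at the cost of an extra conversion step before each load.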

jrgemignani commented 1 year ago

By definition, csv means a comma separated values file. I don't think it is wise to add in another delimiter.

If my understanding of the load is correct, all of the information needed to create a graph component (vertex or edge) is contained within the single line for that component. This precludes having it just in the header.

If I understand what you are suggesting, what would happen if there were many labels in that csv file?

commentid:comment3,personid:person
1236950581249,10995116284808
1236950581250,4139
2061584302085,4139
commentid:comment2,personid:person
2061584302086,4139
2061584302087,4139
commentid:comment2,personid:person2
2061584302088,10995116284808
2061584302089,32985348838375
commentid:comment1,personid:person3
2061584302090,10995116284808
2061584302091,6597069777240

This would make creating and parsing that file a real problem, without any real benefit.
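
To make the parsing concern concrete, here is a rough sketch of what reading such a multi-header file might involve. It is not AGE's loader: `parse_multi_header` is a hypothetical function, and treating any line whose first cell contains a colon as a header is an assumption that would break if an id ever contained a colon.

```python
import csv
import io


def parse_multi_header(text):
    """Yield (start_id, start_label, end_id, end_label) tuples.

    Hypothetical reader for a file with repeated 'id:label' header lines.
    """
    labels = None  # state carried from the most recent header line
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        if ":" in row[0]:
            # Heuristic (assumption): a colon marks a header line.
            labels = (row[0].split(":", 1)[1], row[1].split(":", 1)[1])
            continue
        if labels is None:
            raise ValueError("data row found before any header line")
        yield row[0], labels[0], row[1], labels[1]
```

A data row no longer means anything on its own here: the reader has to track which header came last, so a single row cannot be processed in isolation.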

jrgemignani commented 1 year ago

Does this help?