How to include textual data as node attributes

hansugu commented 4 years ago

Say a node has several textual descriptions. One way is to store them as multiple attributes of the node, separated by a delimiter such as a colon. E.g.

id:int64    attribute:string
10001    the color is blue:round shape:it's very nice and expensive

However, if the text itself contains a colon, the split would break. What's the best way to input multiple text attributes to graph-learn? Separating them with "\t" within a line would break the code. What about putting the text attributes into multiple node files (one file per attribute)? Would that be supported?

I understand text needs to be further encoded by custom encoders, which I plan to implement.

Seventeen17 commented 4 years ago

Good issue, multiple text attributes do need better support. But I think the problem you mentioned can be worked around by using a more complex delimiter like "&&&". Multiple node files for the same node type may require redesigning the storage. Maybe what we should support is feature columns, where one feature can have an indefinite number of columns.
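To make the delimiter idea concrete, here is a minimal Python sketch. It is plain string handling and does not call graph-learn; the helper names and the sample texts are made up for illustration.

```python
ATTR_DELIMITER = "&&&"  # assumption: a token that never occurs in the text


def join_attributes(texts, delimiter=ATTR_DELIMITER):
    """Join several text attributes into one attribute field."""
    for text in texts:
        if delimiter in text:
            raise ValueError(f"text contains the delimiter {delimiter!r}: {text!r}")
    return delimiter.join(texts)


def split_attributes(field, delimiter=ATTR_DELIMITER):
    """Recover the individual text attributes from one field."""
    return field.split(delimiter)


# The second text contains a colon, which is what breaks a ":" split.
texts = ["the color is blue", "ratio 16:9", "it's very nice and expensive"]

recovered = split_attributes(join_attributes(texts))  # same three texts
colon_split = ":".join(texts).split(":")              # four pieces, not three
```

With ":" the second text is cut in two, while the longer delimiter round-trips cleanly as long as the text never contains it.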

hansugu commented 4 years ago

I think it makes sense to assume users will preprocess text to filter out delimiters such as "&&&", or maybe "\t". Unfortunately, for now neither is supported as a delimiter to split attributes. It would be great if this could be added. Thanks :)
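The preprocessing step could look roughly like this. It is only a sketch: sanitize_text is a hypothetical helper, and replacing the offending characters with a space is one possible policy among several.

```python
import re


def sanitize_text(text, delimiter="&&&"):
    """Strip characters that would collide with the node file format."""
    # Tabs and line breaks separate columns and rows, so turn them into spaces.
    text = re.sub(r"[\t\r\n]+", " ", text)
    # Replace any occurrence of the attribute delimiter with a space.
    text = text.replace(delimiter, " ")
    # Collapse the runs of spaces the replacements may have left behind.
    return re.sub(r" {2,}", " ", text).strip()


cleaned = sanitize_text("blue\tand&&&round\n")
```

Running this once per text attribute before joining them means the delimiter can only appear where it was inserted on purpose.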

Seventeen17 commented 4 years ago

A user-defined delimiter for attributes can be set in the Decoder for each data source: https://github.com/alibaba/graph-learn/blob/1d024e762128ee2bad96e370312090afd409a7f9/graphlearn/python/decoder.py#L33
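A rough end-to-end sketch of how this might be used: the runnable part only writes a node file with a custom attribute delimiter. The graph-learn calls are left as comments because they are my reading of the linked decoder.py (an attr_types list plus an attr_delimiter argument) and have not been checked against a specific release. The file name, node type, and the second row are made up.

```python
import os
import tempfile

ATTR_DELIMITER = "&&&"  # hypothetical choice; must not occur in any text

rows = [
    (10001, ["the color is blue", "round shape", "it's very nice and expensive"]),
    (10002, ["ratio 16:9", "flat", "cheap"]),
]


def write_node_file(path, rows, delimiter=ATTR_DELIMITER):
    """Write one tab-separated line per node: id, then the joined attributes."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("id:int64\tattribute:string\n")
        for node_id, texts in rows:
            f.write(f"{node_id}\t{delimiter.join(texts)}\n")


node_path = os.path.join(tempfile.mkdtemp(), "item_nodes.tsv")
write_node_file(node_path, rows)

# Assumed graph-learn usage (not executed here, verify against your version):
# import graphlearn as gl
# decoder = gl.Decoder(attr_types=["string"] * 3, attr_delimiter=ATTR_DELIMITER)
# g = gl.Graph().node(node_path, node_type="item", decoder=decoder)
```

Whether a multi-character delimiter is accepted may depend on the graph-learn version, so it is worth testing on a small file first.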
