Closed hansugu closed 4 years ago
Good issue. Multiple text attributes do need better support. For now, I think the problem you mentioned can be worked around by using a more complex delimiter such as "&&&". Supporting multiple node files for a single node type may require redesigning the storage. Maybe what we should support instead is a feature column, where one feature can span an indefinite number of columns.
I think it makes sense to assume users will preprocess text to filter out delimiters such as "&&&", or maybe "\t". Unfortunately for now neither is supported as a delimiter to split attributes. It would be great if this can be added. Thanks :)
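As a rough illustration of the preprocessing mentioned above, here is a minimal sketch in plain Python. It does not use graph-learn itself, and it assumes the attribute delimiter is a colon, as in the example in this issue. The helper names `sanitize` and `join_attributes` are hypothetical.

```python
def sanitize(text, delimiter=":", replacement=" "):
    """Strip the attribute delimiter and tabs from one text attribute."""
    if not delimiter:
        raise ValueError("delimiter must be a non-empty string")
    # Replace every delimiter occurrence and every tab with a space.
    cleaned = text.replace(delimiter, replacement).replace("\t", " ")
    # Collapse the runs of whitespace that the replacements may leave behind.
    return " ".join(cleaned.split())


def join_attributes(texts, delimiter=":"):
    """Join sanitized text attributes into a single attribute column."""
    return delimiter.join(sanitize(t, delimiter) for t in texts)


# Hypothetical node with three text attributes, two of which would
# otherwise break the split (an embedded colon and an embedded tab).
line = join_attributes(["ratio 16:9", "round\tshape", "nice"])
print(line)
```

The trade-off is that the original punctuation is lost, so this only fits cases where the downstream text encoder does not depend on it.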
A user-defined delimiter for attributes can be set in the Decoder for each data source.
https://github.com/alibaba/graph-learn/blob/1d024e762128ee2bad96e370312090afd409a7f9/graphlearn/python/decoder.py#L33
Say a node has multiple aspects of textual description. One way is to store them as multiple attributes of the node, separated by a delimiter such as a colon. E.g.
id:int64 attribute:string
10001 the color is blue:round shape:it's very nice and expensive
However, if the text itself contains a colon, the split breaks. What's the best way to input multiple text attributes to graph-learn? Separating them by "\t" within a line would break the code. What about putting text attributes into multiple node files (one attribute per file)? Would that be supported?
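To make the failure concrete, here is a small sketch in plain Python that only mimics a delimiter-based split. It is not graph-learn's actual parser, and `split_attributes` and `has_expected_count` are hypothetical helpers.

```python
def split_attributes(raw, delimiter=":"):
    """Split one attribute column into its individual attributes."""
    return raw.split(delimiter)


def has_expected_count(raw, expected, delimiter=":"):
    """Return True if the column splits into the expected number of attributes."""
    return len(split_attributes(raw, delimiter)) == expected


# Three text attributes, none of which contains a colon.
clean = "the color is blue:round shape:it's very nice and expensive"
# The same row, but the second attribute now contains a colon.
broken = "the color is blue:aspect ratio 16:9:it's very nice and expensive"

print(split_attributes(clean))
print(split_attributes(broken))
```

A count check like `has_expected_count(raw, 3)` run over the input file before loading would at least surface the affected rows early.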
I understand text needs to be further encoded by custom encoders, which I plan to implement.