IBM / snapml-docs

IBM Snap ML Documentation
Apache License 2.0

Graph Feature Preprocessor -- feature value meaning #30

Open 2g-XzenG opened 5 months ago

2g-XzenG commented 5 months ago

Hi there,

Snap ML is really great work! I have been looking at the paper and the example code for the Graph Feature Preprocessor, but I still do not fully understand the feature-generation algorithm. For example, how exactly is the value of the generated fan-in feature calculated?

If you could share more details about the Graph Feature Preprocessor, that would be super helpful! (I would love to read the source code if possible.) Alternatively, one worked example of how a value is derived would be much appreciated!

Thanks! Looking forward to your response.

jblanusa commented 5 months ago

Hi,

The most detailed explanation of how the Graph Feature Preprocessor generates graph-based features is available in Sections 2.2 and 2.3 of our paper https://arxiv.org/pdf/2402.08593 and in the documentation https://snapml.readthedocs.io/en/latest/graph_preprocessor.html. In essence, whenever you forward an edge through the transform function, the library inserts it into a graph and checks whether it creates one of the graph patterns (fan-in, fan-out, cycle, etc.). The library then counts how many patterns of each pattern size this edge creates. Finally, those counts are encoded as shown in Figure 5 of the paper https://arxiv.org/pdf/2402.08593. For example, in that figure, the first edge (purple -> yellow) belongs to 4 scatter-gather patterns of size 3 and 2 temporal cycles with size >= 30.
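As a rough illustration of the insert-then-check idea for fan-in only, here is a toy sketch in plain Python. It is not the library's implementation: the function name `fan_in_features`, the definition of fan-in size as the number of distinct sources sending to the destination within a time window, and the bin boundaries are all assumptions made for this example.

```python
from collections import defaultdict


def fan_in_features(edges, time_window, bins):
    """Toy fan-in feature per edge (hypothetical helper, not the Snap ML API).

    edges: iterable of (edge_id, src, dst, timestamp), sorted by timestamp.
    bins:  ascending lower bounds of the size bins; the last bin is open-ended.
    Returns a list of (edge_id, fan_in_size, one_hot_bin_row).
    """
    incoming = defaultdict(list)  # dst -> list of (timestamp, src)
    rows = []
    for edge_id, src, dst, ts in edges:
        # Insert the new edge into the graph.
        incoming[dst].append((ts, src))
        # Drop incoming edges that fell out of the time window.
        incoming[dst] = [(t, s) for t, s in incoming[dst] if ts - t <= time_window]
        # Assumed definition: fan-in size = distinct sources inside the window.
        size = len({s for _, s in incoming[dst]})
        # Mark the bin whose lower bound is the largest one not exceeding size.
        row = [0] * len(bins)
        for i in reversed(range(len(bins))):
            if size >= bins[i]:
                row[i] = 1
                break
        rows.append((edge_id, size, row))
    return rows


edges = [
    (0, "A", "D", 1),
    (1, "B", "D", 2),
    (2, "C", "D", 3),
    (3, "A", "D", 20),  # earlier edges into D are outside the window by now
]
features = fan_in_features(edges, time_window=10, bins=[2, 3, 4])
for edge_id, size, row in features:
    print(edge_id, size, row)
```

In this toy run, the third edge sees three distinct senders into D within the window and lands in the second bin, while the last edge arrives after the earlier ones have expired, so its fan-in size drops back to 1 and no bin is set. The actual features and their encoding follow the paper and documentation linked above.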

We did not open-source the code of the Graph Feature Preprocessor, but the component that performs cycle enumeration is available as an open-source repository: https://github.com/IBM/parallel-cycle-enumeration.

I hope this helps.

Best, Jovan