benedekrozemberczki / karateclub

Karate Club: An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs (CIKM 2020)
https://karateclub.readthedocs.io
GNU General Public License v3.0

How to improve the performance of Graph2Vec model fit function ? #76

Closed chamath-eka closed 3 years ago

chamath-eka commented 3 years ago

I tried to increase the performance of the Graph2Vec model by increasing the workers parameter when initializing the model, but it seems that the model still uses only one core during fit.

Is the method I have used to assign the workers correct? Is there another way to improve the performance?

model = Graph2Vec(workers=28)
graphs_list=create_graph_list(graph_df)
model.fit(graphs_list)
graph_x = model.get_embedding()
benedekrozemberczki commented 3 years ago

Most of the time is spent on Weisfeiler-Lehman (WL) hashing, and that part uses only a single core; more cores are used during the embedding phase. If you read the graphs from disk, you can use the approach I have here:

https://github.com/benedekrozemberczki/graph2vec
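Since each graph is hashed independently, the WL step is embarrassingly parallel across graphs, so it can be precomputed with a process pool before the single-threaded phase. A minimal sketch of one WL relabeling round; the adjacency-dict graph format and function names here are illustrative, not karateclub's internals:

```python
import hashlib
from concurrent.futures import ProcessPoolExecutor

def wl_round(graph):
    """One WL iteration: start from degree labels, then hash each node's
    label together with its sorted neighbour labels.
    `graph` maps node -> list of neighbours."""
    labels = {node: str(len(neigh)) for node, neigh in graph.items()}
    new_labels = {}
    for node, neighbours in graph.items():
        combined = labels[node] + "".join(sorted(labels[n] for n in neighbours))
        new_labels[node] = hashlib.md5(combined.encode()).hexdigest()
    return new_labels

def hash_graphs_parallel(graphs, workers=4):
    # Graphs are independent, so WL hashing parallelizes trivially across
    # graphs; the later embedding phase is parallelized by gensim itself.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(wl_round, graphs))
```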

Could you star the repos and hit follow on GitHub?

chamath-eka commented 3 years ago

@benedekrozemberczki Is it possible to optimize the memory usage when training a graph embedding model, for example with batch processing? Is there an option or workaround for that? I find it hard to scale to large datasets.
I tried breaking a small dataset into smaller parts and running multiple fits, but the resulting embeddings are quite different.
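One likely reason the batch-wise fits disagree: Graph2Vec's underlying Doc2Vec training starts from a random initialization, so each fit produces an embedding space that is only defined up to rotation. If a few anchor graphs are included in every batch, the spaces can be aligned afterwards with orthogonal Procrustes; a hypothetical NumPy sketch, not part of karateclub:

```python
import numpy as np

def procrustes_align(emb_a, emb_b):
    """Return the orthogonal matrix R minimizing ||emb_a @ R - emb_b||_F
    (orthogonal Procrustes), rotating batch A's space into batch B's.
    Rows must correspond to the same anchor graphs embedded in both fits."""
    u, _, vt = np.linalg.svd(emb_a.T @ emb_b)
    return u @ vt
```

After computing R on the shared anchor rows, applying `batch_a_embeddings @ R` maps all of batch A into batch B's coordinate system, which makes the batches roughly comparable even though each fit was independent.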

bdeng3 commented 2 years ago

I had the same problem here. I also tried to use Graph2Vec to fit a large graph dataset, and it turns out I can't fit a list of all the graphs into RAM. Batch processing would definitely help, but I'm not sure whether it is available in the framework now.

May I ask how you eventually solved this problem?

chamath-eka commented 2 years ago

In the end, I went with the FEATHER method, since FEATHER is more scalable than Graph2Vec and it is not transductive.
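For context, FEATHER describes each graph by pooling characteristic functions of node features under r-step random-walk weights, so each graph can be embedded independently without a transductive fit. A toy NumPy sketch of the idea, for illustration only, not karateclub's FeatherGraph implementation:

```python
import numpy as np

def feather_embed(adj, eval_points=(0.5, 1.0, 2.0), order=3):
    """Toy FEATHER-style descriptor: pool real/imaginary parts of the
    characteristic function of a node feature (log degree) under r-step
    random-walk weights. `adj` is a dense adjacency matrix."""
    deg = adj.sum(axis=1)
    P = adj / np.maximum(deg, 1.0)[:, None]   # random-walk transition matrix
    x = np.log(deg + 1.0)                     # node feature: log degree
    feats = []
    Pr = np.eye(adj.shape[0])
    for _ in range(order):
        Pr = Pr @ P                           # r-step walk weights
        for t in eval_points:
            phase = Pr @ np.exp(1j * t * x)   # E[exp(i t x)] per source node
            feats.extend([phase.real.mean(), phase.imag.mean()])  # mean-pool
    return np.array(feats)
```

In karateclub itself, the corresponding estimator is FeatherGraph, which follows the same fit / get_embedding interface as Graph2Vec, so swapping the two is mostly a one-line change.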

benedekrozemberczki commented 2 years ago

FEATHER scales well, it is indeed inductive and more expressive. It is SOTA on this dataset too: https://openreview.net/forum?id=1xDTDk3XPW (which is extremely cool considering the dataset size).

benedekrozemberczki commented 2 years ago

@1209973 The newest release has inference functionality.