Closed johnnytam100 closed 2 years ago
What does model.fit() do under the hood?
If it runs SGD (stochastic gradient descent) on batches of graphs, you don't need to load all the graphs into memory at once; you only need to load them batch by batch. This will, however, require modifying the code of model.fit().
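To make the batch-by-batch idea concrete, here is a minimal sketch that streams pickled objects from disk instead of loading one giant list. It assumes (this is my assumption, not something stated in the thread) that the graphs were written to a single file with one pickle.dump call per graph; the names iter_pickled and iter_batches are hypothetical helpers, not part of Graphein or Karate Club.

```python
import pickle
from typing import Any, Iterator, List


def iter_pickled(path: str) -> Iterator[Any]:
    """Yield objects one at a time from a file written with repeated pickle.dump calls."""
    with open(path, "rb") as fh:
        while True:
            try:
                yield pickle.load(fh)
            except EOFError:
                # pickle.load raises EOFError once the file is exhausted.
                return


def iter_batches(path: str, batch_size: int) -> Iterator[List[Any]]:
    """Group the streamed objects into lists of at most batch_size items."""
    batch: List[Any] = []
    for graph in iter_pickled(path):
        batch.append(graph)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Final, possibly smaller, batch.
        yield batch
```

With this layout only one batch of graphs is held in memory at a time. If the graphs were instead pickled as one big list, they would have to be re-saved once in this per-object form (or as one file per graph) before streaming becomes possible.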
Hi @johnnytam100, I had a quick look at the FeatherGraph model. From a cursory glance, it doesn't appear to have any learnable parameters, so I think you can simply load batches of your graphs into memory and compute the embeddings for each batch. You can probably parallelize it too for greater speed.
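A rough sketch of what that could look like, under two assumptions of mine: each graph lives in its own pickle file, and each graph's embedding does not depend on the other graphs in its batch (which is what the absence of learnable parameters would suggest). embed_all, embed_chunk and chunked are hypothetical helper names, and embed_fn is whatever callable turns a list of graphs into a list of embeddings.

```python
import pickle
from concurrent.futures import Executor
from functools import partial
from typing import Any, Callable, Iterator, List, Optional, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Split a sequence of file paths into consecutive chunks."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_chunk(paths: Sequence[str], embed_fn: Callable[[List[Any]], Sequence[Any]]) -> Sequence[Any]:
    """Load one chunk of graphs, embed it, and return only the embeddings."""
    graphs = []
    for path in paths:
        with open(path, "rb") as fh:
            graphs.append(pickle.load(fh))
    # The graphs become garbage once this function returns.
    return embed_fn(graphs)


def embed_all(
    paths: Sequence[str],
    embed_fn: Callable[[List[Any]], Sequence[Any]],
    batch_size: int = 1000,
    executor: Optional[Executor] = None,
) -> List[Any]:
    """Embed every graph, chunk by chunk; results keep the order of paths."""
    worker = partial(embed_chunk, embed_fn=embed_fn)
    run = executor.map if executor is not None else map
    embeddings: List[Any] = []
    # Only paths are handed to the workers, so graphs are loaded lazily per chunk.
    for chunk_embeddings in run(worker, chunked(list(paths), batch_size)):
        embeddings.extend(chunk_embeddings)
    return embeddings


# Assumed Karate Club usage for embed_fn (check against your installed version):
# def feather_embed(graphs):
#     from karateclub import FeatherGraph
#     model = FeatherGraph()
#     model.fit(graphs)
#     return model.get_embedding()
```

Passing a concurrent.futures.ProcessPoolExecutor as executor would spread the chunks over several processes, provided embed_fn is a picklable module-level function. One more caveat I am fairly but not fully sure about: Karate Club expects nodes to be labelled 0..n-1, so Graphein graphs may first need networkx.convert_node_labels_to_integers.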
You can maybe find some inspiration in the ProteinGraphDataset class, which does this (but only for the PyTorch ecosystem):
Thank you so much for the advice! 🙇🏻‍♂️
Hi @a-r-j, and thanks for your help as always! I am trying to load ~300,000 Graphein protein graphs with pickle.load (and then call model.fit() with Karate Club), like this:
However, the whole thing doesn't fit into memory. Do you know a smarter way to bridge such a huge number of protein graphs to machine learning models? I would be grateful if you could share some hints. Thank you!