BaseModelAI / cleora

Cleora AI is a general-purpose model for efficient, scalable learning of stable and inductive entity embeddings for heterogeneous relational data.
https://cleora.ai

RAM Issue with Cleora on Large Graphs #25

Closed nik3211 closed 3 years ago

nik3211 commented 3 years ago

Hello everyone,

I am working on a graph consisting of 300M+ nodes and 1.4B+ relationships. I tried training embeddings on this graph using Cleora, but I am running into a RAM shortage. The server I am running this on has 661GB of RAM.

Is there a way to avoid loading the entire files into memory and instead train in batches, so that memory usage stays steady and relatively low?

piobab commented 3 years ago

Hi @nik3211!

Currently, Cleora keeps the sparse matrices in memory. Take a look at https://github.com/Synerise/cleora#memory-consumption so you can calculate how much memory you need for the sparse matrices and for training. I am assuming that you have ONE sparse matrix (a graph consisting of 300M+ nodes and 1.4B+ relationships), which consumes ~80GB of RAM. The training memory depends on the dimension of the embedding. If the total memory needed exceeds your RAM, try using --in-memory-embedding-calculation 0, which backs the training with memory-mapped files.
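To make the sizing concrete, here is a rough back-of-envelope calculation in Python. The per-edge byte cost and the assumption that two f32 embedding copies are alive during a propagation step are my own estimates for illustration, not figures from the Cleora docs; the ~57 bytes/edge constant is simply chosen so the matrix term matches the ~80GB estimate above.

```python
# Back-of-envelope RAM estimate for Cleora on a large graph.
# ASSUMPTIONS (not from the Cleora docs): ~57 bytes per sparse-matrix
# entry including index/hash-map overhead, f32 embeddings, and two
# embedding copies (current + next) alive during propagation.

def sparse_matrix_gb(edges: int, bytes_per_edge: int = 57) -> float:
    """Rough sparse-matrix footprint in GB."""
    return edges * bytes_per_edge / 1e9

def embedding_gb(nodes: int, dim: int, copies: int = 2) -> float:
    """Embedding tables: nodes x dim f32 values, `copies` live at once."""
    return nodes * dim * 4 * copies / 1e9

matrix = sparse_matrix_gb(1_400_000_000)     # ~80 GB for 1.4B edges
embeddings = embedding_gb(300_000_000, 128)  # ~307 GB for 300M nodes, dim=128
print(f"matrix ~{matrix:.0f} GB, embeddings ~{embeddings:.0f} GB, "
      f"total ~{matrix + embeddings:.0f} GB")
```

Under these assumptions a dim=128 run is dominated by the embedding tables, which is exactly the part that --in-memory-embedding-calculation 0 moves out of RAM.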

Regarding your question: there is no batch-size configuration right now, but you can split your input file manually, train on each part, and then average the resulting embeddings.
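A minimal sketch of that manual workaround's last step, assuming each Cleora run writes a text file with one `node dim1 dim2 ...` line per entity (the file format and names here are illustrative, not the exact Cleora output spec):

```python
# Average per-node embedding vectors across several Cleora output files.
# ASSUMPTION: each file has whitespace-separated lines "node v1 v2 ...".
from collections import defaultdict

def average_embeddings(paths):
    """Return {node: averaged vector} over all files in `paths`.

    Nodes missing from some splits are averaged over the splits
    in which they actually appear.
    """
    sums = {}
    counts = defaultdict(int)
    for path in paths:
        with open(path) as f:
            for line in f:
                node, *values = line.split()
                vec = [float(v) for v in values]
                if node in sums:
                    sums[node] = [a + b for a, b in zip(sums[node], vec)]
                else:
                    sums[node] = vec
                counts[node] += 1
    return {n: [v / counts[n] for v in s] for n, s in sums.items()}
```

Note that plain averaging only makes sense because Cleora embeddings are stable across runs; for nodes that appear in only one split, the "average" is just that split's vector.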

piobab commented 3 years ago

Hi @nik3211!

Any thoughts? Let me know how we can help :)