Closed: nik3211 closed this issue 3 years ago
Hi @nik3211!
Currently, Cleora keeps the sparse matrices in memory. Take a look at https://github.com/Synerise/cleora#memory-consumption to calculate how much memory you need for the sparse matrices and for training.
I assumed that you have ONE sparse matrix (a graph consisting of 300M+ nodes and 1.4B+ relationships), which consumes ~80GB of RAM. The memory needed for training depends on the embedding dimension. If the total memory needed exceeds your RAM, try using --in-memory-embedding-calculation 0, which backs the training with memory-mapped files.
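As a rough illustration of how the embedding part of the budget scales, here is a back-of-the-envelope sketch. The formula below is an assumption (one f32 vector of `dimension` floats per node), not the exact accounting from the Cleora README, which you should consult for precise numbers:

```python
def embedding_memory_gb(num_nodes: int, dimension: int, bytes_per_value: int = 4) -> float:
    """Rough estimate: one dense float32 vector per node (assumption)."""
    return num_nodes * dimension * bytes_per_value / 1e9

# Hypothetical settings: 300M nodes at embedding dimension 128
print(round(embedding_memory_gb(300_000_000, 128), 1))  # -> 153.6
```

This is only the embedding table; the ~80GB sparse matrix and any working buffers come on top of it, which is why the memory-mapped mode can matter even on a large machine.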
Regarding your question: there is no batch size configuration right now, but you can split your input file manually, train embeddings on each part, and then average the resulting embeddings.
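The split-and-average step above can be sketched as follows. This is a minimal illustration, assuming each partition's output has already been parsed into a dict mapping node id to vector (real Cleora output is a text file you would parse first); a node appearing in only some partitions is averaged over the partitions that contain it:

```python
import numpy as np
from collections import defaultdict

def average_embeddings(partitions):
    """Average per-node embeddings across partition outputs.

    `partitions`: list of dicts {node_id: vector}. Nodes missing from a
    partition are simply averaged over fewer partitions.
    """
    sums = {}
    counts = defaultdict(int)
    for part in partitions:
        for node, vec in part.items():
            v = np.asarray(vec, dtype=np.float32)
            sums[node] = v if node not in sums else sums[node] + v
            counts[node] += 1
    return {node: sums[node] / counts[node] for node in sums}

# Hypothetical partitions: node "a" appears in both, "b" in one
p1 = {"a": [1.0, 2.0], "b": [0.0, 4.0]}
p2 = {"a": [3.0, 2.0]}
avg = average_embeddings([p1, p2])
print(avg["a"])  # -> [2. 2.]
```

Note that simple averaging works best when the splits share enough nodes that the per-partition embedding spaces stay roughly aligned.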
Hi @nik3211!
Any thoughts? Let me know how we can help :)
Hello everyone,
I am working on a graph consisting of 300M+ nodes and 1.4B+ relationships. I tried training embeddings on this graph with Cleora but keep running out of RAM. The server I am running it on has 661 GB of RAM.
Is there any way to avoid loading the entire files into memory and instead train in batches, to keep memory usage steady and relatively low?