dice-group / dice-embeddings

Hardware-agnostic Framework for Large-scale Knowledge Graph Embeddings
MIT License

Memory issues when running k fold cross validation with batching on a large dataset #284

Closed (sshivam95 closed 16 hours ago)

sshivam95 commented 3 days ago

For a dataset with the following statistics:

Number of entities: 51,180
Number of relations: 1
Number of triples in the train set: 26,111

running on 1 NVIDIA A100-SXM4 (40 GB) and 1 CPU core with 340 GB of memory on the Noctua 2 cluster, 10-fold batched cross validation fails with an out-of-memory kill signal. Here is the dataset file: commons_page_links_es.txt. I am using the following command to run the training:

dicee --path_single_kg commons_page_links_es.txt --model Keci --num_epochs 200 --p 0 --q 1 --embedding_dim 256 --scoring_technique NegSample --batch_size 100_000 --optim Adopt --num_folds_for_cv 10
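For reference, a minimal sketch of what `--num_folds_for_cv 10` conceptually does: the training triples are partitioned into 10 disjoint folds, and each fold is held out once for evaluation. This uses `sklearn.model_selection.KFold` as a stand-in; the names and shapes below are illustrative assumptions, not dicee's actual implementation.

```python
import numpy as np
from sklearn.model_selection import KFold

# Index array standing in for the 26,111 training triples (assumption:
# dicee's internal representation differs, this only shows the split logic).
triples = np.arange(26_111)

kf = KFold(n_splits=10, shuffle=True, random_state=1)
fold_sizes = []
for train_idx, test_idx in kf.split(triples):
    # In a real run, a model would be trained on train_idx
    # and evaluated on the held-out test_idx.
    fold_sizes.append(len(test_idx))

# Every triple appears in exactly one held-out fold.
assert sum(fold_sizes) == len(triples)
assert len(fold_sizes) == 10
```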

Batched evaluation works for a small number of batches, but it cannot handle a large number of batches and is killed with an out-of-memory signal.
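One common source of such OOM kills in batched KGE evaluation is accumulating the full (batch, num_entities) score matrices across batches instead of reducing each batch to scalars. A minimal sketch of the memory-friendly pattern, with a toy scoring function standing in for a trained model (all names and shapes here are assumptions, not dicee's code):

```python
import torch

# Toy scorer standing in for a trained KGE model: returns a score for
# every candidate entity per query triple (assumption for illustration).
def score_all_entities(batch_tails: torch.Tensor, num_entities: int) -> torch.Tensor:
    return torch.randn(len(batch_tails), num_entities)

num_entities, num_triples, batch_size = 1_000, 10_000, 256
true_tails = torch.randint(0, num_entities, (num_triples,))

rank_reciprocal_sum, n = 0.0, 0
for start in range(0, num_triples, batch_size):
    tails = true_tails[start:start + batch_size]
    scores = score_all_entities(tails, num_entities)        # (B, E)
    true_scores = scores.gather(1, tails.unsqueeze(1))      # (B, 1)
    # Rank = 1 + number of entities scored strictly above the true tail.
    ranks = 1 + (scores > true_scores).sum(dim=1).float()
    rank_reciprocal_sum += (1.0 / ranks).sum().item()       # keep only a scalar
    n += len(tails)
    del scores                                              # free the (B, E) block

mrr = rank_reciprocal_sum / n   # mean reciprocal rank, in (0, 1]
```

The key design point is that per-batch tensors are reduced to Python floats before the next batch, so peak memory stays at one (B, E) block regardless of how many batches the fold contains.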

Demirrr commented 16 hours ago

Fixed in #285.