Open enricopal opened 7 years ago
Thank you for your post!
I solved additional problems. I will send a pull request soon.
Thank you! Ha-neul
I have the same problem.Had it solved?
I have the same problem. Facing continuous OOM with Node2vec for a directed graph. What is the recommendation to address this please?
Hi, I'm trying to run node2vec using the Spark implementation on a large graph (~2.8M nodes, ~41M edges, 4.1GB file), this is the command that I'm running:
./spark-submit --class com.navercorp.Main node2vec/node2vec_spark/target/node2vec-0.0.1-SNAPSHOT.jar --cmd node2vec --p 1 --q 1 --walkLength 40 --numWalks 5 --input yago_types.edgelist --output output/yago_types_p1_q1_l40_num5.emb --weighted False --directed False --indexed False
I get this error: "2017-05-09T16:45:13.259237677Z 17/05/09 16:45:13 ERROR scheduler.TaskSchedulerImpl: Lost executor 1 on spark-worker1-97711-prod: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages"
Everything was working fine with a smaller sample, so it seems like a memory problem to me. Have you ever experienced anything similar? Any clue on what could be a proper memory allocation for such a size of a graph? At the moment, I have a master node with 2GB and six workers with 42GB.
Thank you a lot! Enrico