Closed: arshiya-singh closed this issue 6 years ago
My implementation uses joblib for the parallel execution...
If you try it with workers=4, does it generate the same problem?
It does seem like the data passed to the parallel processes exceeds a size limit in the underlying implementation.
Could you try editing the following and see if it fixes the problem?
In the package code you'll find the node2vec.py file; on line 133 (assuming the latest version) you will see `walk_results = Parallel(n_jobs=self.workers ....)`. Try changing it to `walk_results = Parallel(n_jobs=self.workers, max_nbytes=None ....)` and run it again.
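For reference, a toy illustration of what that parameter does (the array and function below are placeholders, not node2vec code): `max_nbytes=None` removes joblib's size threshold for automatically memory-mapping large arguments passed to workers.

```python
from joblib import Parallel, delayed
import numpy as np

# Placeholder data, standing in for whatever node2vec passes to its workers.
big_array = np.zeros(2_000_000)

# max_nbytes=None disables joblib's automatic memmapping threshold for large
# worker inputs; everything else about the Parallel call stays the same.
results = Parallel(n_jobs=4, max_nbytes=None)(
    delayed(np.sum)(big_array) for _ in range(8)
)
print(results)
```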
If this solves the problem, I'll update the package and upload a new version.
I set the workers to 4, changed the line, and it still returns the same error.
Does it work with workers=1?
Yes, it succeeded with workers=1.
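For anyone hitting the same error, the workaround that worked here is simply constructing the model with a single worker. A minimal sketch of that, with an illustrative toy graph and parameter values (only `workers=1` is the relevant part):

```python
import networkx as nx
from node2vec import Node2Vec

# Small illustrative graph; the graph in this issue has ~5k nodes and 565k+ edges.
graph = nx.fast_gnp_random_graph(n=100, p=0.1)

# workers=1 keeps walk generation in the main process, so nothing has to be
# sent to parallel worker processes; the other values are just examples.
node2vec = Node2Vec(graph, dimensions=64, walk_length=30, num_walks=50, workers=1)
model = node2vec.fit(window=10, min_count=1)
```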
So the problem must be with the parallel execution; I guess there are too many nodes.
I won't have time to deal with it anytime soon, but the cause could be that the graph gets sent to each worker process, which requires too much memory per process. Using some kind of read-only shared memory might help.
You are more than encouraged to try to solve it and open a pull request so the solution will be available for everybody.
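One possible shape for that read-only shared-memory idea, as a rough sketch rather than a working patch (the function and file names are hypothetical and the walk body is elided): write the precomputed transition arrays to disk once with joblib.dump, then have each worker open them read-only with mmap_mode="r", so the processes share memory-mapped data instead of each receiving a pickled copy of the graph.

```python
import numpy as np
from joblib import Parallel, delayed, dump, load

def save_transitions(neighbors, probs, path="transitions.joblib"):
    # Store per-node neighbour indices and transition probabilities as flat
    # numpy arrays once, instead of pickling the whole graph to every worker.
    dump({"neighbors": neighbors, "probs": probs}, path)
    return path

def generate_walks_chunk(path, start_nodes, walk_length):
    # mmap_mode="r" maps the arrays read-only, so all workers share the pages.
    data = load(path, mmap_mode="r")
    neighbors, probs = data["neighbors"], data["probs"]
    walks = []
    for node in start_nodes:
        walk = [node]
        # ... biased random walk over neighbors/probs would go here ...
        walks.append(walk)
    return walks

# Usage sketch:
# path = save_transitions(neighbors, probs)
# walks = Parallel(n_jobs=4)(
#     delayed(generate_walks_chunk)(path, chunk, 80) for chunk in node_chunks
# )
```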
It looks like you ran into this bug in multiprocessing: https://bugs.python.org/issue17560
joblib depends on multiprocessing and hence will have problems when a task larger than the 32-bit integer maximum (roughly 2 GiB) is sent to a worker. Shared memory tends to be better in these cases for memory use, but YMMV.
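As a concrete, if simplified, illustration of the shared-memory route: Python 3.8+ ships multiprocessing.shared_memory, which lets a worker attach to one copy of a large array instead of receiving it through a pipe. Everything below is a generic example, not node2vec code.

```python
import numpy as np
from multiprocessing import Process, shared_memory

def worker(shm_name, shape, dtype):
    # Attach to the existing shared block and view it as a numpy array
    # (read-only use here), with no pickling of the data through a pipe.
    shm = shared_memory.SharedMemory(name=shm_name)
    arr = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    print("worker sees sum of first 10 items:", arr[:10].sum())
    shm.close()

if __name__ == "__main__":
    data = np.arange(1_000_000, dtype=np.float64)
    shm = shared_memory.SharedMemory(create=True, size=data.nbytes)
    shared = np.ndarray(data.shape, dtype=data.dtype, buffer=shm.buf)
    shared[:] = data  # copy into shared memory once

    p = Process(target=worker, args=(shm.name, data.shape, data.dtype))
    p.start()
    p.join()

    shm.close()
    shm.unlink()
```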
I recently used this to detect communities on a bipartite graph (5220 nodes, 7136 edges). I'm now using it for the same task on a co-occurrence graph with 5131 nodes and over 565k edges. The script was able to generate the transition probabilities, but after that it stops and returns this:
RuntimeError: The task could not be sent to workers as it is too large for 'send_bytes'.
Here's the full screenshot:
Here's my Python script:
Could it be that there are just "too many edges" for node2vec to handle? I'm not sure how else to fix this issue.