Closed DJRavinszkha closed 3 years ago
Can you try adding `--rel_part`?
After running it for approximately an hour (with `log_interval` set to 100), this is the output I have so far.
And the following values indicate GPU temperature, power, utilisation, etc.:
Do these seem to make sense to you?
How large is your graph?
~10 million triples, ~73,000 entities, 42 relations
Can you try running it with a single GPU first? BTW, for RotatE, you need to add `-de`.
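For reference, a single-GPU RotatE run might look like the sketch below; the dataset name, paths, and hyperparameters are placeholders for your own setup, not values from this thread. The `-de` flag doubles the entity embedding dimension, which RotatE needs to store the real and imaginary parts:

```shell
# Hypothetical single-GPU RotatE training run; adjust paths and hyperparameters.
dglke_train --model_name RotatE -de \
    --data_path ./my_kg --dataset my_kg --format raw_udd_hrt \
    --data_files train.tsv valid.tsv test.tsv \
    --batch_size 1024 --neg_sample_size 256 \
    --hidden_dim 400 --gamma 19.9 --lr 0.25 \
    --max_step 100000 --log_interval 100 \
    --batch_size_eval 16 --gpu 0
```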
For multigpu cmd, you can refer to https://github.com/awslabs/dgl-ke/blob/master/examples/wikikg2/multi_gpu.sh.
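Modelled on that script, a multi-GPU variant on a 4-GPU instance might look roughly like this (again, the dataset and hyperparameters are placeholders): `--num_proc` and `--gpu` spread training across the GPUs, while `--mix_cpu_gpu`, `--async_update`, and `--rel_part` follow the wikikg2 example:

```shell
# Hypothetical 4-GPU run, loosely following examples/wikikg2/multi_gpu.sh.
dglke_train --model_name RotatE -de \
    --data_path ./my_kg --dataset my_kg --format raw_udd_hrt \
    --data_files train.tsv valid.tsv test.tsv \
    --batch_size 1024 --neg_sample_size 256 \
    --hidden_dim 400 --gamma 19.9 --lr 0.25 \
    --max_step 25000 --log_interval 100 --batch_size_eval 16 \
    --num_proc 4 --gpu 0 1 2 3 \
    --mix_cpu_gpu --async_update --rel_part --force_sync_interval 1000
```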
You are such a life saver!!! I didn't realise I was missing the `-de`.
It works!!!!!!
I will keep this open for a while in case I run into any other issues, but for now the first issue has been solved.
Thank you so much for your quick and concise help!
Hello again,
I have been attempting to run the `dglke_eval` and `dglke_predict` functions on my pretrained model, but I am encountering the following errors:
dglke_eval:
dglke_predict:
As you can see, I have PyTorch 1.6 installed, since it seemed to be the recommended version for dglke.
In addition, I am using the entities.tsv and relations.tsv initialised by `dglke_train`, together with my own lists of heads, relations, and tails. After running `dglke_train` successfully, the following output was saved:
and I expect the result of `dglke_eval` to be the same as this value.
Thanks for your help in advance!
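For context, a hedged sketch of how `dglke_eval` and `dglke_predict` are typically invoked is below; the checkpoint path, dataset name, file names, and hyperparameters are placeholders, not the actual values from this issue:

```shell
# Hypothetical invocations; model path and file names are placeholders.
# Evaluate a trained RotatE checkpoint on a held-out test set:
dglke_eval --model_name RotatE -de --dataset my_kg \
    --hidden_dim 400 --gamma 19.9 --batch_size_eval 16 --gpu 0 \
    --model_path ./ckpts/RotatE_my_kg_0/ --data_path ./my_kg \
    --format raw_udd_hrt --data_files train.tsv valid.tsv test.tsv

# Score custom (head, relation, tail) candidates with the same checkpoint:
dglke_predict --model_path ./ckpts/RotatE_my_kg_0/ \
    --format 'h_r_t' --data_files head.list rel.list tail.list \
    --score_func logsigmoid --topK 10 --exec_mode 'all'
```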
I am fairly new to this and have been using pykeen to compute KG embeddings. However, it was taking a long time, and I realised multi-GPU training would be faster, so I switched to DGL-KE. I have a graph of 10 million edges, on which I am trying to predict new links with RotatE and TransR. So far I have been using the following:
yet the p3.8xlarge AWS EC2 instance shows that I am utilising the GPUs in the following manner:
```
Thu Jun 10 07:02:22 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.119.03   Driver Version: 450.119.03   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:00:1B.0 Off |                    0 |
| N/A   47C    P0    57W / 300W |  16056MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  On   | 00000000:00:1C.0 Off |                    0 |
| N/A   47C    P0    56W / 300W |   1350MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2...  On   | 00000000:00:1D.0 Off |                    0 |
| N/A   45C    P0    60W / 300W |   1350MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2...  On   | 00000000:00:1E.0 Off |                    0 |
| N/A   48C    P0    58W / 300W |   1350MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      6552      C   ...c2-user/pyenv/bin/python3    16053MiB |
|    1   N/A  N/A     37264      C   /usr/bin/python3                 1347MiB |
|    2   N/A  N/A     37298      C   /usr/bin/python3                 1347MiB |
|    3   N/A  N/A     37334      C   /usr/bin/python3                 1347MiB |
+-----------------------------------------------------------------------------+
```
Is this correct, or is there a way to improve how I am utilising the GPUs?
Finally, I am trying to replicate the work conducted in this study, in which the authors predicted new links in a KG of 15 million edges in under 40 minutes on a p3.16xlarge instance. Given that my KG is 10 million edges and I am using a p3.8xlarge instance, roughly how much longer should these computations take for me? My data can be found at this zenodo link, and my code at this github link (simply copy and paste the code above to replace the pipeline chunks; you may have to install dgl-ke first).
```
!sudo pip3 install torch
!sudo pip3 install dgl==0.4.3
!sudo pip3 install dglke
```
Thanks for your help!