Freezing - Githubissues

tcrosenk commented 7 years ago

I was wondering what the specs of the machine were when you ran this test. I was unable to complete the test (following the instructions exactly) because at about a quarter of the way through, my entire machine froze. This could be due to the fact that I am running a virtual machine, but I gave it over 11 GB of memory and 4 processors.

Thank you.

bxshi commented 7 years ago

Hi, thank you for your interest! I think you could try lowering the number of data generators. Try

./ProjE_softmax.py --dim 200 --batch 200 --data ./data/FB15k/ --eval_per 1 --worker 1 --eval_batch 500 --max_iter 100 --generator 1

instead and see if it works. The reason is that if you have more workers/generators than processors, the model will spend most of its time generating data instead of doing the actual training.
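The rule of thumb above (no more workers or generators than processors) can be sketched as a small helper. This is only an illustration: the function name cap_parallelism and the clamping policy are assumptions, not part of ProjE. Only the --worker and --generator flags come from the command above.

```python
import os


def cap_parallelism(requested_workers, requested_generators, cpu_count=None):
    """Clamp worker and generator counts to the range [1, cpu_count].

    Hypothetical helper, not part of ProjE. cpu_count can be passed
    explicitly; otherwise it is read from the operating system.
    """
    if cpu_count is None:
        # os.cpu_count() can return None when the count is unknown.
        cpu_count = os.cpu_count() or 1
    workers = max(1, min(requested_workers, cpu_count))
    generators = max(1, min(requested_generators, cpu_count))
    return workers, generators


if __name__ == "__main__":
    workers, generators = cap_parallelism(8, 8)
    # Print the two flags as they would appear on the ProjE command line.
    print("--worker", workers, "--generator", generators)
```

On a VM with 4 processors, a request for 8 workers and 8 generators would be reduced to 4 of each. Going lower still, as in the command above, leaves more headroom for the rest of the system.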

By the way, would you mind being more specific about when your run freezes? For example, after how many iterations does it fail, and does it happen during the training or the testing phase? Thanks!

I'm working on a similar project right now, and after I publish it I'll release a new version of ProjE, which should be more efficient. If lowering the number of workers does not work, you can either wait a bit for the new implementation or try writing your own. If you move all of the data generation into a TensorFlow input pipeline and create a custom negative-sampling C++ op, it will significantly increase the speed and lower the memory consumption.
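To make the idea of streaming negative samples concrete, here is a minimal pure-Python sketch. It is not ProjE's code and it is not a TensorFlow op: it is a stand-in that assumes entities are integer ids in range(num_entities) and that negatives are produced by corrupting the tail of each (head, relation, tail) triple. The name corrupt_tails and all of its parameters are hypothetical.

```python
import random


def corrupt_tails(triples, num_entities, num_negatives, seed=None):
    """Lazily yield (head, relation, tail, negative_tails) for each triple.

    triples must be a list, because it is traversed twice. Negatives are
    drawn by rejection sampling, so no per-triple candidate list is built.
    """
    rng = random.Random(seed)
    # Collect every known tail per (head, relation) so that true answers
    # are never returned as negatives.
    known = {}
    for h, r, t in triples:
        known.setdefault((h, r), set()).add(t)
    for h, r, t in triples:
        positives = known[(h, r)]
        # Never ask for more negatives than there are non-positive entities.
        wanted = min(num_negatives, num_entities - len(positives))
        negatives = set()
        while len(negatives) < wanted:
            candidate = rng.randrange(num_entities)
            if candidate not in positives:
                negatives.add(candidate)
        yield h, r, t, sorted(negatives)


if __name__ == "__main__":
    toy_triples = [(0, 0, 1), (0, 0, 2), (3, 1, 0)]
    for sample in corrupt_tails(toy_triples, num_entities=5, num_negatives=2, seed=0):
        print(sample)
```

Because the function is a generator, only one sample is held in memory at a time. A compiled op inside the TensorFlow graph, as suggested above, would follow the same pattern without the Python overhead.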

tcrosenk commented 7 years ago

I am going to rerun the test twice, once how I originally did it (to show you the freezing) and the new way (to test to make sure it will finish).

I appreciate the fast and detailed response! Thank you. I will comment back here with my results!

bxshi commented 7 years ago

Sure thing!

tcrosenk commented 7 years ago

[Screenshot: freeze]

Here is a screenshot of when my machine freezes. I am currently running the other option you gave me; it is still going, but it has gotten past the point where it froze last time. I think it will finish. Thank you!

bxshi commented 7 years ago

Glad it works now!

I noticed that you may be using the precompiled TensorFlow on your VM, which is not compiled with SSE support. This will dramatically slow the program down if you are using CPUs for the computation.

I would suggest building TF on your VM with SSE support if possible, though this will take a long time and a lot of memory.
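Before committing to a long build, it may be worth checking which SSE instruction sets the VM's CPU actually reports. The sketch below assumes a Linux x86 guest, where /proc/cpuinfo contains a "flags" line. The helper name sse_flags is hypothetical.

```python
def sse_flags(cpuinfo_text):
    """Return the sorted SSE-family flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            # Keep sse, sse2, sse4_1, sse4_2, ssse3 and similar flags.
            return sorted(f for f in value.split() if f.startswith(("sse", "ssse")))
    return []


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            print(sse_flags(handle.read()))
    except OSError:
        print("/proc/cpuinfo is not available on this system")
```

If the list is empty or lacks the instruction sets named in TensorFlow's warnings, rebuilding with those options would not help on that VM.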

tcrosenk commented 7 years ago

I did notice those warning messages. I may look into that, because each iteration is currently taking about an hour and 10 minutes.

bxshi / ProjE

Freezing #5