bitbu opened this issue 3 years ago
I set 'cpu': True in the notebook example, but the full notebook still doesn't run, so I'm wondering whether there is another setting I need to change if I don't have the right graphics card.
You really need CUDA to use the GPU, so it is NVIDIA or nothing. If you don't have CUDA, you should set --cpu true. Normally, that is all there is to it. Please send the error messages.
Note that without a GPU, data generation and model evaluation should be all right, but training will be hopelessly slow.
Thank you for your response @f-charton
I am starting with the beam_integration notebook. In it I set cpu to True.
The first error I got was when running block 5, at the line `modules = build_modules(env, params)`. This was the message (easy to fix, but including it for completeness):

```
~/Documents/SymbolicMathematics/src/model/__init__.py, line 41, in build_modules(env, params)
---> 41 reloaded = torch.load(params.reload_model)
```
The error message had enough info to fix it, so I changed the line to `reloaded = torch.load(params.reload_model, map_location='cpu')`.
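For reference, a minimal sketch of that CPU-safe reload; the checkpoint filename here is hypothetical, but `map_location='cpu'` is the standard PyTorch way to load a GPU-saved checkpoint on a machine with no GPU:

```python
import torch

# map_location='cpu' remaps every CUDA tensor stored in the checkpoint
# to host memory, so the load succeeds on a CPU-only machine.
# 'fwd_bwd.pth' is a placeholder for whatever pretrained model file you use.
reloaded = torch.load('fwd_bwd.pth', map_location='cpu')
```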
After that fix, the next error was a RuntimeError:

```
RuntimeError Traceback (most recent call last)
```

Commenting out the to_cuda() line lets me run the whole notebook. What does that line do? The outputs overwrite the inputs, so I am not sure what the function does and whether commenting it out is bad.
The general idea is that the GPU uses its own memory, which is distinct from the computer's RAM. Copying from RAM to the GPU is done by calling specific functions; this is what to_cuda() or the map_location parameter in torch.load() do. If you run on CPU only, you want to deactivate those calls. For to_cuda(), this is addressed by the CUDA flag in utils.py.
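For intuition, a helper like to_cuda() typically just moves each tensor to GPU memory when the CUDA flag is on and passes everything through unchanged otherwise. A minimal sketch of what such a helper might look like (the module-level CUDA switch mirrors the one in utils.py mentioned above):

```python
import torch

CUDA = True  # module-level switch; set to False for CPU-only runs

def to_cuda(*args):
    """Copy tensors from host RAM to GPU memory, or pass through on CPU."""
    if not CUDA:
        return args
    return [None if x is None else x.cuda() for x in args]
```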
The torch.load(params.reload_model, map_location='cpu') fix is correct. Ideally, you'd want to make this depend on params.cpu.
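A hedged sketch of what that could look like, assuming params is the parameter object built earlier in the notebook and params.cpu is the flag it already carries:

```python
import torch

# On a CPU-only run, remap the checkpoint to host memory; otherwise keep
# torch.load's default behavior of restoring tensors to their saved device.
map_location = 'cpu' if params.cpu else None
reloaded = torch.load(params.reload_model, map_location=map_location)
```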
Commenting out to_cuda() is OK in this specific case, but the correct way to do it would be to set the variable src.utils.CUDA to False when params.cpu is set. In the Python code, this is done in the function main(), in trainer.py. You might want to copy those lines of code into the notebook. This is better because it will deactivate all further calls to to_cuda().
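Concretely, copying that switch into a notebook cell could look like the sketch below, assuming the module-level CUDA flag in src/utils.py and the params.cpu attribute described above:

```python
import src.utils

# Mirror what main() does for CPU runs: flip the module-level switch so
# that every subsequent call to to_cuda() leaves tensors on the CPU.
if params.cpu:
    src.utils.CUDA = False
```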
I have an Intel graphics card.
To verify that your GPU is CUDA-capable, go to your distribution's equivalent of System Properties, or, from the command line, enter:

```
$ lspci | grep -i nvidia
```
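From inside Python, PyTorch itself can report the same thing:

```python
import torch

# Prints True only when an NVIDIA GPU with a working CUDA setup is
# visible to PyTorch; on an Intel-only machine this prints False.
print(torch.cuda.is_available())
```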
The result is empty. Do you have suggestions for what parts of the code I can still run? Is running on AWS an option?