bitbu opened this issue 3 years ago
I set 'cpu': True in the notebook example, but the full notebook still doesn't run, so I'm wondering whether there is another setting I need to change if I don't have the right graphics card.
You really need CUDA to use the GPU, so it is NVIDIA or nothing. If you don't have CUDA, you should set --cpu true. Normally, that is all there is to it. Please send the error messages.
Note that without a GPU, data generation and model evaluation should be all right, but training will be hopelessly slow.
Thank you for your response @f-charton
I am starting with the beam_integration notebook. In it I set cpu to True.
The first error I got was when running block 5, at the line `modules = build_modules(env, params)`. This was the message (easy to fix, but including it for completeness):

```
~/Documents/SymbolicMathematics/src/model/__init__.py, line 41, in build_modules(env, params)
---> 41 reloaded = torch.load(params.reload_model)
```
The error message had enough info to fix it, so I changed the line to `reloaded = torch.load(params.reload_model, map_location='cpu')`.
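For reference, a minimal sketch of that CPU-safe reload; the checkpoint filename here is hypothetical, but `map_location='cpu'` is the standard PyTorch way to load a GPU-saved checkpoint on a machine with no GPU:

```python
import torch

# map_location='cpu' remaps every CUDA tensor stored in the checkpoint
# to host memory, so the load succeeds on a CPU-only machine.
# 'fwd_bwd.pth' is a placeholder for whatever pretrained model file you use.
reloaded = torch.load('fwd_bwd.pth', map_location='cpu')
```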
After that fix, the next error was a RuntimeError:

```
RuntimeError Traceback (most recent call last)
```

Commenting out the to_cuda() line lets me run the whole notebook. What does that line do? The outputs overwrite the inputs, so I am not sure what the function does and whether commenting it out is bad.
The general idea is that the GPU uses its own memory, which is distinct from the computer's RAM. Copying from RAM to the GPU is done by calling specific functions; this is what to_cuda() or the map_location parameter in torch.load() do. If you run on CPU only, you want to deactivate those calls. For to_cuda(), this is addressed by the CUDA flag in utils.py.
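For intuition, a helper like to_cuda() typically just moves each tensor to GPU memory when the CUDA flag is on and passes everything through unchanged otherwise. A minimal sketch of what such a helper might look like (the module-level CUDA switch mirrors the one in utils.py mentioned above):

```python
import torch

CUDA = True  # module-level switch; set to False for CPU-only runs

def to_cuda(*args):
    """Copy tensors from host RAM to GPU memory, or pass through on CPU."""
    if not CUDA:
        return args
    return [None if x is None else x.cuda() for x in args]
```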
The torch.load(params.reload_model, map_location='cpu') fix is correct. Ideally, you'd want to make this depend on params.cpu.
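A hedged sketch of what that could look like, assuming params is the parameter object built earlier in the notebook and params.cpu is the flag it already carries:

```python
import torch

# On a CPU-only run, remap the checkpoint to host memory; otherwise keep
# torch.load's default behavior of restoring tensors to their saved device.
map_location = 'cpu' if params.cpu else None
reloaded = torch.load(params.reload_model, map_location=map_location)
```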
Commenting out to_cuda() is OK in this specific case, but the correct way to do it would be to set the variable src.utils.CUDA to False when params.cpu is set. In the Python code, this is done in the function main(), in trainer.py. You might want to copy those lines of code into the notebook. This is better because it will deactivate all further calls to to_cuda().
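Concretely, copying that switch into a notebook cell could look like the sketch below, assuming the module-level CUDA flag in src/utils.py and the params.cpu attribute described above:

```python
import src.utils

# Mirror what main() does for CPU runs: flip the module-level switch so
# that every subsequent call to to_cuda() leaves tensors on the CPU.
if params.cpu:
    src.utils.CUDA = False
```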
I have an Intel graphics card.
To verify that your GPU is CUDA-capable, go to your distribution's equivalent of System Properties, or, from the command line, enter:

```
$ lspci | grep -i nvidia
```
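From inside Python, PyTorch itself can report the same thing:

```python
import torch

# Prints True only when an NVIDIA GPU with a working CUDA setup is
# visible to PyTorch; on an Intel-only machine this prints False.
print(torch.cuda.is_available())
```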
The result is empty. Do you have suggestions for what parts of the code I can still run? Is running on AWS an option?