facebookresearch / nougat

Implementation of Nougat Neural Optical Understanding for Academic Documents
https://facebookresearch.github.io/nougat/
MIT License

RuntimeError: Input type (struct c10::BFloat16) and bias type (float) should be the same #133

Closed: maria-mh07 closed this issue 12 months ago

maria-mh07 commented 12 months ago

I'm on Windows 10 with Python 3.10.6 and torch 2.0.1+cu117, but my GPU VRAM is too small. I ran:

pip install nougat-ocr
nougat inputs/test_file.pdf -o outputs

Error:

WARNING:root:GPU VRAM is too small. Computing on CPU.
C:\Users\maria\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ..\aten\src\ATen\native\TensorShape.cpp:3484.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
  0%|          | 0/1 [00:02<?, ?it/s]
Traceback (most recent call last):
  File "C:\Users\maria\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main     
    return _run_code(code, main_globals, None,
  File "C:\Users\maria\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\maria\AppData\Local\Programs\Python\Python310\Scripts\nougat.exe\__main__.py", line 7, in <module>
  File "C:\Users\maria\AppData\Local\Programs\Python\Python310\lib\site-packages\predict.py", line 157, in main    
    model_output = model.inference(
  File "C:\Users\maria\AppData\Local\Programs\Python\Python310\lib\site-packages\nougat\model.py", line 579, in inference
    last_hidden_state = self.encoder(image_tensors)
  File "C:\Users\maria\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\maria\AppData\Local\Programs\Python\Python310\lib\site-packages\nougat\model.py", line 121, in forward
    x = self.model.patch_embed(x)
  File "C:\Users\maria\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\maria\AppData\Local\Programs\Python\Python310\lib\site-packages\timm\models\layers\patch_embed.py", line 35, in forward
    x = self.proj(x)
  File "C:\Users\maria\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\maria\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "C:\Users\maria\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (struct c10::BFloat16) and bias type (float) should be the same

The error occurs in the "forward" method of the "SwinEncoder" class. How do I solve it?

maria-mh07 commented 12 months ago

Solved! In the "model.py" file, on line 575, I changed

if self.device.type != "mps":

to

if self.device.type != "mps" and self.device.type != "cpu":

Sorry for the inconvenience.
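The condition change described above can be read as a small predicate on the device type. The following is a minimal sketch, not the nougat source: the helper name `wants_bfloat16_cast` is hypothetical, and it assumes that the guarded statement near line 575 is what casts the input images to bfloat16, which is what the error message suggests but the thread does not show.

```python
def wants_bfloat16_cast(device_type: str) -> bool:
    """Hypothetical helper mirroring the patched condition.

    Assumption: the guarded code casts the input images to bfloat16.
    With the workaround, that cast is skipped on both "mps" and "cpu".
    """
    return device_type != "mps" and device_type != "cpu"


if __name__ == "__main__":
    # Show which device types would still take the cast branch.
    for name in ("cuda", "cpu", "mps"):
        print(name, wants_bfloat16_cast(name))
```

With the original condition, "cpu" would still take the cast branch, so the encoder would receive a bfloat16 input while its bias stayed float32, matching the reported RuntimeError.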

lukas-blecher commented 12 months ago

No, you found a bug in an edge case (CUDA installed but CPU used). Will push a general solution in a bit.