Closed TheMrguiller closed 1 year ago
This looks like an error related to your GPU (possibly the CUDA installation). Since you are running on MS Windows, my first guess would be an OS-related issue. Could you try the run with the --no-cuda option to make sure that your command works? It will be very slow because everything runs on the CPU only, but at least we can confirm that everything else is correct.
If that works, maybe try your experiment on a Linux machine, or use our Docker image.
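Before retrying, it can help to confirm what PyTorch itself reports about the GPU. The following is a minimal sketch, not part of ParlAI; the helper name `cuda_status` is hypothetical, and it only uses `torch.cuda.is_available()`, `torch.__version__` and `torch.version.cuda`.

```python
def cuda_status() -> str:
    """Summarize whether PyTorch is importable and whether it can see a CUDA device."""
    try:
        import torch  # imported lazily so the check also works without PyTorch
    except ImportError:
        return "PyTorch is not installed in this environment"
    if torch.cuda.is_available():
        # torch.version.cuda is the CUDA version PyTorch was built against
        return f"PyTorch {torch.__version__} sees CUDA {torch.version.cuda}"
    return f"PyTorch {torch.__version__} does not see a usable CUDA device"


if __name__ == "__main__":
    print(cuda_status())
```

If this reports no usable CUDA device, the problem is likely in the driver or CUDA setup rather than in the ParlAI command.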
The command doesn't work; it gives an error related to NfGramRepeatBlock:
(parlai) PS C:\Users\superserver\Desktop\guillermo\Parlai> parlai safe_interactive -t blended_skill_talk -mf zoo:blender/blender_90M/model --no-cuda
15:51:55 | Unable to load ngram blocking on GPU: Error building extension 'ngram_repeat_block_cuda': [1/1] "C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\bin\Hostx64\x64/link.exe" ngram_repeat_block_cuda.o ngram_repeat_block_cuda_kernel.cuda.o /nologo /DLL c10.lib c10_cuda.lib torch_cpu.lib torch_cuda.lib -INCLUDE:?warp_size@cuda@at@@YAHXZ torch.lib /LIBPATH:C:\Users\superserver\Desktop\guillermo\Parlai\parlai\lib\site-packages\torch\lib torch_python.lib /LIBPATH:C:\Users\superserver\Desktop\guillermo\Parlai\parlai\Scripts\libs "/LIBPATH:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.0\lib\x64" cudart.lib /out:ngram_repeat_block_cuda.pyd
FAILED: ngram_repeat_block_cuda.pyd
"C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\bin\Hostx64\x64/link.exe" ngram_repeat_block_cuda.o ngram_repeat_block_cuda_kernel.cuda.o /nologo /DLL c10.lib c10_cuda.lib torch_cpu.lib torch_cuda.lib -INCLUDE:?warp_size@cuda@at@@YAHXZ torch.lib /LIBPATH:C:\Users\superserver\Desktop\guillermo\Parlai\parlai\lib\site-packages\torch\lib torch_python.lib /LIBPATH:C:\Users\superserver\Desktop\guillermo\Parlai\parlai\Scripts\libs "/LIBPATH:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.0\lib\x64" cudart.lib /out:ngram_repeat_block_cuda.pyd
LINK : fatal error LNK1104: cannot open file 'python38.lib'
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "c:\users\superserver\appdata\local\programs\python\python38\lib\runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "c:\users\superserver\appdata\local\programs\python\python38\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\Users\superserver\Desktop\guillermo\Parlai\parlai\Scripts\parlai.exe\__main__.py", line 7, in <module>
File "C:\Users\superserver\Desktop\guillermo\Parlai\parlai\lib\site-packages\parlai\__main__.py", line 14, in main
superscript_main()
File "C:\Users\superserver\Desktop\guillermo\Parlai\parlai\lib\site-packages\parlai\core\script.py", line 247, in superscript_main
setup_script_registry()
File "C:\Users\superserver\Desktop\guillermo\Parlai\parlai\lib\site-packages\parlai\core\script.py", line 37, in setup_script_registry
importlib.import_module(module.name)
File "c:\users\superserver\appdata\local\programs\python\python38\lib\importlib\__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 783, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "C:\Users\superserver\Desktop\guillermo\Parlai\parlai\lib\site-packages\parlai\scripts\detect_offensive_language.py", line 19, in <module>
from parlai.utils.safety import OffensiveStringMatcher, OffensiveLanguageClassifier
File "C:\Users\superserver\Desktop\guillermo\Parlai\parlai\lib\site-packages\parlai\utils\safety.py", line 10, in <module>
from parlai.agents.transformer.transformer import TransformerClassifierAgent
File "C:\Users\superserver\Desktop\guillermo\Parlai\parlai\lib\site-packages\parlai\agents\transformer\transformer.py", line 15, in <module>
from parlai.core.torch_generator_agent import TorchGeneratorAgent
File "C:\Users\superserver\Desktop\guillermo\Parlai\parlai\lib\site-packages\parlai\core\torch_generator_agent.py", line 48, in <module>
from parlai.ops.ngram_repeat_block import NfGramRepeatBlock
ImportError: cannot import name 'NfGramRepeatBlock' from 'parlai.ops.ngram_repeat_block' (C:\Users\superserver\Desktop\guillermo\Parlai\parlai\lib\site-packages\parlai\ops\ngram_repeat_block.py)
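The LNK1104 line in the log above says the linker could not open python38.lib. On Windows that import library normally lives in the `libs` folder of the base Python installation rather than inside a virtual environment. A small sketch for checking where it would be expected; the helper name `expected_import_lib` is hypothetical, and this is only a diagnostic aid, not a fix confirmed in this thread.

```python
import sys
from pathlib import Path


def expected_import_lib(base_prefix=sys.base_prefix, version=sys.version_info[:2]) -> Path:
    """Return the path where pythonXY.lib is conventionally found on Windows."""
    major, minor = version
    # sys.base_prefix points at the base installation even inside a venv
    return Path(base_prefix) / "libs" / f"python{major}{minor}.lib"


if __name__ == "__main__":
    lib = expected_import_lib()
    print(f"{lib} exists: {lib.exists()}")
```

If the file exists there but the linker's /LIBPATH entries do not include that folder, the extension build is looking in the wrong place.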
What pytorch version are you using? This is a known issue
I am currently using the latest version of PyTorch.
But ParlAI has certain requirements for the PyTorch version: see this.
I will check it, but I did try it using the requirements.txt. If the problem continues, would you recommend using Docker? I also have a question about the model once it is trained: can it be saved in another format, for example to upload it to Hugging Face or to use it with TensorFlow?
We do not offer alternative model saving formats at the moment.
Are you on the latest version of ParlAI?
This is the proper fix; it may not be in the latest ParlAI release: https://github.com/facebookresearch/ParlAI/pull/4887
I think if you were to downgrade your PyTorch to < 1.13 it would also solve the issue.
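To see whether an installed PyTorch falls in the range mentioned above, a version check along these lines may help. This is a sketch under assumptions: the 1.13 threshold is taken from the comment above and is not otherwise confirmed, and the helper names `parse_version` and `needs_downgrade` are hypothetical.

```python
import re


def parse_version(version: str) -> tuple:
    """Extract (major, minor) from a string such as '1.13.1+cu117'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def needs_downgrade(version: str, first_affected=(1, 13)) -> bool:
    """True if the version is at or above the assumed first affected release."""
    return parse_version(version) >= first_affected


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        print(torch.__version__, "affected:", needs_downgrade(torch.__version__))
```

Installing a ParlAI build that includes the linked pull request should make the downgrade unnecessary.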
This issue has not had activity in 30 days. Please feel free to reopen if you have more issues. You may apply the "never-stale" tag to prevent this from happening.
Bug description
After training my model starting from zoo:tutorial_transformer_generator/model, I wanted to check how the model was performing. To do so, I used the following command line:
Reproduction steps
Every time I run this command on a different task I get the same error. At first I thought it was because of my task, but it seems it is not. I trained the model using the following command:
Expected behavior
I expected to get some kind of output. If I run it without --skip-generation false, it works, but I don't get a model response.
Logs
Please paste the command line output:
Additional context
It's my first training run, so I think I may have done something wrong.