facebookresearch / CodeGen

Reference implementation of code generation projects from Facebook AI Research. General toolkit to apply machine learning to code, from dataset creation to model training and evaluation. Comes with pretrained models.
MIT License

Could anyone help me with this error: failed to learn bpe. #51

Closed dinaalaaahmed closed 2 years ago

dinaalaaahmed commented 2 years ago

When I run the command

python -m codegen_sources.preprocessing.preprocess /home/dina/CodeGen/data/test_dataset --langs java cpp python --mode monolingual_functions --bpe_mode=fast --local=True --train_splits=1

####### Error ########

INFO - 11/18/21 08:01:43 - 0:01:18 - training bpe on /home/dina/CodeGen/data/test_dataset/cpp-java-python.sa-cl.tok.shuf.50gb...
Traceback (most recent call last):
  File "/home/dina/miniconda3/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/dina/miniconda3/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/dina/CodeGen/codegen_sources/preprocessing/preprocess.py", line 214, in <module>
    preprocess(args)
  File "/home/dina/CodeGen/codegen_sources/preprocessing/preprocess.py", line 102, in preprocess
    dataset.learn_bpe(ncodes=args.ncodes, executor=cluster_train_bpe)
  File "/home/dina/CodeGen/codegen_sources/preprocessing/dataset_modes/dataset_mode.py", line 589, in learn_bpe
    self._learn_bpe(ncodes, executor)
  File "/home/dina/CodeGen/codegen_sources/preprocessing/dataset_modes/monolingual_functions_mode.py", line 123, in _learn_bpe
    job.result()
  File "/home/dina/.local/lib/python3.8/site-packages/submitit/core/core.py", line 263, in result
    r = self.results()
  File "/home/dina/.local/lib/python3.8/site-packages/submitit/core/core.py", line 291, in results
    raise job_exception  # pylint: disable=raising-bad-type
submitit.core.utils.FailedJobError: Job (task=0) failed during processing with trace:

Traceback (most recent call last):
  File "/home/dina/.local/lib/python3.8/site-packages/submitit/core/submission.py", line 53, in process_job
    result = delayed.result()
  File "/home/dina/.local/lib/python3.8/site-packages/submitit/core/utils.py", line 122, in result
    self._result = self.function(*self.args, **self.kwargs)
  File "/home/dina/CodeGen/codegen_sources/preprocessing/bpe_modes/fast_bpe_mode.py", line 53, in learn_bpe_file
    assert (
AssertionError: failed to learn bpe on /home/dina/CodeGen/data/test_dataset/cpp-java-python.sa-cl.tok.shuf.50gb, command:
/home/dina/CodeGen/codegen_sources/model/tools/fastBPE/fast learnbpe 50000 /home/dina/CodeGen/data/test_dataset/cpp-java-python.sa-cl.tok.shuf.50gb > /home/dina/CodeGen/data/test_dataset/cpp-java-python.sa-cl.codes


You can check full logs with 'job.stderr(0)' and 'job.stdout(0)'or at paths:

baptisteroziere commented 2 years ago

Hum, that never happened to me before. Did you run the failing command directly to see more detailed logs?

/home/dina/CodeGen/codegen_sources/model/tools/fastBPE/fast learnbpe 50000 /home/dina/CodeGen/data/test_dataset/cpp-java-python.sa-cl.tok.shuf.50gb > /home/dina/CodeGen/data/test_dataset/cpp-java-python.sa-cl.codes
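If running the command by hand is inconvenient (for example in a notebook), the same check can be sketched in Python. This is only a rough illustration: the helper name `run_learnbpe` and its return convention are made up for this example and are not part of CodeGen or fastBPE. It checks that the binary and the corpus exist, runs `fast learnbpe` with the same arguments as above, and returns the exit code together with whatever the binary wrote to stderr.

```python
import subprocess
from pathlib import Path
from typing import Tuple


def run_learnbpe(fast_bin: str, ncodes: int, corpus: str, codes_out: str) -> Tuple[int, str]:
    """Run `fast learnbpe` directly; return (exit code, error text).

    Hypothetical helper for debugging, not part of CodeGen.
    """
    # A missing binary or corpus is reported before anything is launched.
    for path in (fast_bin, corpus):
        if not Path(path).exists():
            return 127, f"missing file: {path}"
    try:
        # Same redirection as the shell command: stdout goes to the codes file.
        with open(codes_out, "w") as out:
            proc = subprocess.run(
                [fast_bin, "learnbpe", str(ncodes), corpus],
                stdout=out,
                stderr=subprocess.PIPE,
                text=True,
            )
    except OSError as err:
        # Raised e.g. when the file is not executable on this machine.
        return 126, str(err)
    return proc.returncode, proc.stderr
```

A non-zero exit code or a non-empty error text from this call should say more than the bare assertion in the traceback.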

dinaalaaahmed commented 2 years ago

Thank you for your response. The error occurred when I was running Linux in a virtual machine. It was solved when we ran Linux directly as the operating system.

voiteshonok commented 1 year ago

I bumped into exactly the same problem when trying to run the pipeline in Google Colab. I guess the original folder structure was changed: you can see that in install_env.sh fastBPE is installed in codegen_sources/model/tools, so I simply ran

!cp -r codegen_sources/model/tools/fastBPE/ ./fastBPE

and it helped
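A quick way to see which of the two locations mentioned in this thread actually holds an executable `fast` binary is sketched below. The helper `find_fast_binary` and the candidate list are assumptions made for illustration; they are not CodeGen code, and the paths the preprocessing really looks in may differ between versions of the repository.

```python
import os
from pathlib import Path
from typing import Optional

# Locations taken from this thread (relative to the repository root); assumed, not exhaustive.
CANDIDATES = (
    "fastBPE/fast",
    "codegen_sources/model/tools/fastBPE/fast",
)


def find_fast_binary(repo_root: str) -> Optional[Path]:
    """Return the first candidate that exists and is executable, else None."""
    for rel in CANDIDATES:
        candidate = Path(repo_root) / rel
        if candidate.is_file() and os.access(candidate, os.X_OK):
            return candidate
    return None
```

Calling `find_fast_binary(".")` from the repository root before running the preprocessing shows whether the copy step above is needed.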