facebookresearch / TransCoder

Public release of the TransCoder research project https://arxiv.org/pdf/2006.03511.pdf

ValueError: invalid literal for int() with base 10: '' #54

Open zjj1999 opened 2 years ago

zjj1999 commented 2 years ago

When I run the preprocessing pipeline test, it fails with:

========================================================================= FAILURES =========================================================================
_____________________________________________________ test_run_pipeline_locally_3_langs_with_comments ______________________________________________________
concurrent.futures.process._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/Users/zhangjiajie/miniforge3/envs/mytorch/lib/python3.9/concurrent/futures/process.py", line 246, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/Users/zhangjiajie/Documents/code/TransCoder-main/preprocessing/src/dataset.py", line 87, in process
    nlines, size_gb = job.result()
  File "/Users/zhangjiajie/Documents/code/TransCoder-main/preprocessing/src/utils.py", line 263, in result
    self._result = self.func(*self.args, **self.kwargs)
  File "/Users/zhangjiajie/Documents/code/TransCoder-main/preprocessing/src/dataset.py", line 52, in split_train_test_valid
    n_lines = get_nlines(all_tok)
  File "/Users/zhangjiajie/Documents/code/TransCoder-main/preprocessing/src/utils.py", line 119, in get_nlines
    return int(process.stdout.decode().split(' ')[0])
ValueError: invalid literal for int() with base 10: ''
"""

The above exception was the direct cause of the following exception:

    def test_run_pipeline_locally_3_langs_with_comments():
        copy_and_clean_folder()
>       preprocess(root, lang1, lang2, keep_comments, local=True,
                   lang3=lang3, test_size=10, size_gb=0)

preprocessing/test_preprocess.py:65: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
preprocessing/preprocess.py:64: in preprocess
    dataset.process_languages(
preprocessing/src/dataset.py:166: in process_languages
    print(type(jobs[i].result()))
../../../miniforge3/envs/mytorch/lib/python3.9/concurrent/futures/_base.py:446: in result
    return self.__get_result()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = None

    def __get_result(self):
        if self._exception:
            try:
>               raise self._exception
E               ValueError: invalid literal for int() with base 10: ''

../../../miniforge3/envs/mytorch/lib/python3.9/concurrent/futures/_base.py:391: ValueError
------------------------------------------------------------------- Captured stdout call -------------------------------------------------------------------
/Users/zhangjiajie/Documents/code/TransCoder-main/data/test_dataset/cpp-java-python.with_comments.XLM-syml
java: process ...
java: tokenizing 2 json files ...
cpp: process ...
cpp: tokenizing 2 json files ...
python: process ...
python: tokenizing 2 json files ...
------------------------------------------------------------------- Captured stderr call -------------------------------------------------------------------
100%|██████████| 50/50 [00:02<00:00, 24.21it/s]
100%|██████████| 50/50 [00:03<00:00, 16.08it/s]]
100%|██████████| 100/100 [00:03<00:00, 26.65it/s]
100%|██████████| 50/50 [00:02<00:00, 22.24it/s]
100%|██████████| 50/50 [00:02<00:00, 24.27it/s]]
100%|██████████| 150/150 [00:02<00:00, 72.05it/s] 

Can someone help me?

zjj1999 commented 2 years ago

The error above occurred locally in a macOS environment. I then set up the project environment on an Ubuntu server, and no error was reported there.
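
A possible explanation for the macOS/Ubuntu difference, offered as an assumption rather than a confirmed diagnosis: the traceback shows `get_nlines` parsing a subprocess's output with `split(' ')[0]`. If that subprocess is `wc -l` (not visible in the traceback, so this is a guess), BSD `wc` on macOS pads the count with leading spaces, so the first field after splitting on a single space is the empty string, while GNU `wc` on Ubuntu prints the count with no padding. The sketch below shows the difference and a whitespace-tolerant parse; `parse_wc_count` and `count_lines` are hypothetical helper names, not functions from the TransCoder repository.

```python
def parse_wc_count(wc_stdout: str) -> int:
    """Parse the line count from `wc -l`-style output.

    str.split() with no argument splits on any run of whitespace and
    ignores leading whitespace, so it handles both the padded BSD/macOS
    form ("      10 file") and the unpadded GNU form ("10 file").
    """
    fields = wc_stdout.split()
    if not fields:
        raise ValueError("empty output from line-count command")
    return int(fields[0])


def count_lines(path: str) -> int:
    """Count newline characters in a file without shelling out.

    Hypothetical portable alternative: reads the file in 1 MiB binary
    chunks, so it does not depend on the platform's `wc` at all.
    """
    total = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            total += chunk.count(b"\n")
    return total


# Sample strings imitating the two output styles (assumed, not captured logs).
macos_style = "      10 all.tok\n"
gnu_style = "10 all.tok\n"

# Splitting on a single space yields '' as the first field for padded output,
# and int('') raises the ValueError reported in this issue.
first_field_padded = macos_style.split(" ")[0]
```

Under this assumption, replacing `.split(' ')[0]` with `.split()[0]` in `get_nlines` would be the smallest change that works on both platforms.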