Open zjj1999 opened 2 years ago
When I run the preprocessing pipeline, it shows:
========================================================================= FAILURES =========================================================================
_____________________________________________________ test_run_pipeline_locally_3_langs_with_comments ______________________________________________________
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/Users/zhangjiajie/miniforge3/envs/mytorch/lib/python3.9/concurrent/futures/process.py", line 246, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/Users/zhangjiajie/Documents/code/TransCoder-main/preprocessing/src/dataset.py", line 87, in process
    nlines, size_gb = job.result()
  File "/Users/zhangjiajie/Documents/code/TransCoder-main/preprocessing/src/utils.py", line 263, in result
    self._result = self.func(*self.args, **self.kwargs)
  File "/Users/zhangjiajie/Documents/code/TransCoder-main/preprocessing/src/dataset.py", line 52, in split_train_test_valid
    n_lines = get_nlines(all_tok)
  File "/Users/zhangjiajie/Documents/code/TransCoder-main/preprocessing/src/utils.py", line 119, in get_nlines
    return int(process.stdout.decode().split(' ')[0])
ValueError: invalid literal for int() with base 10: ''
"""

The above exception was the direct cause of the following exception:

    def test_run_pipeline_locally_3_langs_with_comments():
        copy_and_clean_folder()
>       preprocess(root, lang1, lang2, keep_comments, local=True, lang3=lang3, test_size=10, size_gb=0)

preprocessing/test_preprocess.py:65:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
preprocessing/preprocess.py:64: in preprocess
    dataset.process_languages(
preprocessing/src/dataset.py:166: in process_languages
    print(type(jobs[i].result()))
../../../miniforge3/envs/mytorch/lib/python3.9/concurrent/futures/_base.py:446: in result
    return self.__get_result()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = None

    def __get_result(self):
        if self._exception:
            try:
>               raise self._exception
E               ValueError: invalid literal for int() with base 10: ''

../../../miniforge3/envs/mytorch/lib/python3.9/concurrent/futures/_base.py:391: ValueError
------------------------------------------------------------------- Captured stdout call -------------------------------------------------------------------
/Users/zhangjiajie/Documents/code/TransCoder-main/data/test_dataset/cpp-java-python.with_comments.XLM-syml
java: process ...
java: tokenizing 2 json files ...
cpp: process ...
cpp: tokenizing 2 json files ...
python: process ...
python: tokenizing 2 json files ...
------------------------------------------------------------------- Captured stderr call -------------------------------------------------------------------
100%|██████████| 50/50 [00:02<00:00, 24.21it/s]
100%|██████████| 50/50 [00:03<00:00, 16.08it/s]]
100%|██████████| 100/100 [00:03<00:00, 26.65it/s]
100%|██████████| 50/50 [00:02<00:00, 22.24it/s]
100%|██████████| 50/50 [00:02<00:00, 24.27it/s]]
100%|██████████| 150/150 [00:02<00:00, 72.05it/s]
Can someone help me?
The above error occurred locally in a macOS environment. I then set up the project environment on an Ubuntu server, and no error was reported this time.
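A possible explanation for the macOS/Ubuntu difference, judging only from the traceback: the failing line is `int(process.stdout.decode().split(' ')[0])` in `get_nlines`. If `get_nlines` shells out to `wc -l` (an assumption, since the traceback shows only the return line), then the BSD `wc` shipped with macOS pads the count with leading spaces, so splitting on a single space yields an empty first field, while GNU `wc` on Ubuntu prints the number first. A minimal sketch of a more tolerant parser follows; `parse_wc_count` is a hypothetical helper name, not part of the repository.

```python
import subprocess


def parse_wc_count(wc_output: str) -> int:
    """Extract the line count from `wc -l` output.

    str.split() with no argument splits on any run of whitespace and
    ignores leading whitespace, so both "10 file" (GNU) and
    "      10 file" (BSD/macOS) give "10" as the first field.
    """
    fields = wc_output.split()
    if not fields:
        raise ValueError(f"unexpected wc output: {wc_output!r}")
    return int(fields[0])


def get_nlines(file_path: str) -> int:
    # Assumption: the original function counts lines with `wc -l`.
    process = subprocess.run(
        ["wc", "-l", file_path], stdout=subprocess.PIPE, check=True
    )
    return parse_wc_count(process.stdout.decode())
```

If this is indeed the cause, changing `.split(' ')` to `.split()` on the failing line would be the smallest fix.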