SUSYUSTC / MathTranslate

Translate scientific papers in LaTeX, especially arXiv papers
https://github.com/SUSYUSTC/MathTranslate
Apache License 2.0
1.03k stars, 69 forks

Error: RecursionError: maximum recursion depth exceeded while calling a Python object #62

Closed. Ethan-Chen-plus closed this issue 10 months ago

Ethan-Chen-plus commented 10 months ago
translate_arxiv.exe 2305.13048 -eng tencent
The current mathtranslate is latest
Start
engine tencent
language from en
language to zh
threads 1

arxiv number: 2305.13048

temporary directory C:\Users\25122\AppData\Local\Temp\tmprmub0lge
main tex files found:
.\main.tex
merging .\acc_figures.tex
Processing .\main
Cache is found
It is not a full latex document
  0%|                                                                                          | 0/297 [00:00<?, ?it/s]Error found in Parapragh 110
Content
Tasks:
¥begin  {itemize}
    ¥item LAMBADA‾¥cite  {LAMBADAdataset}  . A benchmark dataset that evaluates the model's contextual reasoning and language comprehension abilities by presenting context-target pairs, where the objective is to predict the most probable target token.
    ¥item PIQA‾¥cite  {Bisk2020}  . A benchmark for the task of physical common sense reasoning, which consists of a binary choice task that can be better understood as a set of two pairs, namely (Goal, Solution).
    ¥item HellaSwag ‾¥cite  {HellaSwag2019}   A novel benchmark for commonsense Natural Language Inference (NLI) which is build by adversarial filtering against transformer models.
    ¥item Winogrande ‾¥cite  {Wino2020}   A dataset designed to evaluate the acquisition of common sense reasoning by neural language models, aiming to determine whether we are accurately assessing the true capabilities of machine common sense.
    ¥item StoryCloze‾¥cite  {StoryCloze2016}   A benchmark to present a novel approach to assess comprehension of narratives, narrative generation, and script acquisition, focusing on commonsense reasoning.
    ¥item ARC Challenge ‾¥cite  {ARC2018}   A dataset designed for multiple-choice question answering, encompassing science exam questions ranging from third grade to ninth grade.
    ¥item ARC Easy An easy subset of ARC.
    ¥item HeadQA ‾¥cite  {HeadQA2020}   A benchmark consisting of graduate-level questions encompassing various fields such as medicine, nursing, biology, chemistry, psychology, and pharmacology.
    ¥item OpenBookQA ‾¥cite  {OpenBookQA2018}   A QA dataset to evaluate human comprehension of a subject by incorporating open book facts, scientific knowledge, and perceptual common sense, drawing inspiration from open book exams.
    ¥item SciQ ‾¥cite  {SciQ2017}   A multiple-choice QA dataset which was created using an innovative approach to gather well-crafted multiple-choice questions that are focused on a specific domain.
    ¥item TriviaQA ‾¥cite  {TriviaQA2017}   A QA-IR dataset which is constituted of triples of questions, answers, supporting evidence, and independently collected evidence documents, with an average of six documents per question for reliable sources.
    ¥item ReCoRD ‾¥cite  {ReCord}   A benchmark for evaluating commonsense reasoning in reading comprehension by generating queries from CNN/Daily Mail news articles and requiring text span answers from corresponding summarizing passages.
    ¥item COPA ‾¥cite  {COPA2011}   A dataset to evaluate achievement in open-domain commonsense causal reasoning.
    ¥item MMMLU ‾¥cite  {MMMLU2021}   A multi-task dataset for 57 tasks containing elementary mathematics, US history, computer science, law, etc.
¥end  {itemize}
 37%|████████████████████████████▉                                                 | 110/297 [00:00<00:00, 1188.54it/s]
Error found in Parapragh 110
Content
¥begin  {table*}  [!]
¥centering
¥small
¥begin  {tabular}    {llllllllll}
        ¥toprule
¥textbf  {Model}   & ¥textbf  {Params}   & ¥textbf  {PIQA}   & ¥textbf  {StoryCloze}   & ¥textbf  {HellaSwag}   & ¥textbf  {WinoGrande}   & ¥textbf  {ARC-e}   & ¥textbf  {ARC-c}   & ¥textbf  {OBQA}   ¥¥
 & B & acc & acc & acc¥\_norm & acc & acc & acc¥\_norm & acc¥\_norm ¥¥
¥midrule
RWKV-4 &  0.17 & ¥TBF  {65.07}   & ¥TBF  {58.79}   & ¥TBF  {32.26}   & 50.83 & ¥TBF  {47.47}   & ¥TBF  {24.15}   & ¥TBF  {29.60}   ¥¥
Pythia & 0.16      & 62.68 & 58.47 & 31.63 & ¥TBF  {52.01}   & 45.12 & 23.81 & 29.20 ¥¥
GPT-Neo & 0.16     & 63.06 & 58.26 & 30.42 & 50.43 & 43.73 & 23.12 & 26.20 ¥¥
¥midrule
RWKV-4 &  0.43 & ¥TBF  {67.52}   & ¥TBF  {63.87}   & ¥TBF  {40.90}   & 51.14       & ¥TBF  {52.86}   & 25.17       & ¥TBF  {32.40}   ¥¥
Pythia & 0.40      & 66.70       & 62.64      & 39.10      & ¥TBF  {53.35}   & 50.38       & ¥TBF  {25.77}   & 30.00 ¥¥
GPT-Neo & 0.40     & 65.07       & 61.04       & 37.64      & 51.14     & 48.91       & 25.34 & 30.60 ¥¥
¥midrule
RWKV-4 & 1.5 & ¥TBF  {72.36}   & ¥TBF  {68.73}   & ¥TBF  {52.48}     & 54.62      & ¥TBF  {60.48}   & ¥TBF  {29.44}               & ¥TBF  {34.00}   ¥¥
Pythia & 1.4 & 71.11       & 67.66       & 50.82        & ¥TBF  {56.51}   & 57.74 & 28.58             & 30.80 ¥¥
GPT-Neo & 1.4 & 71.16      & 67.72        & 48.94       & 54.93       & 56.19       & 25.85             & 33.60 ¥¥
¥midrule
RWKV-4 & 3.0 & ¥TBF  {74.16}   & ¥TBF  {70.71}   & ¥TBF  {59.89}   & 59.59       & ¥TBF  {65.19}           & ¥TBF  {33.11}   & ¥TBF  {37.00}   ¥¥
Pythia & 2.8 & 73.83       & ¥TBF  {70.71}   & 59.46       & ¥TBF  {61.25}   & 62.84   & 32.25 & 35.20 ¥¥
GPT-Neo & 2.8 & 72.14      & 69.54       & 55.82        & 57.62      & 61.07         & 30.20       & 33.20 ¥¥
¥midrule
RWKV-4 & 7.4 & ¥TBF  {76.06}   & 73.44       & 65.51      & 61.01       & ¥TBF  {67.80}   & ¥TBF  {37.46}           & ¥TBF  {40.20}   ¥¥
Pythia & 6.9 & 74.54       & 72.96       & 63.92      & 61.01       & 66.79       & 35.07               & 38.00 ¥¥
GPT-J & 6.1  & 75.41        & ¥TBF  {74.02 }  & ¥TBF  {66.25}  & ¥TBF  {64.09}   & 66.92 & 36.60                    & 38.20 ¥¥
¥midrule
RWKV-4 & 14.2    & ¥TBF  {77.48}              & ¥TBF  {76.06}     & ¥TBF  {70.65}         & 63.85       & 70.24       & ¥TBF  {38.99}         & ¥TBF  {41.80}   ¥¥
GPT-level  $^*$   & 14.2 & 76.49           & 74.97   & 68.72       & ¥TBF  {65.14}        & ¥TBF  {70.77}          & 37.99       & 39.27 ¥¥
¥midrule
Pythia (c.f.)  & 11.8    & 75.90           & 74.40   & 67.38       & 64.72       & 69.82        & 36.77       & 38.80 ¥¥
GPT-NeoX (c.f.)  & 20.6  & 77.69     & 76.11 & 71.42 & 65.98    & 72.69 & 40.44 & 40.20 ¥¥
¥bottomrule
¥end  {tabular}
¥centering
¥caption  {¥label{tab:commonsense_reasoning_results}
Zero-Shot Performance of the model on Common Sense Reasoning Tasks.   $^*$   Interpolation of Pythia and GPT-Neo models
}
¥end  {table*}
Traceback (most recent call last):
  File "C:\Users\25122\AppData\Local\Programs\Python\Python37\lib\site-packages\mathtranslate\translate.py", line 246, in translate_full_latex
    latex_translated_paragraphs = list(tqdm.auto.tqdm(executor.map(self.worker, latex_original_paragraphs), total=len(latex_original_paragraphs)))
  File "C:\Users\25122\AppData\Local\Programs\Python\Python37\lib\site-packages\tqdm\std.py", line 1180, in __iter__
    for obj in iterable:
  File "C:\Users\25122\AppData\Local\Programs\Python\Python37\lib\concurrent\futures\_base.py", line 598, in result_iterator
    yield fs.pop().result()
  File "C:\Users\25122\AppData\Local\Programs\Python\Python37\lib\concurrent\futures\_base.py", line 435, in result
    return self.__get_result()
  File "C:\Users\25122\AppData\Local\Programs\Python\Python37\lib\concurrent\futures\_base.py", line 384, in __get_result
    raise self._exception
  File "C:\Users\25122\AppData\Local\Programs\Python\Python37\lib\concurrent\futures\thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "C:\Users\25122\AppData\Local\Programs\Python\Python37\lib\site-packages\mathtranslate\translate.py", line 201, in worker
    raise e
  File "C:\Users\25122\AppData\Local\Programs\Python\Python37\lib\site-packages\mathtranslate\translate.py", line 191, in worker
    latex_translated_paragraph = self.translate_paragraph_latex(latex_original_paragraph)
  File "C:\Users\25122\AppData\Local\Programs\Python\Python37\lib\site-packages\mathtranslate\translate.py", line 170, in translate_paragraph_latex
    latex_translated_paragraph = self.translate_text_in_paragraph_latex_and_leading_brace(latex_original_paragraph)
  File "C:\Users\25122\AppData\Local\Programs\Python\Python37\lib\site-packages\mathtranslate\translate.py", line 165, in translate_text_in_paragraph_latex_and_leading_brace
    latex_translated_paragraph = self.translate_text_in_paragraph_latex(latex_original_paragraph)
  File "C:\Users\25122\AppData\Local\Programs\Python\Python37\lib\site-packages\mathtranslate\translate.py", line 142, in translate_text_in_paragraph_latex
    result += self._translate_text_in_paragraph_latex(split) + ' ' + sep + ' '
  File "C:\Users\25122\AppData\Local\Programs\Python\Python37\lib\site-packages\mathtranslate\translate.py", line 117, in _translate_text_in_paragraph_latex
    text_original_paragraph = process_text.split_too_long_paragraphs(text_original_paragraph)
  File "C:\Users\25122\AppData\Local\Programs\Python\Python37\lib\site-packages\mathtranslate\process_text.py", line 47, in split_too_long_paragraphs
    par2 = split_too_long_paragraphs('.'.join(lines[position:]))
  File "C:\Users\25122\AppData\Local\Programs\Python\Python37\lib\site-packages\mathtranslate\process_text.py", line 47, in split_too_long_paragraphs
    par2 = split_too_long_paragraphs('.'.join(lines[position:]))
  File "C:\Users\25122\AppData\Local\Programs\Python\Python37\lib\site-packages\mathtranslate\process_text.py", line 47, in split_too_long_paragraphs
    par2 = split_too_long_paragraphs('.'.join(lines[position:]))
  [Previous line repeated 982 more times]
  File "C:\Users\25122\AppData\Local\Programs\Python\Python37\lib\site-packages\mathtranslate\process_text.py", line 42, in split_too_long_paragraphs
    first_words = [get_first_word(line) for line in lines]
  File "C:\Users\25122\AppData\Local\Programs\Python\Python37\lib\site-packages\mathtranslate\process_text.py", line 42, in <listcomp>
    first_words = [get_first_word(line) for line in lines]
  File "C:\Users\25122\AppData\Local\Programs\Python\Python37\lib\site-packages\mathtranslate\process_text.py", line 26, in get_first_word
    words = line.split(' ')
RecursionError: maximum recursion depth exceeded while calling a Python object
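
For anyone reading this traceback: `split_too_long_paragraphs` calls itself on `'.'.join(lines[position:])` close to a thousand times before Python's recursion limit is hit. That pattern usually means the remainder passed to the next call is not getting shorter, although nothing in this thread confirms the exact cause. Below is a minimal sketch of a loop-based splitter that cannot recurse at all. It is not mathtranslate's code: `split_long_text`, `max_chars`, and `sep` are hypothetical names chosen for illustration, and the character limit is an assumption.

```python
def split_long_text(text, max_chars=1000, sep="."):
    """Split text into chunks of at most max_chars characters.

    Hypothetical helper, not part of mathtranslate. Chunks break after
    `sep` where possible; a single sentence longer than max_chars is cut
    at the limit so the loop always makes progress.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if not sep:
        raise ValueError("sep must be a non-empty string")

    # Cut the text into sentences that keep their trailing separator,
    # so that "".join(pieces) == text.
    pieces = []
    start = 0
    while True:
        idx = text.find(sep, start)
        if idx == -1:
            pieces.append(text[start:])
            break
        pieces.append(text[start:idx + len(sep)])
        start = idx + len(sep)

    chunks = []
    current = ""
    for piece in pieces:
        # A sentence that is too long on its own is cut at the limit.
        while len(piece) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(piece[:max_chars])
            piece = piece[max_chars:]
        # Start a new chunk when adding this sentence would exceed the limit.
        if len(current) + len(piece) > max_chars:
            chunks.append(current)
            current = piece
        else:
            current += piece
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    # Text with no separator at all: the case where a recursive
    # splitter may fail to shrink its input.
    for chunk in split_long_text("x" * 10, max_chars=4):
        print(repr(chunk))
```

Because the chunks keep their separators, joining them with an empty string reproduces the input, so translated pieces could be reassembled in order.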
SUSYUSTC commented 10 months ago

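This is most likely a problem caused by the Tencent translation engine. We have now released a web version that can translate papers online.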

Ethan-Chen-plus commented 10 months ago

Thank you! The translation produced by the web version is fine.