dmMaze / BallonsTranslator

A deep-learning-assisted manga translation tool with one-click machine translation and simple image/text editing | Yet another computer-aided comic/manga translation tool powered by deep learning
GNU General Public License v3.0

Error "number of translations does not match to source" when using ChatGPT translation module #436

Closed Gandi61 closed 4 months ago

Gandi61 commented 4 months ago

Hello! When I translate certain pages from Japanese to English with ChatGPT, the translations no longer line up with their sources: for example, the translation of "source 2" ends up in "translation 3", and one of the "translation" fields stays empty (I've attached an image to clarify).

If I translate all pages at once using the "RUN" button, I receive a notification in the logs saying "number of translations does not match to source" when the translation queue reaches the problematic page.

However, if I translate that specific "problematic" page using the "Translate page" function, the notification does not appear, yet the text in the "Translation" fields still does not match the text in the "Source" fields. The same problem persists; it is just no longer reported in the logs.

The page where the error occurs has 19 text boxes in total and approximately 739 characters.

If I translate only 17 of the text boxes instead of all 19, the problem does not occur (the text in the "Source" fields matches the text in the "Translation" fields). The issue also disappears if the text is shortened by approximately 150-250 characters. I tried various options with prompt template and chat system template, but couldn't fix the issue.
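
The observation above (17 boxes work, 19 do not, and shortening the text helps) suggests the failure depends on how much text goes into a single request. A minimal sketch of one way to test that, assuming the page's source strings are available as a plain list; `batch_by_chars` and its `max_chars` budget are hypothetical and not part of BallonsTranslator:

```python
from typing import List


def batch_by_chars(texts: List[str], max_chars: int) -> List[List[str]]:
    """Split texts into consecutive batches of at most max_chars characters.

    A single text longer than max_chars still gets a batch of its own,
    so no source line is ever dropped or cut.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    size = 0
    for text in texts:
        # Close the current batch if adding this text would exceed the budget.
        if current and size + len(text) > max_chars:
            batches.append(current)
            current, size = [], 0
        current.append(text)
        size += len(text)
    if current:
        batches.append(current)
    return batches


example = batch_by_chars(["aaaa", "bbbb", "cc", "dddddd"], max_chars=8)
```

Each batch could then be sent as its own request, which keeps the source order intact while reducing the amount of text per prompt.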

UPD: I tried to reproduce the issue in "Manga-image-translator", but there the translation matches the original text completely. Looking at the code of the ChatGPT module in "Manga-image-translator", there is no "number of translations does not match to source" notification, unlike in the ChatGPT module of "BallonsTranslator". It seems that "BallonsTranslator" has some peculiarity in how it distributes the responses received from the ChatGPT API.

I still hope the issue can be fixed with the right prompt, but success hasn't been achieved yet. It's very strange that other users haven't encountered similar problems (at least I haven't found such information).

I noticed that the console of "Manga-image-translator" displays more detailed information in the logs. Is it possible to get more detailed logs for "BallonsTranslator"?

UPD 2

I managed to clarify the essence of the problem a bit more. I found this closed issue - https://github.com/dmMaze/BallonsTranslator/issues/379 and went to check the module version again. It turned out that the version was new, but the problem I described earlier still occurred. But it's not that simple.

I made a significant mistake. I had been translating in RUN mode with the old version of the module, and after the update I translated the problematic page using the "Translate page" function.

In addition, I was using text block highlighting and the "translate" function.

And now the most interesting part. With the new version of the ChatGPT module, everything is translated without problems in RUN mode, but the problem I originally described arises when using the "Translate page" and "translate" functions.

On the one hand, the issue can be closed, but on the other hand, the problem persists for the "Translate page" and "translate" functions.

dmMaze commented 4 months ago

> And now the most interesting part. It turned out that if you use translation in RUN mode, then with the new version of the Chatgpt module, everything is translated without problems, but if you use the "Translate page" and "translate" functions, the problem I originally described arises.

Not sure why; the input for RUN and "Translate page" should be identical for the same page. Since commit 26efe38, the logger prints the source text list and the ChatGPT prompt.

Gandi61 commented 4 months ago

Thank you for your response.

To save time on reading the entire text, you can jump straight to the last paragraph of my message, where I've summarized the information concisely.

Yesterday, I attempted to reproduce the error once again. Unfortunately, I didn't check this thread before doing so, so my versions of the files "base.py," "logger.py," and "trans_chatgpt.py" were outdated. I launched the translation of the problematic page through the "Translate page" function. The translation process took approximately 4-5 times longer than usual. As a result, 7691 tokens were consumed instead of the usual 1600-1700 tokens for translating the problematic page.

The essence of the problem was that when returning "result number 12," ChatGPT returned the translation of "source number 12," but instead of writing the text of the obtained result once, ChatGPT duplicated this result multiple times. Thus, no other results were returned after "result number 12." According to the logs, the program detected the issue and attempted to translate it several times. The problem did not occur during the last attempt. As a result, I received a correct translation (in terms of matching the number of requests and results). In other words, the error I described earlier did not reproduce; it simply consumed more tokens than usual.
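
The detect-and-retry behavior described above can be sketched roughly as follows. This is only an illustration, assuming the reply uses the same `<|n|>` markers as the prompt in the log below; it reuses the exception name `InvalidNumTranslations` from the traceback, but `parse_reply`, `translate_with_retry`, and the `request` callable are hypothetical and do not reproduce the actual code in `trans_chatgpt.py`:

```python
import re
from typing import Callable, List

MARKER = re.compile(r"<\|(\d+)\|>")


class InvalidNumTranslations(Exception):
    """The reply does not contain exactly one translation per source line."""


def parse_reply(reply: str, expected: int) -> List[str]:
    # Splitting on a pattern with one capture group yields
    # [preamble, "1", text_1, "2", text_2, ...].
    parts = MARKER.split(reply)
    numbers = [int(n) for n in parts[1::2]]
    texts = [t.strip() for t in parts[2::2]]
    # A repeated marker (such as 12, 12, 12, ...) or a missing one fails here.
    if numbers != list(range(1, expected + 1)):
        raise InvalidNumTranslations(f"expected 1..{expected}, got {numbers}")
    return texts


def translate_with_retry(sources: List[str],
                         request: Callable[[str], str],
                         max_attempts: int = 3) -> List[str]:
    header = "Please help me to translate the following text from a manga to English"
    prompt = "\n".join([header] + [f"<|{i}|>{s}" for i, s in enumerate(sources, 1)])
    for attempt in range(1, max_attempts + 1):
        try:
            return parse_reply(request(prompt), len(sources))
        except InvalidNumTranslations:
            if attempt == max_attempts:
                raise  # give up after the last attempt
    raise InvalidNumTranslations("max_attempts must be at least 1")
```

Every failed attempt still costs a full request, which is consistent with the token count growing well beyond the usual figure for a single page.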

I have been unable to reproduce this problem further. Currently, the files "base.py," "logger.py," and "trans_chatgpt.py" have been updated. If I discover anything else, I will update here.

Below, I am attaching a part of the log from the cmd window, which was obtained during the problem described in this message.

Please help me to translate the following text from a manga to English
<|1|> source text in Japanese
<|2|> source text in Japanese
... #there was a correct listing of lines with the source
<|19|> source text in Japanese
translations:
... #Here is the result, which looks correct, until it reaches 'result number 12', after which 'result number 12' is repeated many times
openai response:
<|1|> Translation result from Japanese
<|2|> Translation result from Japanese
...
<|12|> The translation result from Japanese that was repeated many times
Restarting request. Attempt: 1
--- Logging error ---
Traceback (most recent call last):
  File "G:\Programs\BallonsTranslator_dev_src_with_gitpython 4\modules\translators\trans_chatgpt.py", line 235, in _translate
    raise InvalidNumTranslations
modules.translators.trans_chatgpt.InvalidNumTranslations
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "logging\__init__.py", line 1103, in emit
  File "encodings\cp1251.py", line 19, in encode
UnicodeEncodeError: 'charmap' codec can't encode characters in position 140-164: character maps to <undefined>
Call stack:
  File "G:\Programs\BallonsTranslator_dev_src_with_gitpython 4\ui\module_manager.py", line 464, in run
    self.job()
  File "G:\Programs\BallonsTranslator_dev_src_with_gitpython 4\ui\module_manager.py", line 309, in <lambda>
    self.job = lambda : self._blktrans_pipeline(blk_list, tgt_img, mode, blk_ids)
  File "G:\Programs\BallonsTranslator_dev_src_with_gitpython 4\ui\module_manager.py", line 317, in _blktrans_pipeline
    self.translate_thread.module.translate_textblk_lst(blk_list)
  File "G:\Programs\BallonsTranslator_dev_src_with_gitpython 4\modules\translators\base.py", line 202, in translate_textblk_lst
    _translations = self.translate(text_list)
  File "G:\Programs\BallonsTranslator_dev_src_with_gitpython 4\modules\translators\base.py", line 149, in translate
    text_trans = self._translate(text_source)
  File "G:\Programs\BallonsTranslator_dev_src_with_gitpython 4\modules\translators\trans_chatgpt.py", line 244, in _translate
    self.logger.warn(message + '\n' + f'Restarting request. Attempt: {retry_attempt}')
  File "logging\__init__.py", line 1494, in warn
  File "logging\__init__.py", line 1489, in warning
  File "logging\__init__.py", line 1624, in _log
  File "logging\__init__.py", line 1634, in handle
  File "logging\__init__.py", line 1696, in callHandlers
  File "logging\__init__.py", line 968, in handle
  File "logging\__init__.py", line 1218, in emit
  File "logging\__init__.py", line 1108, in emit
Message: 'number of translations does not match to source:\nprompt:\n    Please help me to translate the following text from a manga to English\n<|1|>
... #After this, there is again a situation of the translation result being repeated many times. This situation, in turn, also repeats several times

Restarting request. Attempt: 2'
Arguments: ()
[INFO   ] _client:_send_single_request:1026 - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
[INFO   ] trans_chatgpt:_translate:262 - Used 1629 tokens (Total: 7691)
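
The `UnicodeEncodeError` in the log above is raised inside the logging handler rather than by the translator: the warning message contains Japanese text, and the output stream encodes with cp1251, which cannot represent it. A minimal sketch of one way to avoid that secondary exception, assuming the handler writes to a stream opened with `errors="backslashreplace"`; the helper `make_safe_handler` and the logger name `trans_demo` are hypothetical and not part of BallonsTranslator:

```python
import io
import logging
from typing import BinaryIO


def make_safe_handler(stream: BinaryIO, encoding: str = "cp1251") -> logging.StreamHandler:
    """Build a handler that escapes unencodable characters instead of raising."""
    text_stream = io.TextIOWrapper(
        stream, encoding=encoding, errors="backslashreplace", line_buffering=True
    )
    return logging.StreamHandler(text_stream)


# Demonstration with an in-memory byte buffer standing in for the console.
buf = io.BytesIO()
handler = make_safe_handler(buf)
logger = logging.getLogger("trans_demo")
logger.setLevel(logging.WARNING)
logger.propagate = False
logger.addHandler(handler)

logger.warning("prompt: <|1|> こんにちは")
handler.flush()
output = buf.getvalue().decode("cp1251")
```

With this setup the Japanese characters appear in the log as `\uXXXX` escapes, so the original warning is still written instead of being replaced by a "Logging error" traceback.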

Brief summary of my message: the issue clearly arises on the side of the ChatGPT API. It is sporadic and may occur when translating pages with a large amount of text. Over the past 2 days I have made several dozen attempts to reproduce the problem, without success. Therefore, I am closing the issue.