WittmannF / jupyter-translate

Library for translating jupyter notebook (.ipynb) files
https://pypi.org/project/jupyter-translate/
MIT License

TypeError: expected string or bytes-like object #12

Open martincousi opened 3 weeks ago

martincousi commented 3 weeks ago

When translating a notebook to French, jtranslate reports the following error:

Traceback (most recent call last):
  File "\\?\C:\Users\11143054\AppData\Local\miniconda3\envs\jtranslate\Scripts\jupyter_translate-script.py", line 33, in <module>
    sys.exit(load_entry_point('jupyter-translate', 'console_scripts', 'jupyter_translate')())
  File "c:\users\11143054\documents\github\jupyter-translate\jupyter_translate.py", line 209, in main
    jupyter_translate(
  File "c:\users\11143054\documents\github\jupyter-translate\jupyter_translate.py", line 172, in jupyter_translate
    translate_markdown(source, translator, delay=delay)
  File "c:\users\11143054\documents\github\jupyter-translate\jupyter_translate.py", line 96, in translate_markdown
    return translate(text) + '\n'
  File "c:\users\11143054\documents\github\jupyter-translate\jupyter_translate.py", line 86, in translate
    text = replace_from_list('[Xx]' + LINK_REPLACEMENT_KW[1:], text, md_links)
  File "c:\users\11143054\documents\github\jupyter-translate\jupyter_translate.py", line 66, in replace_from_list
    return re.sub(tag, lambda x: next(iter(replacement_gen)), text)
  File "C:\Users\11143054\AppData\Local\miniconda3\envs\jtranslate\lib\re.py", line 210, in sub
    return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or bytes-like object
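The last frame shows `re.sub` receiving a value that is neither `str` nor bytes. A minimal sketch of that failure mode follows, assuming (the thread does not confirm this) that an earlier step produced `None` instead of text. `safe_sub` and its `fallback` parameter are hypothetical and not part of jupyter-translate.

```python
import re


def safe_sub(pattern, repl, text, fallback=""):
    """Hypothetical guard: only call re.sub when `text` is a string."""
    if not isinstance(text, str):
        # Assumption: a missing value is replaced by `fallback`
        # instead of letting re.sub raise TypeError.
        return fallback
    return re.sub(pattern, repl, text)


# Reproduce the error class from the traceback with a non-string input.
try:
    re.sub(r"[Xx]_link", "replacement", None)
except TypeError as exc:
    print("re.sub raised:", type(exc).__name__)

# The guarded version returns the fallback instead of raising.
print(repr(safe_sub(r"[Xx]_link", "replacement", None)))
```

A guard like this only hides the symptom; the underlying question is still why the text became a non-string value.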

Removing the --- lines (Markdown horizontal rules) from my notebook allowed jtranslate to progress further. However, it stopped again when it reached a Markdown table:

| ClientID | Date | Demand (units) |
|---|---|---|
| X15 | 06-12-2020 | 560 |
| AO5 | 06-12-2020 | 1152 |
| ZI5 | 08-12-2020 | 32 |
| T65 | 10-12-2020 | 194 |

Once that table was removed, I was able to fully translate the notebook. Is there an option to skip --- lines and Markdown tables?
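One possible way to skip such lines is sketched below. It is not an existing jupyter-translate option: `HRULE`, `TABLE_ROW`, and `translate_lines` are hypothetical names, and `translate` stands for any callable that maps a string to its translation. The patterns are deliberately simple and only cover rules made of three or more `-`, `*`, or `_` characters and table rows that start and end with a pipe.

```python
import re

# Hypothetical patterns, not taken from jupyter-translate.
HRULE = re.compile(r"^\s*(-{3,}|\*{3,}|_{3,})\s*$")   # e.g. "---"
TABLE_ROW = re.compile(r"^\s*\|.*\|\s*$")             # e.g. "| a | b |"


def translate_lines(lines, translate):
    """Translate each line, passing rules, table rows, and blanks through."""
    out = []
    for line in lines:
        if not line.strip() or HRULE.match(line) or TABLE_ROW.match(line):
            out.append(line)              # left untouched
        else:
            out.append(translate(line))   # ordinary prose
    return out


# Usage with str.upper standing in for a real translator.
cell = ["Intro text", "---", "| ClientID | Date |", "|---|---|", "| X15 | 06-12-2020 |"]
print(translate_lines(cell, str.upper))
```

The trade-off is that text inside table cells stays untranslated; translating cell by cell would need a more careful parser.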

WittmannF commented 3 weeks ago

Hi @martincousi, thanks for the heads up. Apparently a recent update made the library incompatible with Windows. Let me try to replicate it on my side and confirm whether that's the issue. In the meantime, can you try running it on a Unix-based OS? For example, you can run it on Google Colab: https://colab.research.google.com/drive/1QL7-L4AjL0kZ4nC51K2BmE_9_pNHsdtu?usp=sharing

Update: sorry, I saw the full message only now. Can you share an example notebook that raises this error?

martincousi commented 3 weeks ago

This is a notebook with such table and lines: https://github.com/acedesci/scanalytics/blob/master/EN/S01_Intro/01_InClass_Exercises.ipynb

I also saw that jtranslate translates the \left( and \right) LaTeX commands, which is problematic.
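A common workaround for this kind of problem is to mask LaTeX commands with placeholders before translation and restore them afterwards. The sketch below illustrates the idea only; `protect_latex`, `restore_latex`, and the `@@n@@` placeholder format are hypothetical, and it assumes the translator leaves those placeholders intact, which a real translation service may not.

```python
import re

# Matches a backslash followed by letters, e.g. \left, \right, \frac.
LATEX_CMD = re.compile(r"\\[A-Za-z]+")


def protect_latex(text):
    """Replace each LaTeX command with a numbered placeholder."""
    saved = []

    def stash(match):
        saved.append(match.group(0))
        return f"@@{len(saved) - 1}@@"

    return LATEX_CMD.sub(stash, text), saved


def restore_latex(text, saved):
    """Put the original commands back in place of the placeholders."""
    return re.sub(r"@@(\d+)@@", lambda m: saved[int(m.group(1))], text)


# Usage with str.upper standing in for a real translator.
masked, saved = protect_latex(r"Compute $\left( a + b \right)$ now")
print(restore_latex(masked.upper(), saved))
```

Masking whole math spans (everything between `$` delimiters) would also keep variable names untouched, at the cost of a more involved pattern.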

WittmannF commented 3 weeks ago

Thanks @martincousi! I was able to replicate it here. In the meantime, can you please run the legacy version at https://github.com/WittmannF/jupyter-translate/tree/master/legacy ? I was able to run the file from there with no issues.

@andrebelem, can you please take a look? I'm trying to understand which specific recent change is raising this error.

andrebelem commented 3 weeks ago

The code is probably confusing the sequence of strings that mark the table. Tomorrow I will study how the code reacts in different situations to correct it. In the meantime, I suggest using legacy (just point python to the legacy code).

andrebelem commented 3 weeks ago

First assessment: The code was designed to search for a specific pattern in a text and replace it with something else. However, it ran into a problem because of two special cases:

What Was Done to Fix It?

Update: I am currently running tests, but Google services are significantly degraded today, so I will need more time to complete the testing.

I appreciate your understanding and will keep you posted on any further developments.

andrebelem commented 3 weeks ago

⚠️ Warning: Proper Handling of Embedded Code

When including embedded code in your Markdown files, please ensure that the code is enclosed within triple backticks (```). This is essential to prevent the program from mistakenly translating the code or misinterpreting it as regular text.

Example of Proper Embedded Code:

def example_function():
    """This is a docstring."""
    return "Hello, World!"

Why This is Important:

If embedded code is not properly enclosed within triple backticks, the program may inadvertently translate the code, altering its functionality or meaning. This can lead to unexpected results, especially when the embedded code is meant for demonstration purposes rather than execution.
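To illustrate why the fences matter, here is a minimal sketch of how a translator could separate fenced code from prose. It is not the library's actual implementation: `split_fenced` and `translate_cell` are hypothetical names, and `translate` is any callable that maps a string to its translation.

```python
FENCE = "`" * 3  # the triple-backtick marker


def split_fenced(markdown):
    """Split Markdown into (is_code, text) segments using fence lines."""
    segments, buffer, in_code = [], [], False
    for line in markdown.splitlines(keepends=True):
        if line.lstrip().startswith(FENCE):
            if in_code:
                # Closing fence: finish the code segment.
                buffer.append(line)
                segments.append((True, "".join(buffer)))
                buffer, in_code = [], False
            else:
                # Opening fence: flush any prose collected so far.
                if buffer:
                    segments.append((False, "".join(buffer)))
                buffer, in_code = [line], True
        else:
            buffer.append(line)
    if buffer:
        # An unclosed fence keeps its contents marked as code.
        segments.append((in_code, "".join(buffer)))
    return segments


def translate_cell(markdown, translate):
    """Translate only the prose segments and keep fenced code verbatim."""
    return "".join(
        text if is_code else translate(text)
        for is_code, text in split_fenced(markdown)
    )


# Usage with str.upper standing in for a real translator.
cell = "Hello\n" + FENCE + "python\nprint('hi')\n" + FENCE + "\nBye\n"
print(translate_cell(cell, str.upper))
```

Code that is not fenced never reaches the `is_code` branch, so it is sent to the translator like any other prose.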