UlionTse / translators

🌏🌍🌎Translators🌎🌍🌏 is a library that aims to bring free, multiple, enjoyable translations to individuals and students in Python.
https://pypi.org/project/translators/
GNU General Public License v3.0
1.63k stars, 189 forks

Bulk translation? #67

Closed: dumitrescustefan closed this issue 2 years ago

dumitrescustefan commented 2 years ago

Hi!

Is there an option for bulk translation (e.g., 100 sentences at a time)?

I tried to work around it with a custom HTML page containing a table with one sentence per row, but even the example itself does not work:

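```python
import translators as ts

# Sample page in Chinese:
#   <title> "这是标题"          = "This is the title"
#   <p>     "这是文章《你的父亲》" = "This is the article 'Your Father'"
html_text = '''
<!DOCTYPE html>
<html>
<head>
    <title>这是标题</title>
</head>
<body>
<p>这是文章《你的父亲》</p>
</body>
</html>
'''

# Translate the page's text into English with Google using a single worker process
# (translators API as used in this issue; later releases may differ).
print(ts.translate_html(html_text, translator=ts.google, to_language='en', n_jobs=1))
```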

fails with:

```text
Using Spain server backend.
multiprocess.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/echo/p3.8/lib/python3.8/site-packages/multiprocess/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/echo/p3.8/lib/python3.8/site-packages/multiprocess/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/home/echo/p3.8/lib/python3.8/site-packages/pathos/helpers/mp_helper.py", line 15, in <lambda>
    func = lambda args: f(*args)
  File "/home/echo/p3.8/lib/python3.8/site-packages/translators/apis.py", line 2065, in <lambda>
    _map_translate_func = lambda sentence: (sentence,translator(query_text=sentence, to_language=to_language, **kwargs))
  File "/home/echo/p3.8/lib/python3.8/site-packages/translators/apis.py", line 415, in google_api
    data = json.loads(json_data[0][2])
  File "/usr/lib/python3.8/json/__init__.py", line 341, in loads
    raise TypeError(f'the JSON object must be str, bytes or bytearray, '
TypeError: the JSON object must be str, bytes or bytearray, not NoneType
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/echo/work/bianca/translate.py", line 14, in <module>
    print(ts.translate_html(html_text, translator=ts.google, to_language='en', n_jobs=1))
  File "/home/echo/p3.8/lib/python3.8/site-packages/translators/apis.py", line 2066, in translate_html
    result_list = pathos.multiprocessing.ProcessPool(n_jobs).map(_map_translate_func, sentence_list)
  File "/home/echo/p3.8/lib/python3.8/site-packages/pathos/multiprocessing.py", line 139, in map
    return _pool.map(star(f), zip(*args)) # chunksize
  File "/home/echo/p3.8/lib/python3.8/site-packages/multiprocess/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/echo/p3.8/lib/python3.8/site-packages/multiprocess/pool.py", line 771, in get
    raise self._value
TypeError: the JSON object must be str, bytes or bytearray, not NoneType
Exception ignored in: <function Pool.__del__ at 0x7fa5098204c0>
Traceback (most recent call last):
  File "/home/echo/p3.8/lib/python3.8/site-packages/multiprocess/pool.py", line 268, in __del__
  File "/home/echo/p3.8/lib/python3.8/site-packages/multiprocess/queues.py", line 365, in put
  File "/home/echo/p3.8/lib/python3.8/site-packages/multiprocess/reduction.py", line 54, in dumps
  File "/home/echo/p3.8/lib/python3.8/site-packages/multiprocess/reduction.py", line 42, in __init__
  File "/home/echo/p3.8/lib/python3.8/site-packages/dill/_dill.py", line 573, in __init__
ImportError: sys.meta_path is None, Python is likely shutting down

Process finished with exit code 1
```

dumitrescustefan commented 2 years ago

Reading the already closed issues, I see that you suggest using a separator that can be recovered from the translation. That could work. Still, I'm getting errors with inputs longer than 1000 characters for Sogou, for example, and Google seems to fail randomly (even with one sentence at a time).

So the question is: would an HTML table work better for "bulk" translation than a single long (e.g., 1000-character) custom-delimited string?

Thanks!
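A minimal sketch of the separator approach mentioned in this comment, assuming a newline as the separator (an assumption: any marker the translator reliably leaves untouched would do). `chunk_sentences` and `bulk_translate` are hypothetical helpers, not part of translators. Sentences are packed into chunks of at most `max_chars` characters (1000 here, echoing the length mentioned above), each chunk is sent as one request, and the result is split back apart; if the number of pieces does not match, that chunk falls back to one request per sentence. A pause between requests keeps the rate low.

```python
import time
from typing import Callable, List


def chunk_sentences(sentences: List[str], sep: str, max_chars: int) -> List[List[str]]:
    """Pack sentences into groups whose joined length stays within max_chars.

    A single sentence longer than max_chars still gets a group of its own.
    """
    chunks: List[List[str]] = []
    current: List[str] = []
    length = 0
    for sentence in sentences:
        extra = len(sentence) + (len(sep) if current else 0)
        if current and length + extra > max_chars:
            chunks.append(current)
            current, length = [], 0
            extra = len(sentence)
        current.append(sentence)
        length += extra
    if current:
        chunks.append(current)
    return chunks


def bulk_translate(
    sentences: List[str],
    translate: Callable[[str], str],
    sep: str = "\n",
    max_chars: int = 1000,
    delay: float = 1.0,
) -> List[str]:
    """Translate many sentences with few requests by joining them with sep."""
    results: List[str] = []
    for chunk in chunk_sentences(sentences, sep, max_chars):
        translated = translate(sep.join(chunk))
        time.sleep(delay)  # keep the request rate low for the free service
        parts = [part.strip() for part in translated.split(sep)]
        if len(parts) != len(chunk):
            # The separator was dropped or altered: redo this chunk sentence by sentence.
            parts = []
            for sentence in chunk:
                parts.append(translate(sentence))
                time.sleep(delay)
        results.extend(parts)
    return results
```

`translate` is any function from text to text. With the translators version in this issue it might be `lambda t: ts.google(query_text=t, to_language='en')`, mirroring the call shown in the traceback, but check the signature of your installed version.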

UlionTse commented 2 years ago

@dumitrescustefan Regarding `ImportError: sys.meta_path is None, Python is likely shutting down`: I've fixed it. The bug was caused by the multiprocessing pool not being closed explicitly. As for bulk translation with HTML (`translate_html()`), it is multi-process. Each translator can handle more than one sentence per request, generally up to about 5000 words. As for your crazy idea, please be kind to the free services and don't send large volumes of requests in quick succession, or your IP will be blocked and you will no longer be able to use the translation services.
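Judging only from the traceback in this issue (`apis.py`, lines 2065 and 2066), `translate_html()` collects the page's sentences into `sentence_list` and maps the translator over them with a pathos `ProcessPool`, producing `(sentence, translation)` pairs. The sketch below imitates that flow with the standard library; `TextCollector` and `translate_fragments` are hypothetical names. It swaps threads (`ThreadPoolExecutor`) in for processes, since the work is mostly waiting on the network, and the `with` block shuts the pool down explicitly when it is done.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from typing import Callable, List, Tuple


class TextCollector(HTMLParser):
    """Collect non-empty text nodes, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self.fragments: List[str] = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip:
            self.fragments.append(text)


def translate_fragments(
    html_text: str, translate: Callable[[str], str], n_jobs: int = 2
) -> List[Tuple[str, str]]:
    """Translate every text node separately and return (original, translation) pairs."""
    collector = TextCollector()
    collector.feed(html_text)
    collector.close()
    # Leaving the with block waits for all workers and shuts the pool down.
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        translations = list(pool.map(translate, collector.fragments))
    return list(zip(collector.fragments, translations))
```

If the library behaves like this sketch, it also answers the table question: every table cell becomes its own fragment and therefore its own request, so an HTML table would not reduce the number of requests compared with sending sentences one by one.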

UlionTse commented 2 years ago

@dumitrescustefan Please run `pip install translators --upgrade`. If there are still problems or errors, could you provide more information, such as your Python version and runtime environment? Thanks.
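A small helper for gathering the details requested here: the Python version, the platform, and the versions of the packages that appear in the traceback above (translators, pathos, multiprocess, dill). `environment_report` is a hypothetical name; it relies only on `platform` and `importlib.metadata`, which requires Python 3.8 or later.

```python
import platform
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable


def environment_report(
    packages: Iterable[str] = ("translators", "pathos", "multiprocess", "dill"),
) -> Dict[str, str]:
    """Collect version information worth pasting into a bug report."""
    info = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            info[name] = version(name)
        except PackageNotFoundError:
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```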

UlionTse commented 2 years ago

@dumitrescustefan Maybe you are talking about #15; the only way to solve it is to use multiple processes. But do pay attention to the frequency of your requests.
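One way to respect the advice about request frequency while coping with the intermittent Google failures reported in this thread (the `TypeError ... not NoneType` in the traceback) is to retry with exponential backoff. This is a generic sketch, not something translators provides; `with_retries` is a hypothetical helper, and retrying on `TypeError` assumes that error is transient, which this thread does not confirm.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 2.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Run call(), retrying failures with exponential backoff plus random jitter."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait base_delay * 1, 2, 4, ... seconds, plus up to base_delay of jitter.
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
    raise AssertionError("unreachable")
```

For example, `with_retries(lambda: ts.google(query_text=text, to_language='en'), retry_on=(TypeError,))`, where the `ts.google` call mirrors the one in the traceback and may differ in newer releases. The growing, jittered delays keep repeated failures from turning into a burst of requests.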

dumitrescustefan commented 2 years ago

Thank you very much for the response; I will look into it. I did install the latest version of translators, and the environment is Ubuntu 20.04 with Python 3.8. I will close the issue and reopen it if needed. Thanks again for your quick response!