UlionTse / translators

🌏🌍🌎Translators🌎🌍🌏 is a library that aims to bring free, multiple, enjoyable translations to individuals and students in Python.
https://pypi.org/project/translators/
GNU General Public License v3.0
1.62k stars 189 forks

[Bug]: Unable to translate in Santali language #132

Closed Prasanta-Hembram closed 1 year ago

Prasanta-Hembram commented 1 year ago

What happened?

translators cannot translate into Santali using MyMemory, even though MyMemory itself supports Santali, as this API call shows: https://api.mymemory.translated.net/get?q=What-is-your-name&langpair=en|sat

I gave the following input and expected a Santali translation, but got an error (logged below):

import translators as ts
q_text = 'what is your name'
ds = 'myMemory'
from_language = 'en-GB'
to_language = 'sat-IN'
print(ts.translate_text(q_text, ds, from_language, to_language))
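
For comparison, the direct MyMemory request quoted in the report can be sketched with the standard library alone. This is a minimal sketch, not how translators itself talks to the service: the helper names `build_mymemory_url` and `fetch_translation` are hypothetical, and reading the result from `responseData.translatedText` is an assumption about the reply format.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the URL quoted in the report.
MYMEMORY_ENDPOINT = "https://api.mymemory.translated.net/get"


def build_mymemory_url(text: str, source: str, target: str) -> str:
    """Build a GET URL with the `q` and `langpair` query parameters."""
    params = {"q": text, "langpair": f"{source}|{target}"}
    return f"{MYMEMORY_ENDPOINT}?{urllib.parse.urlencode(params)}"


def fetch_translation(text: str, source: str, target: str, timeout: float = 10.0) -> str:
    """Send the request and return the translated text (needs network access).

    Assumption: the JSON reply carries the result under
    responseData.translatedText.
    """
    url = build_mymemory_url(text, source, target)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    return payload["responseData"]["translatedText"]
```

Calling `fetch_translation("what is your name", "en", "sat")` would exercise the same language pair as the URL in the report, bypassing the library's own language check.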

APP Version

0.0.1(example)

Python Version

3.10

Runtime Environment

Windows 10

Country/Region

India

Relevant log output

---------------------------------------------------------------------------

TranslatorError                           Traceback (most recent call last)

<ipython-input-12-f0fdabf0d728> in <cell line: 6>()
      4 from_language = 'en-GB'
      5 to_language = 'sat-IN'
----> 6 print(ts.translate_text(q_text, ds, from_language, to_language))

4 frames

/usr/local/lib/python3.10/dist-packages/translators/server.py in check_language(self, from_language, to_language, language_map, output_auto, output_zh, output_en_translator, output_en, if_check_lang_reverse)
    180             raise TranslatorError('Unsupported from_language[{}] in {}.'.format(from_language, sorted(language_map.keys())))
    181         elif to_language not in language_map and if_check_lang_reverse:
--> 182             raise TranslatorError('Unsupported to_language[{}] in {}.'.format(to_language, sorted(language_map.keys())))
    183         elif from_language != output_auto and to_language not in language_map[from_language]:
    184             raise TranslatorError('Unsupported translation: from [{0}] to [{1}]!'.format(from_language, to_language))

TranslatorError: Unsupported to_language[sat-IN] in ['acf-LC', 'af-ZA', 'aig-AG', 'am-ET', 'ar-SA', 'az-AZ', 'bah-BS', 'be-BY', 'bem-ZM', 'bg-BG', 'bi-VU', 'bjs-BB', 'bn-IN', 'bo-CN', 'br-FR', 'bs-BA', 'ca-ES', 'ceb-PH', 'ch-GU', 'ckb-IQ', 'cop-EG', 'crs-SC', 'cs-CZ', 'cy-GB', 'da-DK', 'de-CH', 'de-DE', 'dv-MV', 'dz-BT', 'el-GR', 'en-GB', 'eo-EU', 'es-ES', 'et-EE', 'eu-ES', 'fa-IR', 'fi-FI', 'fn-FNG', 'fo-FO', 'fr-FR', 'ga-IE', 'gcl-GD', 'gd-GB', 'gl-ES', 'grc-GR', 'gu-IN', 'gv-IM', 'gyn-GY', 'ha-NE', 'haw-US', 'he-IL', 'hi-IN', 'hr-HR', 'ht-HT', 'hu-HU', 'hy-AM', 'id-ID', 'is-IS', 'it-IT', 'ja-JP', 'jam-JM', 'jv-ID', 'ka-GE', 'kab-DZ', 'kea-CV', 'kk-KZ', 'kl-GL', 'km-KM', 'kn-IN', 'ko-KR', 'ku-TR', 'ky-KG', 'la-VA', 'lb-LU', 'lo-LA', 'lt-LT', 'lv-LV', 'men-SL', 'mfe-MU', 'mg-MG', 'mh-MH', 'mi-NZ', 'mk-MK', 'mn-MN', 'ms-MY', 'mt-MT', 'my-MM', 'ne-NP', 'niu-NU', 'nl-NL', 'no-NO', 'ny-MW', 'pa-IN', 'pap-CW', 'pau-PW', 'pis-SB', 'pl-PL', 'pot-US', 'pov-GW', 'ppk-ID', 'ps-PK', 'pt-PT', 'qu-PE', 'rm-RO', 'rn-BI', 'ro-RO', 'ru-RU', 'rw-RW', 'sg-CF', 'si-LK', 'sk-SK', 'sl-SI', 'sm-WS', 'sn-ZW', 'so-SO', 'sq-AL', 'sr-RS', 'srn-SR', 'st-ST', 'sv-SE', 'svc-VC', 'sw-SZ', 'syc-TR', 'ta-LK', 'te-IN', 'tet-TL', 'tg-TJ', 'th-TH', 'ti-TI', 'tk-TM', 'tkl-TK', 'tl-PH', 'tmh-DZ', 'tn-BW', 'to-TO', 'tpi-PG', 'tr-TR', 'tvl-TV', 'uk-UA', 'ur-PK', 'uz-UZ', 'vi-VN', 'vic-US', 'wls-WF', 'wo-SN', 'xh-ZA', 'yi-YD', 'zdj-KM', 'zh-CN', 'zh-TW', 'zu-ZA'].
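
The check that raised this error can be illustrated with a simplified, standalone reconstruction. It is a sketch based only on the three branches visible in the traceback, not the library's actual `server.py`: the first condition (before line 180) is not shown in the log and is assumed here, the default values are assumptions, and `TranslatorError` is a local stand-in.

```python
class TranslatorError(Exception):
    """Local stand-in for the library's exception of the same name."""


def check_language(from_language, to_language, language_map,
                   output_auto="auto", if_check_lang_reverse=True):
    """Reject language codes that are missing from language_map."""
    # Assumed condition: the traceback starts at line 180, so this test is a guess.
    if from_language != output_auto and from_language not in language_map:
        raise TranslatorError('Unsupported from_language[{}] in {}.'.format(
            from_language, sorted(language_map.keys())))
    elif to_language not in language_map and if_check_lang_reverse:
        # This is the branch hit in the report: 'sat-IN' is not a key of the map.
        raise TranslatorError('Unsupported to_language[{}] in {}.'.format(
            to_language, sorted(language_map.keys())))
    elif from_language != output_auto and to_language not in language_map[from_language]:
        raise TranslatorError('Unsupported translation: from [{0}] to [{1}]!'.format(
            from_language, to_language))
    return from_language, to_language


# Toy map: 'sat-IN' is absent, as in the list printed by the error above.
OLD_MAP = {"en-GB": ["hi-IN"], "hi-IN": ["en-GB"]}
# Same map once Santali is added in both directions.
NEW_MAP = {
    "en-GB": ["hi-IN", "sat-IN"],
    "hi-IN": ["en-GB", "sat-IN"],
    "sat-IN": ["en-GB", "hi-IN"],
}
```

With `OLD_MAP`, `check_language("en-GB", "sat-IN", OLD_MAP)` raises the same kind of "Unsupported to_language" error; with `NEW_MAP` the pair passes. The request fails before any network call is made, purely because the code is missing from the map.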

Screenshots

ᱪᱤᱛᱟᱹᱨ

Prasanta-Hembram commented 1 year ago

The updated list of languages supported by MyMemory is as follows:

MY_MEMORY_LANGUAGES = { "acehnese": "ace-ID", "afrikaans": "af-ZA", "akan": "ak-GH", "albanian": "sq-AL", "amharic": "am-ET", "antigua and barbuda creole english": "aig-AG", "arabic": "ar-SA", "arabic egyptian": "ar-EG", "aragonese": "an-ES", "armenian": "hy-AM", "assamese": "as-IN", "asturian": "ast-ES", "austrian german": "de-AT", "awadhi": "awa-IN", "ayacucho quechua": "quy-PE", "azerbaijani": "az-AZ", "bahamas creole english": "bah-BS", "bajan": "bjs-BB", "balinese": "ban-ID", "balkan gipsy": "rm-RO", "bambara": "bm-ML", "banjar": "bjn-ID", "bashkir": "ba-RU", "basque": "eu-ES", "belarusian": "be-BY", "belgian french": "fr-BE", "bemba": "bem-ZM", "bengali": "bn-IN", "bhojpuri": "bho-IN", "bihari": "bh-IN", "bislama": "bi-VU", "borana": "gax-KE", "bosnian": "bs-BA", "bosnian (cyrillic)": "bs-Cyrl-BA", "breton": "br-FR", "buginese": "bug-ID", "bulgarian": "bg-BG", "burmese": "my-MM", "catalan": "ca-ES", "catalan valencian": "cav-ES", "cebuano": "ceb-PH", "central atlas tamazight": "tzm-MA", "central aymara": "ayr-BO", "central kanuri (latin script)": "knc-NG", "chadian arabic": "shu-TD", "chamorro": "ch-GU", "cherokee": "chr-US", "chhattisgarhi": "hne-IN", "chinese simplified": "zh-CN",
"chinese trad. (hong kong)": "zh-HK", "chinese traditional": "zh-TW", "chinese traditional macau": "zh-MO", "chittagonian": "ctg-BD", "chokwe": "cjk-AO", "classical greek": "grc-GR", "comorian ngazidja": "zdj-KM", "coptic": "cop-EG", "crimean tatar": "crh-RU", "crioulo upper guinea": "pov-GW", "croatian": "hr-HR", "czech": "cs-CZ", "danish": "da-DK", "dari": "prs-AF", "dimli": "diq-TR", "dutch": "nl-NL", "dyula": "dyu-CI", "dzongkha": "dz-BT", "eastern yiddish": "ydd-US", "emakhuwa": "vmw-MZ", "english": "en-GB", "english australia": "en-AU", "english canada": "en-CA", "english india": "en-IN", "english ireland": "en-IE", "english new zealand": "en-NZ", "english singapore": "en-SG", "english south africa": "en-ZA", "english us": "en-US", "esperanto": "eo-EU", "estonian": "et-EE", "ewe": "ee-GH", "fanagalo": "fn-FNG", "faroese": "fo-FO", "fijian": "fj-FJ", "filipino": "fil-PH", "finnish": "fi-FI", "flemish": "nl-BE", "fon": "fon-BJ", "french": "fr-FR", "french canada": "fr-CA", "french swiss": "fr-CH", "friulian": "fur-IT", "fula": "ff-FUL", "galician": "gl-ES", "gamargu": "mfi-NG", "garo": "grt-IN", "georgian": "ka-GE", "german": "de-DE", "gilbertese": "gil-KI", "glavda": "glw-NG", "greek": "el-GR", "grenadian creole english": "gcl-GD", "guarani": "gn-PY", "gujarati": "gu-IN", "guyanese creole english": "gyn-GY", "haitian creole french": "ht-HT", "halh mongolian": "khk-MN", "hausa": "ha-NE", "hawaiian": "haw-US", "hebrew": "he-IL", "higi": "hig-NG", "hiligaynon": "hil-PH", "hill mari": "mrj-RU", "hindi": "hi-IN", "hmong": "hmn-CN", "hungarian": "hu-HU", "icelandic": "is-IS", "igbo ibo": "ibo-NG", "igbo ig": "ig-NG", "ilocano": "ilo-PH", "indonesian": "id-ID", "inuktitut greenlandic": "kl-GL", "irish gaelic": "ga-IE", "italian": "it-IT", "italian swiss": "it-CH", "jamaican creole english": "jam-JM", "japanese": "ja-JP", "javanese": "jv-ID", "jingpho": "kac-MM", "k'iche'": "quc-GT", "kabiyè": "kbp-TG", "kabuverdianu": "kea-CV", "kabylian": "kab-DZ", "kalenjin": "kln-KE", "kamba":
"kam-KE", "kannada": "kn-IN", "kanuri": "kr-KAU", "karen": "kar-MM", "kashmiri (devanagari script)": "ks-IN", "kashmiri (arabic script)": "kas-IN", "kazakh": "kk-KZ", "khasi": "kha-IN", "khmer": "km-KH", "kikuyu kik": "kik-KE", "kikuyu ki": "ki-KE", "kimbundu": "kmb-AO", "kinyarwanda": "rw-RW", "kirundi": "rn-BI", "kisii": "guz-KE", "kongo": "kg-CG", "konkani": "kok-IN", "korean": "ko-KR", "northern kurdish": "kmr-TR", "kurdish sorani": "ckb-IQ", "kyrgyz": "ky-KG", "lao": "lo-LA", "latgalian": "ltg-LV", "latin": "la-XN", "latvian": "lv-LV", "ligurian": "lij-IT", "limburgish": "li-NL", "lingala": "ln-LIN", "lithuanian": "lt-LT", "lombard": "lmo-IT", "luba-kasai": "lua-CD", "luganda": "lg-UG", "luhya": "luy-KE", "luo": "luo-KE", "luxembourgish": "lb-LU", "maa": "mas-KE", "macedonian": "mk-MK", "magahi": "mag-IN", "maithili": "mai-IN", "malagasy": "mg-MG", "malay": "ms-MY", "malayalam": "ml-IN", "maldivian": "dv-MV", "maltese": "mt-MT", "mandara": "mfi-CM", "manipuri": "mni-IN", "manx gaelic": "gv-IM", "maori": "mi-NZ", "marathi": "mr-IN", "margi": "mrt-NG", "mari": "mhr-RU", "marshallese": "mh-MH", "mende": "men-SL", "meru": "mer-KE", "mijikenda": "nyf-KE", "minangkabau": "min-ID", "mizo": "lus-IN", "mongolian": "mn-MN", "montenegrin": "sr-ME", "morisyen": "mfe-MU", "moroccan arabic": "ar-MA", "mossi": "mos-BF", "ndau": "ndc-MZ", "ndebele": "nr-ZA", "nepali": "ne-NP", "nigerian fulfulde": "fuv-NG", "niuean": "niu-NU", "north azerbaijani": "azj-AZ", "sesotho": "nso-ZA", "northern uzbek": "uzn-UZ", "norwegian bokmål": "nb-NO", "norwegian nynorsk": "nn-NO", "nuer": "nus-SS", "nyanja": "ny-MW", "occitan": "oc-FR", "occitan aran": "oc-ES", "odia": "or-IN", "oriya": "ory-IN", "urdu": "ur-PK", "palauan": "pau-PW", "pali": "pi-IN", "pangasinan": "pag-PH", "papiamentu": "pap-CW", "pashto": "ps-PK", "persian": "fa-IR", "pijin": "pis-SB", "plateau malagasy": "plt-MG", "polish": "pl-PL", "portuguese": "pt-PT", "portuguese brazil": "pt-BR", "potawatomi": "pot-US", "punjabi": 
"pa-IN", "punjabi (pakistan)": "pnb-PK", "quechua": "qu-PE", "rohingya": "rhg-MM", "rohingyalish": "rhl-MM", "romanian": "ro-RO", "romansh": "roh-CH", "rundi": "run-BI", "russian": "ru-RU", "saint lucian creole french": "acf-LC", "samoan": "sm-WS", "sango": "sg-CF", "sanskrit": "sa-IN", "santali": "sat-IN", "sardinian": "sc-IT", "scots gaelic": "gd-GB", "sena": "seh-ZW", "serbian cyrillic": "sr-Cyrl-RS", "serbian latin": "sr-Latn-RS", "seselwa creole french": "crs-SC", "setswana (south africa)": "tn-ZA", "shan": "shn-MM", "shona": "sn-ZW", "sicilian": "scn-IT", "silesian": "szl-PL", "sindhi snd": "snd-PK", "sindhi sd": "sd-PK", "sinhala": "si-LK", "slovak": "sk-SK", "slovenian": "sl-SI", "somali": "so-SO", "sotho southern": "st-LS", "south azerbaijani": "azb-AZ", "southern pashto": "pbt-PK", "southwestern dinka": "dik-SS", "spanish": "es-ES", "spanish argentina": "es-AR", "spanish colombia": "es-CO", "spanish latin america": "es-419", "spanish mexico": "es-MX", "spanish united states": "es-US", "sranan tongo": "srn-SR", "standard latvian": "lvs-LV", "standard malay": "zsm-MY", "sundanese": "su-ID", "swahili": "sw-KE", "swati": "ss-SZ", "swedish": "sv-SE", "swiss german": "de-CH", "syriac (aramaic)": "syc-TR", "tagalog": "tl-PH", "tahitian": "ty-PF", "tajik": "tg-TJ", "tamashek (tuareg)": "tmh-DZ", "tamasheq": "taq-ML", "tamil india": "ta-IN", "tamil sri lanka": "ta-LK", "taroko": "trv-TW", "tatar": "tt-RU", "telugu": "te-IN", "tetum": "tet-TL", "thai": "th-TH", "tibetan": "bo-CN", "tigrinya": "ti-ET", "tok pisin": "tpi-PG", "tokelauan": "tkl-TK", "tongan": "to-TO", "tosk albanian": "als-AL", "tsonga": "ts-ZA", "tswa": "tsc-MZ", "tswana": "tn-BW", "tumbuka": "tum-MW", "turkish": "tr-TR", "turkmen": "tk-TM", "tuvaluan": "tvl-TV", "twi": "tw-GH", "udmurt": "udm-RU", "ukrainian": "uk-UA", "uma": "ppk-ID", "umbundu": "umb-AO", "uyghur uig": "uig-CN", "uyghur ug": "ug-CN", "uzbek": "uz-UZ", "venetian": "vec-IT", "vietnamese": "vi-VN", "vincentian creole english": 
"svc-VC", "virgin islands creole english": "vic-US", "wallisian": "wls-WF", "waray (philippines)": "war-PH", "welsh": "cy-GB", "west central oromo": "gaz-ET", "western persian": "pes-IR", "wolof": "wo-SN", "xhosa": "xh-ZA", "yiddish": "yi-YD", "yoruba": "yo-NG", "zulu": "zu-ZA", }
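
A name-to-code mapping like the one above could be turned into the kind of per-language lookup that the error message refers to. The following is a minimal sketch over a four-entry excerpt of that list; the helpers `code_for` and `build_language_map` are hypothetical, and pairing every language with every other one is an assumption, not a statement about how translators or MyMemory build their maps.

```python
# Excerpt of the MY_MEMORY_LANGUAGES mapping posted above (name -> code).
LANGUAGES_EXCERPT = {
    "bengali": "bn-IN",
    "english": "en-GB",
    "hindi": "hi-IN",
    "santali": "sat-IN",
}


def code_for(name, name_to_code):
    """Look up a language code by name, ignoring case and outer spaces."""
    return name_to_code[name.strip().lower()]


def build_language_map(name_to_code):
    """Map each code to the sorted list of all other codes.

    Assumption: every language can be translated into every other one.
    """
    codes = sorted(set(name_to_code.values()))
    return {src: [dst for dst in codes if dst != src] for src in codes}


language_map = build_language_map(LANGUAGES_EXCERPT)
```

In this sketch `code_for("Santali", LANGUAGES_EXCERPT)` gives `"sat-IN"`, and `"sat-IN"` appears both as a key of `language_map` and among the targets listed for `"en-GB"`.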

UlionTse commented 1 year ago

@Prasanta-Hembram Bro, how did you get this list?

Prasanta-Hembram commented 1 year ago

@UlionTse I got the list of languages from the MateCat API docs: https://www.matecat.com/api/docs#languages (MateCat uses MyMemory). I tried using translators in Colab, but I received an error message saying that the language I wanted to translate to was missing. I'm not sure where to add the missing languages...

UlionTse commented 1 year ago

@Prasanta-Hembram Thanks, bro. Now you can pip install translators==5.7.8 --upgrade.
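
After upgrading, one way to confirm that the installed release is at least 5.7.8 is to read the package metadata. This is a small sketch using the standard library; the helper names are hypothetical, and the version parser only handles plain dotted integers such as `5.7.8`.

```python
from importlib import metadata

# Release named in the comment above.
MINIMUM_VERSION = (5, 7, 8)


def parse_version(text):
    """Turn '5.7.8' into (5, 7, 8); only plain dotted integers are handled."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed, minimum=MINIMUM_VERSION):
    """Return True when the installed version is at least the minimum."""
    return parse_version(installed) >= minimum


def installed_translators_version():
    """Return the installed translators version string, or None if absent."""
    try:
        return metadata.version("translators")
    except metadata.PackageNotFoundError:
        return None
```

A typical check would be `version = installed_translators_version()` followed by `version is not None and meets_minimum(version)`.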

Prasanta-Hembram commented 1 year ago

Thank you, @UlionTse. It's working now. ᱪᱤᱛᱟᱹᱨ

UlionTse commented 1 year ago

@Prasanta-Hembram You are welcome.