Animenosekai / translate

A module grouping multiple translation APIs
GNU Affero General Public License v3.0

The `example` function in `YandexTranslate` without translation #101

Open AlexK-1 opened 2 months ago

AlexK-1 commented 2 months ago

To get example sentences containing an English word, I use the `example` function of the `YandexTranslate` class. It takes the parameters `destination_language` and `source_language`, and when they are equal it raises `ParameterValueError: Parameter source_language cannot be equal to the destination_language parameter`. However, the Yandex API documentation says that a language pair may use the same language twice.

To get around this error, I have to translate the word first and then pass the translation to the `example` function, translating back to the source language. This leads to incorrect results.

According to its documentation, the Yandex API can search for examples without translating the word, but the library raises an error.

Is it possible to search for Yandex example sentences through your library without translating the word?

The code I'm using now:

from translatepy.translators import YandexTranslate

translator = YandexTranslate()

word = "regale"

translated_word = translator.translate(word, "ru", "en")  # forced translation of a word from English
examples = translator.example(translated_word.result, "en", "ru")  # none of the sentences contain the word 'regale'

print(examples.result)

The code that I would like to use, but which returns an error:

from translatepy.translators import YandexTranslate

translator = YandexTranslate()

word = "regale"

examples = translator.example(word, "en", "en")  # ParameterValueError: Parameter source_language cannot be equal to the destination_language parameter

print(examples.result)

ZhymabekRoman commented 2 months ago

Don't mix up two different things: dictionary (словарь) and examples (примеры). That documentation describes the dictionary. But okay, that's not the problem: we use custom reverse-engineered API endpoints; the endpoint in the documentation requires API keys for authorisation.

ZhymabekRoman commented 2 months ago

Can you get the behavior you need (source language to source language) in the Yandex Translate web application or Android application? If not, we can't help you.

AlexK-1 commented 2 months ago

> Don't mix up two different things: dictionary (словарь) and examples (примеры). That documentation describes the dictionary. But okay, that's not the problem: we use custom reverse-engineered API endpoints; the endpoint in the documentation requires API keys for authorisation.

I'm sorry that I got something mixed up. But the `example` function of the `YandexTranslate` class contained a link to https://dictionary.yandex.net, which in turn linked to the documentation I referred to in the first post.

> Can you get the same behavior as you need (source language to source language) in Yandex Translate web application or Android application? If not, we can't help you.

Unfortunately, I couldn't reproduce it. I will try to find a solution to my problem outside of your library. Maybe you can give me some advice?

ZhymabekRoman commented 2 months ago

> But in the example function of the YandexTranslate class there was a link to dictionary.yandex.net where there was a link to the documentation that I referred to in the first post.

The closest thing to the functionality described in that documentation is the `dictionary` function in translatepy.

> Unfortunately, I couldn't reproduce it. I will try to find a solution to my problem outside of your library. Maybe you can give me some advice?

I made some changes to translatepy to get the Yandex dictionary function working, and I get these responses for specific words.

hello:

{'head': {},
 'en': {'syn': [{'text': 'hello',
    'pos': {'code': 'nn', 'text': 'n'},
    'ts': 'həˈləʊ',
    'tr': [{'text': 'hi',
      'pos': {'code': 'nn', 'text': 'n'},
      'fr': 1,
      'syn': [{'text': 'hallo', 'pos': {'code': 'nn', 'text': 'n'}, 'fr': 1},
       {'text': 'salut', 'pos': {'code': 'nn', 'text': 'n'}, 'fr': 1}]},
     {'text': 'good day',
      'pos': {'code': 'nn', 'text': 'n'},
      'fr': 1,
      'syn': [{'text': 'good afternoon',
        'pos': {'code': 'nn', 'text': 'n'},
        'fr': 1}]},
     {'text': 'greetings',
      'pos': {'code': 'nn', 'text': 'n'},
      'fr': 1,
      'syn': [{'text': 'hullo', 'pos': {'code': 'nn', 'text': 'n'}, 'fr': 1},
       {'text': 'hiya', 'pos': {'code': 'nn', 'text': 'n'}, 'fr': 1},
       {'text': 'hey', 'pos': {'code': 'nn', 'text': 'n'}, 'fr': 1},
       {'text': 'howdy', 'pos': {'code': 'nn', 'text': 'n'}, 'fr': 1}]},
     {'text': 'greet', 'pos': {'code': 'vrb', 'text': 'v'}, 'fr': 1},
     {'text': 'yo', 'pos': {'code': 'inv', 'text': 'invar'}, 'fr': 1}]}]}}

regale:

{'head': {},
 'en': {'syn': [{'text': 'regale',
    'pos': {'code': 'vrb', 'text': 'v'},
    'ts': 'rɪˈgeɪl',
    'tr': [{'text': 'treat',
      'pos': {'code': 'vrb', 'text': 'v'},
      'fr': 1,
      'syn': [{'text': 'entertain',
        'pos': {'code': 'vrb', 'text': 'v'},
        'fr': 1},
       {'text': 'feed', 'pos': {'code': 'vrb', 'text': 'v'}, 'fr': 1},
       {'text': 'divert', 'pos': {'code': 'vrb', 'text': 'v'}, 'fr': 1}]},
     {'text': 'feast', 'pos': {'code': 'nn', 'text': 'n'}, 'fr': 1}]}]}}

Is this something you were looking for?
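For readers who do want the dictionary data, here is a minimal sketch of how a response shaped like the ones above could be flattened into a plain word list. The helper name `flatten_synonyms` is hypothetical and not part of translatepy, and the sample is a trimmed copy of the `regale` response; the real endpoint may return other fields.

```python
from typing import Any, Dict, List

# Trimmed copy of the "regale" response shown above (only the fields used here).
SAMPLE = {
    "head": {},
    "en": {"syn": [{
        "text": "regale",
        "pos": {"code": "vrb", "text": "v"},
        "tr": [
            {"text": "treat", "pos": {"code": "vrb", "text": "v"}, "fr": 1,
             "syn": [{"text": "entertain"}, {"text": "feed"}, {"text": "divert"}]},
            {"text": "feast", "pos": {"code": "nn", "text": "n"}, "fr": 1},
        ],
    }]},
}


def flatten_synonyms(response: Dict[str, Any], language: str) -> List[str]:
    """Hypothetical helper: collect every 'tr' text and nested 'syn' text, in order."""
    words: List[str] = []
    for entry in response.get(language, {}).get("syn", []):
        for translation in entry.get("tr", []):
            words.append(translation["text"])
            # Nested synonyms are optional (the 'feast' entry has none).
            for synonym in translation.get("syn", []):
                words.append(synonym["text"])
    return words


print(flatten_synonyms(SAMPLE, "en"))
```

Using `.get()` with empty defaults means a missing language key yields an empty list instead of a `KeyError`.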

AlexK-1 commented 2 months ago

> Is this something you were looking for?

No. I need to get example sentences containing a certain word without having to translate it again, as shown in the first code example in the first post. Maybe I phrased something wrong, because I used a translator to write my messages.

I also noticed that the Yandex API returns each example in the original language together with its translation. Both would be useful to me, but the `example` function returns only the translated one. Can I get both sentences (original and translated) for one example in the translatepy library, or do I need to write my own function?
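A minimal sketch of how both sentences could be kept together once the raw JSON is available. It assumes the response shape used by the `get_examples` function posted at the end of this thread (`result` → `examples` → `src`/`dst`). The names `ExamplePair` and `parse_examples` are hypothetical, and the sample payload is a made-up placeholder, not real API output.

```python
from typing import Any, Dict, List, NamedTuple


class ExamplePair(NamedTuple):
    """One example sentence in the source language plus its translation."""
    source: str
    translation: str


def strip_markers(sentence: str) -> str:
    # Remove the '<' and '>' characters that appear inside the sentences.
    return sentence.replace("<", "").replace(">", "")


def parse_examples(payload: Dict[str, Any]) -> List[ExamplePair]:
    """Hypothetical helper: turn a decoded JSON payload into (source, translation) pairs."""
    examples = payload.get("result", {}).get("examples", [])
    return [
        ExamplePair(strip_markers(item["src"]), strip_markers(item["dst"]))
        for item in examples
    ]


# Made-up placeholder payload, only to show the assumed shape.
placeholder = {"result": {"examples": [
    {"src": "source sentence with a <word>", "dst": "translated sentence with a <word>"},
]}}

for pair in parse_examples(placeholder):
    print(pair.source, "|", pair.translation)
```

Keeping the parsing separate from the HTTP call makes it possible to check it against a saved response without touching the network.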

An example of a Yandex API response from the browser console: image

Animenosekai commented 2 months ago

I think the definition of the `example` function was ambiguous in the current version, but it should be well defined in the `next` branch:

https://github.com/Animenosekai/translate/blob/6bd2237d6b491700ff93556f5b8775211642f2ed/translatepy/translators/base.py#L959-L975

https://github.com/Animenosekai/translate/blob/6bd2237d6b491700ff93556f5b8775211642f2ed/translatepy/models.py#L634-L733

This makes me think that RichDictionaryResult should be able to optionally hold examples too.

I just checked on the `next` branch, and it seems that no translator returns an example; it might not be reimplemented yet…

As for the current stable version, those functions are hit or miss. They exist because some translators, such as Yandex and DeepL, supported some kind of “example” feature, but its behavior seemed to differ depending on the website used.

> [!NOTE]
> For example, the use of a destination_language doesn't feel right in this context, which is corrected in the new branch.

It might also be worth digging a bit deeper into the current web implementations of the example feature on the supported websites.

AlexK-1 commented 2 months ago

In the end, I wrote my own simple function that returns a list of example sentences and reuses your exceptions. I'll use it for now.

import requests
from translatepy.translators import BaseTranslator
from translatepy.translators.yandex import YandexTranslateException

def get_examples(text: str, source_language: str, destination_language: str) -> list:
    # Validate the language pair before making any network request
    BaseTranslator._validate_language_pair(None, source_language, destination_language)
    # Passing the query as `params` lets requests URL-encode the text
    response = requests.get(
        "https://dictionary.yandex.net/dicservice.json/queryCorpus",
        params={
            "ui": "en",
            "src": text,
            "lang": f"{source_language}-{destination_language}",
            "flags": 7,
            "srv": "android",
            "v": 2,
            "maxlen": 200,
        },
    )
    if response.status_code >= 400:
        raise YandexTranslateException(response.status_code, response.json()["message"])
    # Keep both the original and the translated sentence, without '<' and '>' characters
    return [
        {
            "src": example["src"].replace("<", "").replace(">", ""),
            "dst": example["dst"].replace("<", "").replace(">", ""),
        }
        for example in response.json()["result"]["examples"]
    ]
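
One detail worth checking in a function like this is that the searched text ends up in a query string, so it has to be percent-encoded. Below is a minimal offline sketch using only the standard library. The helper name `build_examples_url` is hypothetical, and the endpoint and parameters are copied from the function above (a reverse-engineered endpoint that may change).

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint taken from the function above; not an officially documented URL.
BASE_URL = "https://dictionary.yandex.net/dicservice.json/queryCorpus"


def build_examples_url(text: str, source_language: str, destination_language: str) -> str:
    """Hypothetical helper: build the query URL with every value percent-encoded."""
    query = {
        "ui": "en",
        "src": text,
        "lang": f"{source_language}-{destination_language}",
        "flags": 7,
        "srv": "android",
        "v": 2,
        "maxlen": 200,
    }
    return f"{BASE_URL}?{urlencode(query)}"


url = build_examples_url("salt & pepper", "en", "ru")
print(url)

# Decoding the query string gives back the original text unchanged.
decoded = parse_qs(urlsplit(url).query)
print(decoded["src"])
```

Without encoding, the `&` in the text would be read as a parameter separator and the server would only see `salt ` as the search term.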