ailabitmo / foodpedia

FOODpedia - a DBpedia of Food Products
http://foodpedia.tk

Translate names of the products to English using Yandex.Translate API #25

Open KMax opened 9 years ago

KMax commented 9 years ago

One of our goals is to provide multilingual support. One step toward this goal is to automatically translate the names of the products using an external API, such as Yandex.Translate or any other.

An example:

Of course, automatic translation isn't as good as a translation provided by the manufacturer, but at the moment it's better than nothing.

KMax commented 9 years ago

Also, why not translate the descriptions of the products too? :)

chistyakov commented 9 years ago

Unfortunately, the Yandex.Translate API is limited:

the volume of the text translated: 1,000,000 characters per day but not more than 10,000,000 per month. http://legal.yandex.ru/translate_api/

1,000,000 characters per day is not sufficient for translating names and descriptions.

KMax commented 9 years ago

I misunderstood this limit: I read "requests" instead of "characters" (facepalm)

chistyakov commented 9 years ago

The limit lets us translate names for only ~20,000 items.
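A rough back-of-the-envelope check of that estimate, as a sketch only. The daily and monthly quotas are the ones quoted above. The average name length of 50 characters is an assumption (it is not stated in this thread) picked because it reproduces the ~20,000 figure, and the estimate is assumed to refer to one day's quota.

```python
# Quotas quoted from the Yandex.Translate terms earlier in this thread.
DAILY_CHAR_LIMIT = 1_000_000
MONTHLY_CHAR_LIMIT = 10_000_000

# Assumption: average product name length in characters (not from the thread).
AVG_NAME_LENGTH = 50

# How many names fit into one day's quota.
names_per_day = DAILY_CHAR_LIMIT // AVG_NAME_LENGTH

# The monthly cap is exhausted after this many full-quota days.
full_days_per_month = MONTHLY_CHAR_LIMIT // DAILY_CHAR_LIMIT

# Upper bound on names translated per month under the same assumption.
names_per_month = MONTHLY_CHAR_LIMIT // AVG_NAME_LENGTH

print(names_per_day, full_days_per_month, names_per_month)
```

Under that assumption the monthly cap matters as much as the daily one: the full daily quota can be used on only 10 days of any month.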

I see a few possible options:

KMax commented 9 years ago

As a workaround, I suggest writing a script that queries the names and descriptions that don't have translations yet and translates them with the API until it hits the limit. The script can be started manually, so we could run it once a day until all the products are translated.

The script could also write the translated triples to a file, so we could reuse the translations later.

In the future we shouldn't update the whole dataset, only the changed pieces, so we may not exceed the limit.
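A minimal sketch of such a script, under stated assumptions. The names `translate_batch`, `append_triples` and `escape_literal` are hypothetical, the `translate` argument is a stand-in for whatever call wraps the real translation API, and storing the result as an English `rdfs:label` in N-Triples is an assumption about the dataset layout, not something confirmed in this thread. Finding the items that still lack a translation is left out.

```python
from typing import Callable, Iterable, List, Tuple

DAILY_CHAR_LIMIT = 1_000_000  # daily quota quoted earlier in this thread

# Assumption: translations are stored as English rdfs:label values.
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def escape_literal(text: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def translate_batch(
    pending: Iterable[Tuple[str, str]],
    translate: Callable[[str], str],
    char_budget: int = DAILY_CHAR_LIMIT,
) -> Tuple[List[str], int]:
    """Translate (subject_uri, source_text) pairs until the budget runs out.

    Returns the N-Triples lines produced and the number of characters spent.
    """
    triples: List[str] = []
    spent = 0
    for subject, text in pending:
        if spent + len(text) > char_budget:
            break  # stop here; the remaining items wait for the next run
        english = translate(text)  # hypothetical wrapper around the API
        spent += len(text)
        triples.append(f'<{subject}> <{LABEL}> "{escape_literal(english)}"@en .')
    return triples, spent


def append_triples(path: str, triples: List[str]) -> None:
    """Append the triples to a file so later runs can reuse them."""
    with open(path, "a", encoding="utf-8") as out:
        for line in triples:
            out.write(line + "\n")
```

Each manual run would call `translate_batch` on the items that still lack a translation and pass the result to `append_triples`. Passing the translator in as a function keeps the quota logic testable without network access.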

m-lapaev commented 9 years ago

Do we really need Russia-specific products to be translated? Products sold here can't be found outside Russia unless they are exported and then go through import procedures, which include label translation (something like a white sticker with a description in the language of the destination country). Moreover, proper nouns and nominals are normally not translated. They're transliterated, transcribed, or rendered as a loan translation instead, which the Yandex API won't do. It's sensible to store descriptions of foodstuffs in their native language: Russian products should have descriptions in Russian (if no other manufacturer description is provided), and imported goods should be described in the language of the manufacturer's country. Otherwise we'll get something like Иван -> John.

chistyakov commented 9 years ago

Maxim (@m-lapaev), from my point of view the goal is to let non-Russian FOODpedia users understand at least something when they open a product page. We want to publish an article in a non-Russian journal, and it looks awkward that we show everything in Russian.

Maxim (@KMax), in other words, we can set up a separate, updatable vocabulary (dataset) with ru-en translations. That may work.

KMax commented 9 years ago

@chistyakov is right: we know that the machine translations won't be as good as we would like, but they're much better than nothing.

m-lapaev commented 9 years ago

So, actually, the point is to present contents, categories and other data in English, but not food names. Just imagine translating a food name into Russian, say, the German beer "Berliner" --> "Берлинец". We risk producing something like [1] or [2].

  1. http://i1.i.ua/prikol/thumb/5/8/755885.jpg
  2. http://i1.i.ua/prikol/pic/0/1/144210.jpg

KMax commented 9 years ago

@m-lapaev if you look at the first message in this thread, you will see what we mean by "name". And I accept the risk of having really bad translations for some products.

chistyakov commented 9 years ago

A partially translated dump has been uploaded to production: http://foodpedia.tk/page/4600209002117

Only the first 20,000 items were translated.

chistyakov commented 9 years ago

Examples of bad translations: http://foodpedia.tk/page/8436018292830 http://foodpedia.tk/page/4607105861152