apertium / apertium-apy

📦 Apertium HTTP Server in Python
https://wiki.apertium.org/wiki/Apertium-apy
GNU General Public License v3.0

Either Wiki or Apertium has a Bug #158

Closed Ryu945 closed 4 years ago

Ryu945 commented 4 years ago

In the wiki, it says Malaysian (ms or msa) is a released language. It also says Norwegian (nor) is a released language. I updated my server instance tonight and checked the listed pairs and there is not a single pair that uses either of these.

https://wiki.apertium.org/wiki/Main_Page

TinoDidriksen commented 4 years ago

nor is a macrolanguage identifier comprised of nno and nob, both of which are used.

msa is also a macrolanguage. The actual language is zlm, which is used in 1 pair.
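
As a rough illustration of the macrolanguage relationship described here, the following Python sketch expands a macrolanguage code into its member codes before matching it against a list of pairs. Everything in it is an assumption made for illustration: the `MACRO_MEMBERS` table is hand-maintained and lists only the members named in this thread (it is not a complete ISO 639-3 mapping), the function names are hypothetical, and `SAMPLE_PAIRS` is made-up data rather than real server output.

```python
# Hand-maintained table: macrolanguage code -> member codes.
# Only the members named in this thread are listed; it is not complete.
MACRO_MEMBERS = {
    "nor": {"nno", "nob"},
    "msa": {"zlm"},
}


def expand(code):
    """Return the concrete codes to look for when given `code`."""
    # A code that is not a known macrolanguage simply stands for itself.
    return MACRO_MEMBERS.get(code, {code})


def pairs_involving(code, pairs):
    """Return the (source, target) pairs that use `code` or one of its members."""
    wanted = expand(code)
    return [(src, trg) for src, trg in pairs if src in wanted or trg in wanted]


# Illustrative pairs only, not actual server output.
SAMPLE_PAIRS = [("nno", "nob"), ("nob", "nno"), ("ind", "zlm"), ("eng", "spa")]

print(pairs_involving("nor", SAMPLE_PAIRS))
print(pairs_involving("msa", SAMPLE_PAIRS))
```

The table still has to come from somewhere outside the server, so this only moves the hardcoding into one place rather than removing it.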

Ryu945 commented 4 years ago

How does one find which language pairs exist for these macrolanguages if it cannot be done with /listPairs? It seems that anyone using the interface has to hardcode them and cannot poll the server for valid pairs.

TinoDidriksen commented 4 years ago

You should go from code to language, not from language to code. E.g., if you see zlm in /listPairs then check https://iso639-3.sil.org/code/zlm which will tell you it's Malay. If you go the other way and look up Malay, you'll likely wind up on https://iso639-3.sil.org/code/msa which doesn't help you much.
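
A minimal Python sketch of the "code to language" direction: collect the distinct codes that appear in a /listPairs response, then look each one up. The response shape used here (a `responseData` list of objects with `sourceLanguage` and `targetLanguage` keys) is an assumption that should be checked against a real server, the pairs in `SAMPLE_RESPONSE` are illustrative, and `codes_in_pairs` is a hypothetical helper name.

```python
import json

# Assumed shape of a /listPairs response; the pairs themselves are illustrative.
SAMPLE_RESPONSE = json.loads("""
{"responseData": [
    {"sourceLanguage": "zlm", "targetLanguage": "ind"},
    {"sourceLanguage": "nno", "targetLanguage": "nob"}
 ],
 "responseDetails": null,
 "responseStatus": 200}
""")


def codes_in_pairs(response):
    """Return the sorted, de-duplicated language codes used by any pair."""
    codes = set()
    for pair in response.get("responseData", []):
        codes.add(pair["sourceLanguage"])
        codes.add(pair["targetLanguage"])
    return sorted(codes)


print(codes_in_pairs(SAMPLE_RESPONSE))
```

Each code in the result can then be resolved to a name, instead of starting from a name such as "Malay" and guessing which code the server uses.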

Ryu945 commented 4 years ago

My point was how to do it with a server interface and not have to scrape a website. Other than scraping https://iso639-3.sil.org/code/zlm or hardcoding it, is there any other way to do it? This is software looking at which pairs are valid and which are not.

ftyers commented 4 years ago

I don't understand the problem. Is it that you would like the English name of a given code? e.g. like on IRC you can query begiak:

21:13 <+spectie> .iso639 zlm 
21:13 <begiak> zlm = Malay

?

TinoDidriksen commented 4 years ago

@Ryu945 the /listLanguageNames endpoint will show you all the names we have scraped code->name mappings for, and can even be queried in multiple languages for some of them. But look in the APy code to see how it specifically works.
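
For software that needs to call this endpoint, a hedged Python sketch follows. It assumes a local APy instance on port 2737 and uses the `locale` and `languages` query parameters as they appear in the curl example in this thread. The helper names are hypothetical, and which codes the server actually has names for is not something this sketch can guarantee.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:2737"  # assumed local APy instance


def language_names_url(codes, locale="en", base_url=BASE_URL):
    """Build a /listLanguageNames URL for the given codes and locale."""
    # urlencode turns the spaces between codes into '+', as in the curl example.
    query = urlencode({"locale": locale, "languages": " ".join(codes)})
    return f"{base_url}/listLanguageNames?{query}"


def fetch_language_names(codes, locale="en", base_url=BASE_URL):
    """Return a dict of code -> name; this needs a running server."""
    with urlopen(language_names_url(codes, locale, base_url)) as resp:
        return json.load(resp)


print(language_names_url(["ca", "en", "mk", "tat", "kk"], locale="fr"))
```

Only `language_names_url` runs without a server. `fetch_language_names` performs the actual HTTP request and is assumed to return a JSON object mapping codes to names.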

Ryu945 commented 4 years ago

> I don't understand the problem. Is it that you would like the English name of a given code? e.g. like on IRC you can query begiak:
>
> 21:13 <+spectie> .iso639 zlm
> 21:13 <begiak> zlm = Malay
>
> ?

Software has to do this. A human isn't the one checking some wiki to see the equivalent letter code. I don't plan on using any scraping solution, which leaves only hardcoding, as I don't see any API to check for macro-language pairs. I am asking whether there is any API I can call to check if a pair for a macro-language is valid.

> @Ryu945 the /listLanguageNames endpoint will show you all the names we have scraped code->name mappings for, and can even be queried in multiple languages for some of them. But look in the APy code to see how it specifically works.

curl 'http://localhost:2737/listLanguageNames?locale=fr&languages=ca+en+mk+tat+kk'
{"ca": "catalan", "en": "anglais", "kk": "kazakh", "mk": "macédonien", "tat": "tatar"}

That API appears only to return the name of each specified language in the specified locale, which is not what I am after. I want to find valid pairs without specifying the language, whether they are regular language pairs or macro-language pairs. I can either do this by hand, looking through the data and hardcoding it into the software, or have the software poll the server for the information it needs, which is what I prefer. I initially thought I could use /listPairs, but that does not cover macro-languages.