Tatoeba / tatoeba2

Tatoeba is a platform whose purpose is to create a collaborative and open dataset of sentences and their translations.
https://tatoeba.org
GNU Affero General Public License v3.0

Perhaps it would be a good idea to not include all possible languages on the dropdown select language, but to have members wishing to see all the languages opt in. #995

Closed ckjpn closed 8 years ago

ckjpn commented 8 years ago

Perhaps it would be a good idea to not include all possible languages on the dropdown select language menu for every member every time it is used. (In the search form, when the flag is clicked and when the translate button is clicked.)

This might possibly speed things up, especially for search results and pages with 20 to 100 sentences, since the dropdown select menu is included 2 times for each sentence (flag and translate button), and 2 times for the search form. (This means a page of results with 100 sentences has this code 202 times in the page.)
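As a rough check of that count, here is a minimal Python sketch. The function name and its parameters are illustrative assumptions based only on the description above (two menus per sentence, two for the search form), not on the Tatoeba code base.

```python
def dropdown_copies(sentences_on_page: int,
                    per_sentence: int = 2,
                    per_search_form: int = 2) -> int:
    """Count how many copies of the language dropdown a page would contain."""
    # Each sentence carries one menu for the flag and one for the translate
    # button; the search form adds its own menus once per page.
    return sentences_on_page * per_sentence + per_search_form


print(dropdown_copies(100))  # 202
```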

One possible idea would be to hide all languages with fewer than 100 sentences by default, and display them only to members who opt in through their settings. We are getting very few contributions in the languages with fewer than 100 sentences, and not very many in languages with fewer than 500 sentences.

I don't know if this would help, or if it would be too hard to implement. However, it might be an idea to consider. As the number of languages grows, it's likely that we will have many more than 283 languages in the future, so some kind of solution will eventually need to be considered.

If you chose 100 as the break point, then only 140 lines would be in the select menu, rather than 287.

140 languages have over 100 contributions:
English, Esperanto, Turkish, Italian, Russian, German, French, Spanish, Portuguese, Japanese, Hungarian, Hebrew, Berber, Polish, Macedonian, Finnish, Dutch, Chinese (Mandarin), Marathi, Swedish, Ukrainian, Danish, Bulgarian, Serbian, Interlingua, Czech, Arabic, Low Saxon, Lithuanian, Persian, Latin, Klingon, Lojban, Greek, Tagalog, Norwegian (Bokmål), Icelandic, Tatar, Hindi, Indonesian, Vietnamese, Toki Pona, Uyghur, Belarusian, Romanian, Catalan, Ido, Azerbaijani, Interlingue, Galician, Meadow Mari, Shanghainese, Croatian, Cantonese, Kotava, Occitan, Basque, Korean, Estonian, Kazakh, Ilocano, Chavacano, Slovak, Afrikaans, Bengali, Literary Chinese, Urdu, Malay, Irish, Volapük, Kapampangan, Latvian, Chuvash, Old East Slavic, Cebuano, Cornish, Khmer, Central Dusun, Yiddish, Breton, Kalmyk, Khasi, Armenian, Yakut, Old Prussian, Upper Sorbian, Zaza, Lower Sorbian, Malayalam, Norwegian (Nynorsk), Georgian, Slovenian, Algerian Arabic, Uzbek, Piedmontese, Scottish Gaelic, Guarani, Thai, Albanian, Mongolian, Egyptian Arabic, Kurdish, Swahili, unknown, North Moluccan Malay, Welsh, Bosnian, Maori, Coastal Kadazan, Ottoman Turkish, Waray, Crimean Tatar, Kashubian, Ho, Tamil, Javanese, Faroese, Chukchi, Turkmen, Awadhi, Quechua, Ossetian, Nahuatl, Amharic, Ancient Greek, Punjabi (Eastern), Kannada, Hawaiian, Novial, Emilian, Maltese, Xhosa, Bashkir, Old English, Sanskrit, Quenya, Frisian, Asturian, Kyrgyz, Picard

143 languages have fewer than 100 contributions:
Shuswap, Nogai, Telugu, Choctaw, Ladin, Hill Mari, Sundanese, Tok Pisin, Tigrinya, Luxembourgish, Gujarati, Sumerian, Zulu, Tajik, Mambae, Pipil, Ladino, Hausa, Tibetan, Karachay-Balkar, Walloon, Lingua Franca Nova, Laz, Somali, Haitian Creole, Hiligaynon, Guerrero Nahuatl, Shona, Pangasinan, Bhojpuri, Lingala, Central Huasteca Nahuatl, Sindarin, Lao, Sinhala, Burmese, Iraqi Arabic, CycL, Punjabi (Western), Garhwali, Nigerian Fulfulde, Malagasy, Yoruba, Fiji Hindi, Old Tupi, Nepali, Livonian, Niuean, North Levantine Arabic, Pennsylvania German, Dungan, Chamorro, Abkhaz, Kölsch, Lakota, Ewe, Kekchi (Q'eqchi'), Cherokee, Khakas, Pashto, Navajo, Chinyanja, Northern Sami, Romansh, Tarifit, Greenlandic, Udmurt, Ngeq, Erzya, Kumyk, Ainu, Tongan, Sicilian, Kinyarwanda, Wolof, Aragonese, Middle English, Venetian, Orizaba Nahuatl, Mohawk, Lombard, Võro, Gilbertese, Tuvaluan, Chechen, Tuvinian, Bambara, Gagauz, Friulian, Kamba, Buryat, Mon, Juhuri (Judeo-Tat), Papiamento, Romani, Samoan, Karakalpak, Middle French, Tetun, Scots, Manx, Sango, Pulaar, Min Nan Chinese, Southern Altai, Adyghe, Fijian, Hmong Njua, Bodo, Moksha, Tsonga, Tagal Murut, Palatine German, Balinese, Sardinian, Iban, Keningau Murut, Corsican, Sindhi, Old Aramaic, Hakka Chinese, Aymara, Southern Sotho, Moroccan Arabic, Ojibwe, Palauan, Setswana, Swazi, Assyrian, Tokelauan, Umbundu, Assamese, Old Saxon, Old Norse, Odia (Oriya), Marshallese, Seychellois Creole, Cuyonon, Igbo, Urhobo, Louisiana Creole, Swiss German, Luganda

Perhaps you may think 100 is too high or too low a number for the break point.

When choosing a breaking point, remember that it's not uncommon for our regular contributors to contribute over 100 sentences a day. Even regular members who don't contribute much often go over 10 sentences on any given day.

It would only take one or two translation sessions by a member for any language below the breaking point to move into the range of being on the regular select menu.
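The opt-in idea above could be sketched roughly as follows. This is a hypothetical Python illustration, not Tatoeba's actual implementation: the function `languages_for_dropdown`, the `show_all_languages` setting, and the sample counts are all assumptions made up for the example.

```python
def languages_for_dropdown(sentence_counts, threshold=100, show_all_languages=False):
    """Return the language names to list in the dropdown, sorted alphabetically.

    Languages with fewer than `threshold` sentences are hidden unless the
    member has opted in via the (hypothetical) `show_all_languages` setting.
    """
    if show_all_languages:
        visible = sentence_counts.keys()
    else:
        visible = (name for name, count in sentence_counts.items()
                   if count >= threshold)
    return sorted(visible)


# Made-up counts, for illustration only.
sample_counts = {"English": 500000, "Picard": 120, "Luganda": 3, "Shuswap": 99}

print(languages_for_dropdown(sample_counts))
# ['English', 'Picard']
print(languages_for_dropdown(sample_counts, show_all_languages=True))
# ['English', 'Luganda', 'Picard', 'Shuswap']
```

Because the check uses the current sentence count, a language would move onto the default menu as soon as it reaches the break point, with no manual list to maintain.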

jiru commented 8 years ago

> This might possibly speed things up

Any other benefits? At the moment, speed isn’t much of a problem. What’s the problem with having all the possible languages in the dropdown?

I’m not rejecting your idea, but I think we need to identify more clearly the problem you’re trying to solve. If we don’t know the problem, we don’t know if we solved it. If we don’t know whether we solved it, we can never close this ticket.

If you’re trying to start a discussion, maybe the mailing list is more appropriate.

ckjpn commented 8 years ago

Another advantage is that people won't have to scroll past so many languages looking for languages that we actually have a lot of examples for.

If nobody thinks this is an idea to consider, please feel free to close it.

jiru commented 8 years ago

> If nobody thinks this is an idea to consider, please feel free to close it.

My problem is that I just don’t know whether this is an idea to consider, because I don’t know if it’s worth implementing, because I don’t know what problem it is trying to solve. You seem to take time to write very developed enhancement requests, and it’s a pity we can’t consider them because you just don’t describe the original problem that initially pushed you to suggest them. I’m marking this issue as won’t fix until you do so.

Trang also explained how to report issues and enhancement requests.

sacredceltic commented 8 years ago

Also, we should consider that a long list of available languages is part of Tatoeba's attraction. I would recommend not filtering this list on Tatoeba's main page by default.