Open Guybrush88 opened 11 months ago
I suspect that if this were to be done, it might be best to not do this in real time, but to generate the number of direct links a sentence has only from time to time -- perhaps once a week, before the weekly downloadable files are created, and then also create a downloadable file with these numbers.
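A rough sketch of what such a periodic batch job could look like, in Python. It assumes the links are available as tab-separated `sentence_id`, `translation_id` pairs with one row per direct link. The function names and the output layout are hypothetical, not an existing Tatoeba script.

```python
import csv
import io
from collections import Counter


def count_direct_links(rows):
    """Count direct links per sentence.

    rows: iterable of (sentence_id, translation_id) pairs (assumed layout).
    """
    counts = Counter()
    for sentence_id, _translation_id in rows:
        counts[int(sentence_id)] += 1
    return counts


def write_counts(counts, out):
    """Write 'sentence_id<TAB>link_count' lines, sorted by sentence id."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for sentence_id in sorted(counts):
        writer.writerow([sentence_id, counts[sentence_id]])


# Tiny in-memory sample standing in for the real links file.
sample = "1\t2\n2\t1\n1\t3\n3\t1\n"
rows = csv.reader(io.StringIO(sample), delimiter="\t")
counts = count_direct_links(rows)

buf = io.StringIO()
write_counts(counts, buf)
print(buf.getvalue())
```

Run once a week before the export, something like this would yield a small file that could be published alongside the other downloads, with no per-request counting needed.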
(1) For people who intend to translate, finding the sentences with the most translations could be useful; these are sometimes among the most popular, most universal, or easiest sentences.
From my experience, I've learned that the most linked sentences of a language are primarily those that are:
Surfacing the sentences with a high number of translations would reinforce these biases.
(2) For people who intend to translate, finding sentences with few translations could be useful; these are sometimes among the most "virgin" or the least noisy sentences, etc.
I doubt that there are many translators out there looking for these "virgin" sentences.
(3) Combining this criterion with some of the criteria already present could be very useful for helping the user locate good sentences.
I don't think an extra filter is the proper way to help translators find better sentences to translate. Rather, we should measure the relative number of translators for a sentence compared to its closest peers of the same language, age and length. And then we could use this popularity score as a sorting option for the advanced search.
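To make the idea concrete, here is a minimal Python sketch of one possible relative score: each sentence's count divided by the average count of its peer group. The `Sentence` fields, the bucket sizes, and the choice of a ratio to the group mean are all assumptions made for illustration. `count` stands for whatever quantity ends up being measured.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Sentence:
    id: int
    lang: str
    age_days: int
    length: int   # length in characters
    count: int    # the measured quantity for this sentence


def peer_key(s, age_bucket_days=365, length_bucket=20):
    """Sentences sharing this key are treated as peers (hypothetical buckets)."""
    return (s.lang, s.age_days // age_bucket_days, s.length // length_bucket)


def popularity_scores(sentences):
    """Return {sentence id: count relative to the peer-group mean}."""
    groups = defaultdict(list)
    for s in sentences:
        groups[peer_key(s)].append(s)

    scores = {}
    for peers in groups.values():
        avg = mean(p.count for p in peers)
        for p in peers:
            # A score of 1.0 means "as popular as the average peer".
            scores[p.id] = p.count / avg if avg > 0 else 0.0
    return scores


sentences = [
    Sentence(1, "eng", 100, 10, 8),
    Sentence(2, "eng", 200, 15, 2),
    Sentence(3, "eng", 150, 12, 2),
    Sentence(4, "ita", 100, 10, 3),
]
scores = popularity_scores(sentences)
print(scores)
```

Sorting search results by such a score would rank a sentence against comparable sentences rather than against the whole corpus, which is the point of normalizing by language, age and length.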
I, too, sort of doubt this would be all that useful, for the reasons mentioned above.
As for "virgin" sentences, those with no translations, these can already be found using the "exclude", "any language", and "direct link" or "any link" options.
Template (pre-filled form): https://tatoeba.org/en/sentences/advanced_search?&trans_filter=exclude&trans_link=&sort=random
Currently 1,827,697 occurrences, i.e. 15.5% of our sentences (1,827,697 / 11,779,865).
As reported by cojiluc on the wall:
https://tatoeba.org/it/wall/show_message/40365#!#message_40365