Tatoeba / tatoeba2

Tatoeba is a platform whose purpose is to create a collaborative and open dataset of sentences and their translations.
https://tatoeba.org
GNU Affero General Public License v3.0

In the advanced search give "Owned by a self-identified native: " 3 options #2260

Open ckjpn opened 4 years ago

ckjpn commented 4 years ago

In the advanced search give "Owned by a self-identified native:" 3 choices.

Owned by a self-identified native: Any / Yes / No

... in the same way these are given 3 choices.

Is orphan: Any / Yes / No

Has audio: Any / Yes / No

Also add that same option on the "translations" side, too

Just like the other 2.
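
A three-way "Any / Yes / No" filter of this kind can be sketched as an optional boolean, where `None` stands for "Any". This is only an illustration in Python: the `Sentence` fields and the `matches` helper are hypothetical names and are not taken from the Tatoeba codebase.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sentence:
    # Hypothetical record; the field names are assumptions for illustration.
    text: str
    owner_is_native: bool
    is_orphan: bool
    has_audio: bool


def tri_state_ok(value: bool, wanted: Optional[bool]) -> bool:
    """Check one Any/Yes/No filter: None = Any, True = Yes, False = No."""
    return wanted is None or value == wanted


def matches(s: Sentence,
            native: Optional[bool] = None,
            orphan: Optional[bool] = None,
            audio: Optional[bool] = None) -> bool:
    """Return True when the sentence passes all three filters."""
    return (tri_state_ok(s.owner_is_native, native)
            and tri_state_ok(s.is_orphan, orphan)
            and tri_state_ok(s.has_audio, audio))


sentences = [
    Sentence("I like tea.", owner_is_native=True, is_orphan=False, has_audio=True),
    Sentence("I am liking tea.", owner_is_native=False, is_orphan=False, has_audio=False),
]

# "Owned by a self-identified native: No", everything else left at "Any".
non_native = [s.text for s in sentences if matches(s, native=False)]
```

Leaving every filter at its default keeps all sentences, which mirrors how "Any" behaves for the existing orphan and audio options.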

Why?

This would give proofreaders an easy way to find non-native sentences to proofread.

Possibly, there would be other uses, too.

Related Wall Post

I have a suggestion. Since natives are tagged, the sentences in a language added by non-natives could be automatically added to a list, which could then be reviewed by willing natives. What do you think of my idea?

By Smoky (2020-04-07) https://tatoeba.org/eng/wall/show_message/34741#!#message_34741

ckjpn commented 4 years ago

Additionally, it would be great if you could do the following search.

  1. Look for non-native sentences in a certain language.
  2. ... that are not yet on a certain list (List 907 for example)
  3. ... and/or not yet rated OK by a certain member (myself for example).
  4. ... and/or not yet rated OK by any member.

1 & 4 together could likely be done if you also added the following option to the advanced search. The other possibilities might be harder to add.

Rated OK by at least 1 member: Any / Yes / No
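
The four criteria above might combine roughly as follows. This is a sketch under stated assumptions, not how Tatoeba's search works: the `Sentence` record, the `to_proofread` function, the `"ok"` rating string, and the member name `member_a` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Sentence:
    # Hypothetical record; none of these names come from Tatoeba's code.
    id: int
    lang: str
    owner_is_native: bool
    list_ids: Set[int] = field(default_factory=set)
    ratings: Dict[str, str] = field(default_factory=dict)  # member -> rating


def to_proofread(corpus: List[Sentence], lang: str,
                 skip_list: Optional[int] = None,
                 skip_ok_by: Optional[str] = None,
                 skip_any_ok: bool = False) -> List[int]:
    """Return ids of non-native sentences that still need a look."""
    result = []
    for s in corpus:
        # Criterion 1: non-native sentences in the chosen language.
        if s.lang != lang or s.owner_is_native:
            continue
        # Criterion 2: not yet on a given list.
        if skip_list is not None and skip_list in s.list_ids:
            continue
        # Criterion 3: not yet rated OK by a given member.
        if skip_ok_by is not None and s.ratings.get(skip_ok_by) == "ok":
            continue
        # Criterion 4: not yet rated OK by any member.
        if skip_any_ok and "ok" in s.ratings.values():
            continue
        result.append(s.id)
    return result


corpus = [
    Sentence(1, "eng", False, list_ids={907}),
    Sentence(2, "eng", False, ratings={"member_a": "ok"}),
    Sentence(3, "eng", False),
    Sentence(4, "eng", True),
]

# Criteria 1 and 4 together, as in the proposed search option.
unrated = to_proofread(corpus, "eng", skip_any_ok=True)
```

Criteria 2 and 3 need per-list and per-member data, which is consistent with the remark that they might be harder to add than a single yes/no option.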

Related issue: https://github.com/Tatoeba/tatoeba2/issues/2261

alanfgh commented 4 years ago

See my comments on that wall thread.

agrodet commented 4 years ago

Related, same request: #1663

I'm on Alan's side regarding all the considerations about "native, good; not native, not good". However, I don't see, for now at least, any potential harm in implementing this particular feature. As long as we don't make it the default or promote means detrimental to contributors, I see it as simply more filtering options.

To be fair, I should mention that I don't necessarily see the benefits of having it either (except freedom of search and exploration). But the cost of implementing it should be low enough to be acceptable.

jiru commented 4 years ago

the cost of implementing it should be low

While it may be true for the title of the issue, it’s not when it comes to everything else that CK mentioned.

The problem I see with the proposed solution is that it’s one more step towards "throwing every Tatoeba functionality into a search option". Really, there are now so many things one can do with the advanced search. On the one hand, it has become a powerful tool (does anybody remember the old times when we had no option outside the top bar?), but on the other hand, it has become usable only by power users (or let’s say "adventurous" ones). If "let’s add a new search option" is the only answer we can give to a use case, it means we are ignoring the majority of people using Tatoeba.

It is tempting to implement CK’s suggestion because it feels easy and "fitting" into the existing picture, but I see it as a trap I myself fall into many times. The direct consequence of that trap, in my opinion, is that non-power-users never stick while geeks do. Our major contributors are all super-geeks and our community lacks diversity. Maybe I’m extrapolating, but I really think this pattern harms us at the end of the day.

So, I think we should think a bit more about the "proofread" use case to come up with a more inclusive solution. For example, what about a page that gives you a list of sentences to proofread, and in front of each sentence there is an icon that tells you if the sentence was contributed by a native speaker? Wait, we already have such an icon, don’t we?

trang commented 4 years ago

But the cost of implementing it should be low enough to be acceptable.

I just want to say that I will oppose the implementation of any new feature in the search (small or big) until we have refactored the code and added unit tests. So you'd have to count the cost of this refactoring, which I think is not so low :P

Plus, being able to search sentences from non-native speakers isn't really going to achieve much by itself. Even if the cost were low, the benefits would be even lower.

It's pretty clear that this issue and #2261 are about a lack of good proofreading functionalities in Tatoeba. And it's quite an important issue to solve. Proofreading is an essential activity to achieve a corpus of high quality. So we should really focus on that instead: what would be a good user experience and an efficient process for proofreading?

I suggest closing this issue, as well as #2261. We can create a new one for proofreading.

ckjpn commented 4 years ago

As a contributor, I could indeed prioritize sentences from non-native speakers when I'm searching for sentences to proofread, but how do I avoid proofreading sentences that other people (or myself) have already proofread?

Also include a way to search for sentences that don't yet have an OK rating, assuming proofreaders would add an OK rating to the ones they thought were good. Or, if you don't want to read sentences that you or others have already rated at all, don't limit this filter to sentences with OK ratings; apply it to sentences with any rating.

agrodet commented 4 years ago

@jiru I agree with you that the advanced search shouldn't be a monster that nobody can fully understand. However, I don't think that is the case, not even close. If you really think that only power users can use the advanced search, then we should add this article https://en.wiki.tatoeba.org/articles/show/advanced-search# beside the "More search options" link in the advanced search. Or instead of it. Or merge both, I don't know.

I don't think one needs to be a power-user to understand this text. I also don't think that advanced search complexity is the main reason (or the second main, or even the third main reason) that users don't stick.

I agree that a complex functionality is neither new-user friendly nor "my aunt Emma"-user-level friendly. However, a compromise has to be made somewhere, and isn't that why we have a simple search? (that could / should be improved, certainly)


Since we have been talking about a proper proofreading page (or pages), I have asked corpus maintainers for feedback on the Wall. Hopefully, this will give us enough information to build something useful and pertinent. I'll post a summary on GitHub.

ckjpn commented 4 years ago

I think having many possibilities, with a "more search options" added to the advanced search page would be a good idea.

You could even offer fewer options on the first level of the advanced search.

We could possibly have several different pages with different types of slightly preset advanced searches.

Not all options would need to be shown, but could still be available for power-users to use to create their own forms.

Some cut-down advanced search forms I already have online

At the top of this page, you can see 2 cut-down advanced search forms. http://tatoeba.ueuo.com/

Also see other possibilities at the top of this page. http://study.aitech.ac.jp/tatoeba/translate/links.php?f=jpn&t=fra

Or on the top right of this page. http://big.rf.gd/vocab/66.php?t=fra

Ideas for additional "hidden" options that could be added

&rating=any (default)
&rating=ok
&rating=unsure
&rating=not+ok
(or use 1,0,-1,null)

&trans_native=
&trans_tags=
&trans_rating=

Even if these aren't shown on the "advanced search" form, they could be useful for power-users.
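
To illustrate how a power user might use such hidden options, here is a small Python sketch that assembles a search URL from them. Everything specific in it is an assumption: the parameters are only proposals from this comment, the value `yes` for `trans_native` is invented for the example, and `BASE_URL` is an assumed search path rather than a documented endpoint.

```python
from urllib.parse import urlencode

# Assumed search path, used only for illustration.
BASE_URL = "https://tatoeba.org/eng/sentences/search"


def build_search_url(**params):
    """Build a search URL, leaving out options that are set to None."""
    query = {key: value for key, value in params.items() if value is not None}
    # urlencode encodes spaces as "+", which yields the "not+ok" form above.
    return f"{BASE_URL}?{urlencode(query)}"


# Hypothetical options: none of these are existing Tatoeba parameters.
url = build_search_url(rating="not ok", trans_native="yes", trans_rating=None)
```

Because unset options are dropped, a cut-down form only needs to pass the few parameters it exposes.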

jiru commented 4 years ago

I also don't think that advanced search complexity is the main reason (or the second main, or even the third main reason) that users don't stick.

@agrodet I’m not saying that complexity is the reason. I’m saying that the pattern of building new functionalities by adding more advanced search options is harmful.

I agree that a complex functionality is neither new-user friendly nor "my aunt Emma"-user-level friendly. However, a compromise has to be made somewhere, and isn't that why we have a simple search? (that could / should be improved, certainly)

@agrodet I don’t see the simple search as a user-friendly equivalent to the advanced search, because the advanced search is used to build functionality that has little to do with search, such as proofreading or translating. If the "compromise" is not to offer any of these "advanced search functionalities that are not search" to your aunt, then it’s no compromise, it’s exclusion.

Since we have been talking about a proper proofreading page (or pages), I have asked corpus maintainers for feedback on the Wall. Hopefully, this will give us enough information to build something useful and pertinent. I'll post a summary on GitHub.

Thank you very much. :blush: