Tatoeba / tatoeba2

Tatoeba is a platform whose purpose is to create a collaborative and open dataset of sentences and their translations.
https://tatoeba.org
GNU Affero General Public License v3.0

Allow filtering out / excluding of search results based on tag #2889

Open dshepsis opened 2 years ago

dshepsis commented 2 years ago

Story or Context

Earlier, I was looking through sentences and found a few that included common swear words, so I commented that these sentences should be tagged "Vulgar". CK replied, correctly pointing out that this was not particularly useful, since Tatoeba only allows filtering for search results that have a given tag, not for results that lack it. With a tag like "Vulgar", we would expect far more users to want to exclude tagged sentences from search results than to search specifically for vulgar sentences.

CK wrote:

"That said, maybe in the future tatoeba.org will make it possible to filter out search results by tags."

I decided to check and couldn't find a relevant issue in this repository, but I may have missed it.

Idea

There should be some functionality for excluding tags from search results. Under the "Advanced Search" settings, there is already a textbox labelled "Tags", which allows users to enter a comma-separated list of tags, of which all must match for a given sentence to appear in the search results. For example, entering "male name,4 syllables" produces a list of sentences which have both 4 syllables and a male name.

I propose adding an additional textbox, labelled "Exclude tags", which also takes a comma-separated list. A given sentence must not match any of the listed tags to be included in the search results. For example, entering "vulgar,slang" would exclude both sentences tagged "vulgar" and sentences tagged "slang".
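To make the intended semantics concrete, here is a minimal sketch in Python. It is not Tatoeba's actual search code; the function name `sentence_matches` and the case-insensitive comparison are assumptions made purely for illustration.

```python
def sentence_matches(sentence_tags, include_tags, exclude_tags):
    """Return True if a sentence should appear in the search results.

    Hypothetical helper: every tag in include_tags must be present,
    and no tag in exclude_tags may be present.
    Assumption: tags are compared case-insensitively.
    """
    tags = {t.casefold() for t in sentence_tags}
    has_all_included = all(t.casefold() in tags for t in include_tags)
    has_any_excluded = any(t.casefold() in tags for t in exclude_tags)
    return has_all_included and not has_any_excluded


# "Exclude tags" = vulgar,slang: a sentence with either tag is dropped.
print(sentence_matches(["idiom"], [], ["vulgar", "slang"]))            # True
print(sentence_matches(["idiom", "Vulgar"], [], ["vulgar", "slang"]))  # False
```

With both lists empty, every sentence matches, which is the same as today's behaviour when the "Tags" field is left blank.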

This could have a variety of uses. Excluding vulgar terms is useful to, say, educators trying to avoid teaching students swears. Another use might be to exclude the tags "proverb" and "idiom", if one is specifically trying to find a literal usage for a given term. I am sure there are other possible uses as well, especially for the @ tags which are used to request actions/changes.

LBeaudoux commented 2 years ago

@dshepsis Thank you for highlighting this issue that users often face. It reminds me of a similar issue concerning the filtering of sentence owners.

I think your suggestion could be improved a bit. We could avoid adding an extra field by allowing tags with a minus sign in the current field. For example, we could filter sentences tagged as idiom but not tagged as vulgar with idiom,-vulgar.
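As a rough sketch of how such a single field could be parsed (Python, for illustration only; `parse_tag_filter` is a hypothetical name, and this is not how Tatoeba's search is implemented):

```python
def parse_tag_filter(text):
    """Split a comma-separated tag field into (include, exclude) lists.

    Assumption: an entry with a leading minus sign means "exclude this tag".
    """
    include, exclude = [], []
    for raw in text.split(","):
        tag = raw.strip()
        if not tag:
            continue  # ignore empty entries such as a trailing comma
        if tag.startswith("-") and len(tag) > 1:
            exclude.append(tag[1:])
        else:
            include.append(tag)
    return include, exclude


print(parse_tag_filter("idiom,-vulgar"))  # (['idiom'], ['vulgar'])
```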

dshepsis commented 2 years ago

LBeaudoux wrote:

"For example, we could filter sentences tagged as idiom but not tagged as vulgar with idiom,-vulgar."

I appreciate the desire to keep the form simpler. I did consider combining the fields when I imagined how this feature may work, but felt that adding syntax may result in confusion in some cases. For example, there are already several tags which start with a minus sign, such as "-as". We could choose another character, but as far as I can tell there are few limits to the valid names for tags, so we would need some escape sequence.

In fact, while writing this, I realized that commas are also legal in tag names, which makes it impossible to properly search by tags that contain them. So I guess we need an escape sequence for that anyways. Nevertheless, I believe it would be simpler for most users to have an extra textbox which has a label and caption to explain its exact functionality.
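For what it's worth, here is one way an escape sequence for commas could work, sketched in Python. The choice of backslash as the escape character and the name `split_escaped` are my own assumptions, not anything Tatoeba currently supports.

```python
def split_escaped(text, sep=",", esc="\\"):
    """Split text on sep, treating esc + <char> as a literal <char>.

    Hypothetical scheme: "\\," is a literal comma and "\\\\" is a literal
    backslash. A trailing lone escape character is kept as-is.
    """
    parts, current = [], []
    chars = iter(text)
    for ch in chars:
        if ch == esc:
            # Take the next character literally, whatever it is.
            current.append(next(chars, esc))
        elif ch == sep:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return [p for p in parts if p]  # drop empty entries


print(split_escaped(r"hello\, world,idiom"))  # ['hello, world', 'idiom']
```

This only covers commas. If a leading minus sign were also given meaning, the parser would additionally need to remember whether that minus was escaped, which is part of why a separate textbox seems simpler to me.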

dshepsis commented 2 years ago

I've created a separate bug report with respect to it being impossible to search by tags containing commas: https://github.com/Tatoeba/tatoeba2/issues/2890.

ckjpn commented 1 year ago

Related wall post: https://tatoeba.org/en/wall/show_message/39095#message_39095