jharpster opened 6 years ago
I notice that the connected MapRoulette Challenge has a very high number of tasks marked as Not an Issue (false positive). As the MapRoulette superuser I am getting some complaints about the tasks in this Challenge. I would recommend that we disable this MapRoulette Challenge until the quality of the filter can be improved. Thanks.
One of the glaring issues is that the MapRoulette Challenge does not list which word is supposed to be the profanity.
So I have no idea whether it is an outright bug, English profanities being pattern-matched against text in other languages, or something else.
Looking at it, I am unable to spot what caused it to be reported; I'm not sure which English profanity matched here. I have not seen a single valid report in Poland.
@mvexel Thanks for the feedback. I have stopped updating this challenge.
@matkoniecz I'll evaluate the possibility of improving or disabling the profanity filter in the next few days.
@willemarcel
I too stumbled on this problem on MapRoulette.
So I went digging and figured the following things out. Some of this is probably obvious if you are familiar with OSM and the code around it. I wasn't :smile:
- Tags without a language suffix (e.g. `name`) are checked against the word lists for multiple languages (default: en / es / de / fr / ru / zh).
- Tags with a language suffix, e.g. `name:es`, are only checked against the Spanish word list.
- … `true`/`false`. But I don't know how that fits into the rest of the tech stack.

Then I checked the word lists in all languages I understand.
The many false positives are caused by the combination of the above findings.
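To make that combination concrete, here is a minimal Python sketch. It assumes the filter does case-insensitive substring matching, which the "13A" case suggests but the thread does not confirm. `languages_for_key`, `substring_hits`, `whole_word_hits` and the `WORD_LISTS` excerpt are hypothetical; only the default language set and the three entries reported in this thread (`13`, `Peter`, `cappella`) come from the discussion.

```python
import re

# Languages checked when a tag key has no language suffix (per the thread).
DEFAULT_LANGUAGES = ("en", "es", "de", "fr", "ru", "zh")

# Hypothetical excerpt: only the entries reported in this thread; all other
# entries of the real lists are omitted (empty sets stand in for them).
WORD_LISTS: dict[str, set[str]] = {
    "en": set(), "es": set(), "de": set(), "ru": set(),
    "fr": {"peter"},
    "zh": {"13"},
    "it": {"cappella"},
}


def languages_for_key(key: str) -> tuple[str, ...]:
    """Pick word lists for a tag key: 'name:es' -> ('es',), 'name' -> defaults."""
    _, sep, suffix = key.rpartition(":")
    # Assumption: a suffix counts as a language only if a list exists for it,
    # so keys like 'addr:street' fall back to the default languages.
    if sep and suffix in WORD_LISTS:
        return (suffix,)
    return DEFAULT_LANGUAGES


def substring_hits(key: str, value: str) -> list[tuple[str, str]]:
    """Return (language, entry) pairs whose entry occurs anywhere in the value."""
    lowered = value.lower()
    return [
        (lang, entry)
        for lang in languages_for_key(key)
        for entry in sorted(WORD_LISTS[lang])
        if entry in lowered
    ]


def whole_word_hits(key: str, value: str) -> list[tuple[str, str]]:
    """Same check, but an entry must equal a whole word of the value."""
    words = set(re.findall(r"\w+", value.lower()))
    return [
        (lang, entry)
        for lang in languages_for_key(key)
        for entry in sorted(WORD_LISTS[lang])
        if entry in words
    ]
```

Under these assumptions `substring_hits("name", "13A")` flags `("zh", "13")` because an unsuffixed `name` is checked against the Chinese list, while `whole_word_hits` does not. `whole_word_hits("name", "Peter Street")` still flags `("fr", "peter")`, so word-boundary matching alone would not fix entries that are problematic in the lists themselves. Returning the matched entry, as both functions do, would also tell mappers which word triggered a task.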
Some examples of what currently happens:

- Any `name` tag whose value contains the number `13` or the name `Peter` will be flagged as profanity. It is no coincidence that the screenshot from @matkoniecz shows something with "13A" in the `name` tag.
- `13` is in the Chinese word list (ZH).
- `Peter` is in the French word list (FR).
- `cappella` (Italian for chapel) is on the Italian word list.

> Thanks for the feedback. I have stopped updating this challenge.
Would it be possible to take it down completely or archive it?
https://maproulette.org/browse/challenges?query=profanity
That would save MapRoulette (MR) users the time spent manually marking 2800 entries as invalid.
Expand the profanity filters and make them multi-lingual.
Brief Description
The existing word list is inadequate to address more than the simplest profanities.
What is the motivation / use case for this feature?
Create more robust vandalism detection.
What is the expected behaviour?
Consider incorporating a broader list of profanities from this list.