addshore / wikicrowd

Tool for crowd sourced micro edits for Wikimedia
https://wikicrowd.toolforge.org/
MIT License

Proposed alias already exists #43

Closed: waldyrious closed this issue 2 years ago

waldyrious commented 2 years ago

Not sure what the issue is here. I came across this in a question for Q10838:

Screenshot from 2022-04-02 15-54-56

addshore commented 2 years ago

Interesting...

So the wiki correctly lists the BAB 70 alias as already existing https://en.wikipedia.org/w/api.php?action=query&prop=extracts|pageterms&exlimit=20&exintro=1&titles=Bundesautobahn_70

And we can see that wikicrowd is aware of this as BAB 70 appears in the current aliases too...

Any alias candidates that already exist should be skipped and not have questions generated for them https://github.com/addshore/wikicrowd/blob/454ff0fdc5534c98deccf68c48ed7fb04787de55/app/Jobs/GenerateAliasQuestions.php#L116-L126
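
As a rough illustration of that intended skip, here is a minimal Python sketch. It is not the project's PHP code: `filter_new_aliases` is a hypothetical helper, and the exact string comparison is an assumption about how the check behaves.

```python
def filter_new_aliases(candidates, existing_aliases):
    """Return only the candidates that are not already aliases.

    Hypothetical helper for illustration; assumes an exact,
    character-for-character string comparison.
    """
    existing = set(existing_aliases)
    return [c for c in candidates if c not in existing]


existing = ["BAB 70", "A 70", "Autobahn 70"]
# "BAB 70" (with a regular space) is already an alias, so it is dropped.
print(filter_new_aliases(["BAB 70", "Federal Motorway 70"], existing))
# → ['Federal Motorway 70']
```

With an exact comparison like this, a candidate is only skipped when every character matches an existing alias.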

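Looking at the question itself in the DB, it appears this is due to the space actually being a \u00a0 (U+00A0, no-break space) https://www.fileformat.info/info/unicode/char/00a0/index.htm when extracted from Wikipedia, so the suggestion does not match the existing "BAB 70" alias, which uses a regular space...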

>>> Question::where('unique_id', '=', 'Q10838/aliases/en/BAB 70')->take(1)->get()
=> Illuminate\Database\Eloquent\Collection {#4427
     all: [
       App\Models\Question {#6604
         id: 56283,
         question_group_id: 13,
         unique_id: "Q10838/aliases/en/BAB 70",
         properties: "{"item":"Q10838","label":"Bundesautobahn 70","aliases":["BAB 70","A 70","Autobahn 70"],"suggestion":"BAB\u00a070","html_context":"<p>English Wikipedia:<\/p><\/br><p>English Wikipedia:<\/p><\/br><p>English Wikipedia:<\/p><\/br><p><b>Bundesautobahn\u00a070<\/b> (translates from German as <i>Federal Motorway\u00a070<\/i>, short form <b>Autobahn\u00a070<\/b>, abbreviated as <b>BAB\u00a070<\/b> or <b>A\u00a070<\/b>) is an autobahn in southern Germany, connecting the A\u00a07 via Schweinfurt and Bamberg to the A\u00a09. <\/p>","language":"en"}",
         created_at: "2022-03-29 10:41:40",
         updated_at: "2022-04-02 17:01:59",
       },
     ],
   }
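
The mismatch visible in the dump above (the `suggestion` contains `\u00a0` while the `aliases` entry uses a regular space) can be reproduced with a small Python sketch. This is only an illustration of one possible fix, not what the project does: `normalize_alias` is a hypothetical helper, and collapsing all Unicode whitespace to single ASCII spaces is an assumed normalization policy.

```python
def normalize_alias(text):
    """Collapse any run of Unicode whitespace into one ASCII space.

    Hypothetical helper. str.split() with no arguments splits on all
    Unicode whitespace, which includes U+00A0 (no-break space).
    """
    return " ".join(text.split())


# Values taken from the question's properties shown above.
existing = ["BAB 70", "A 70", "Autobahn 70"]
suggestion = "BAB\u00a070"  # no-break space between "BAB" and "70"

# Exact comparison misses the match because the space characters differ.
print(suggestion in existing)  # → False

# Comparing normalized forms treats the suggestion as already existing.
normalized_existing = {normalize_alias(a) for a in existing}
print(normalize_alias(suggestion) in normalized_existing)  # → True
```

Normalizing both sides before comparing would let the skip check catch candidates that differ from an existing alias only in the kind of space used.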