Closed PeterBowman closed 3 years ago
Possible solution: parse the <normalized> element if present and use that information instead of normalize() to link query results with input titles. I'd implement some sort of resolveNormalizedParser() helper method (analogous to resolveRedirectParser()) for that purpose. The existing normalize() method would be explicitly documented as serving limited, offline title normalization purposes, noting that it cannot be aware of certain quirks (such as gender aliasing) because they depend on user preferences known only to the server.
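To make the idea concrete, here is a minimal sketch of what such a helper could look like. The class name, the method signature and the sample XML are assumptions made for illustration only; they are not taken from Wiki.java, and the sample is hand-written rather than captured API output. It assumes the XML response format, where each normalization is reported as an `<n from="..." to="..." />` entry, and it skips XML entity decoding for brevity.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NormalizedSketch {
    // Matches entries such as <n from="User:Cancre" to="Wikipedystka:Cancre" />
    private static final Pattern ENTRY =
        Pattern.compile("<n from=\"([^\"]*)\" to=\"([^\"]*)\"");

    /**
     * Hypothetical helper: maps each input title to the title the server
     * normalized it to. Returns an empty map if there is no <normalized> block.
     * Note: attribute values are not entity-decoded in this sketch.
     */
    static Map<String, String> resolveNormalized(String xml) {
        Map<String, String> result = new LinkedHashMap<>();
        int start = xml.indexOf("<normalized>");
        int end = xml.indexOf("</normalized>");
        if (start < 0 || end < start) {
            return result;
        }
        Matcher m = ENTRY.matcher(xml.substring(start, end));
        while (m.find()) {
            result.put(m.group(1), m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hand-written sample response (an assumption, not real output).
        String xml = "<api><query><normalized>"
            + "<n from=\"User:Cancre\" to=\"Wikipedystka:Cancre\" />"
            + "<n from=\"User:Przykuta\" to=\"Wikipedysta:Przykuta\" />"
            + "</normalized><pages /></query></api>";
        System.out.println(resolveNormalized(xml));
    }
}
```

With a map like this, query results could be matched against the server-provided titles instead of the output of normalize().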
Bonus: solving this would also solve https://github.com/MER-C/wiki-java/issues/162.
@MER-C are you OK with this proposal? I'd be happy to work on a patch if so.
Sounds good.
On some non-English language projects, a dedicated user namespace prefix alias is assigned to users who select the female gender in their preferences. For instance, on plwiki male/unspecified gender users get the default Wikipedysta prefix, whereas female ones are identified with Wikipedystka (cf. Benutzer/Benutzerin on German projects, Usuario/Usuaria on Spanish wikis and so on).
Wiki.java automatically falls back to the default/male language-specific prefix upon normalization. It is not different from other normalization use cases, i.e. (for plwiki) User->Wikipedysta, wikipedysta->Wikipedysta, Wikipedystka->Wikipedysta. However, MediaWiki honors the gender setting when a user page is queried.
Let's query w:pl:User:Cancre (on-wiki displayed as Wikipedystka:Cancre, female prefix alias) and also w:pl:User:Przykuta (Wikipedysta:Przykuta, male/default prefix) just for comparison (api.php):
Wiki.java expects the normalized page name to also fall back to the male/default prefix (Wikipedysta:Cancre). It can't find it in the pages array, though, because of the special treatment of gender aliases in this specific namespace. Example:
Result (first line refers to User:Cancre):
Reason: Wiki.java calls normalize() internally and reorders the query results according to the input titles. This normalize() method does not take into account the gender of the underlying user a user page refers to. The following scheme can be found in several places, e.g. getPageInfo(): https://github.com/MER-C/wiki-java/blob/c8cc5a1d24911a189773aadcfb14b4a58edb4e23/src/org/wikipedia/Wiki.java#L1754-L1763
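The mismatch can be reproduced in isolation with a simplified sketch. This is not the code from Wiki.java: `offlineNormalize()` and `matchResults()` are hypothetical stand-ins that only model the plwiki user namespace aliases mentioned in this issue, and the server titles are hand-written to reflect the behaviour described above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class GenderAliasMismatch {
    /**
     * Simplified stand-in for offline normalization on plwiki: every known
     * user namespace alias is folded into the default prefix "Wikipedysta".
     */
    static String offlineNormalize(String title) {
        int colon = title.indexOf(':');
        if (colon < 0) {
            return title;
        }
        String ns = title.substring(0, colon).toLowerCase(Locale.ROOT);
        String rest = title.substring(colon + 1);
        if (ns.equals("user") || ns.equals("wikipedysta") || ns.equals("wikipedystka")) {
            return "Wikipedysta:" + rest;
        }
        return title;
    }

    /**
     * For each input title, returns the index of the matching server title,
     * or -1 when the locally normalized title is absent from the results.
     */
    static int[] matchResults(List<String> inputs, List<String> serverTitles) {
        int[] positions = new int[inputs.size()];
        for (int i = 0; i < inputs.size(); i++) {
            positions[i] = serverTitles.indexOf(offlineNormalize(inputs.get(i)));
        }
        return positions;
    }

    public static void main(String[] args) {
        List<String> inputs = Arrays.asList("User:Cancre", "User:Przykuta");
        // Titles as the server would report them, honoring the gender setting.
        List<String> server = Arrays.asList("Wikipedystka:Cancre", "Wikipedysta:Przykuta");
        System.out.println(Arrays.toString(matchResults(inputs, server)));
    }
}
```

The first input has no match because the locally computed Wikipedysta:Cancre never appears in the server results, while the second one is found as expected.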
Since getPageInfo() is always called internally by edit(), this bug makes it impossible to edit user pages prefixed with female aliases on gender-aware language wikis.