desb42 opened 5 years ago
Thanks for the detail. I think this is going to be a hard problem.
If I remember correctly, MediaWiki uses StripState to ignore `<ref>` (and other XML nodes) during the first pass. XOWA does not, which leads to odd cases when you have ref tags inside template expressions (anything inside `{{{` and `}}}`).
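Here is a minimal Java sketch of the strip-marker idea, assuming a simple regex is good enough to find `<ref>...</ref>` pairs. `StripStateSketch`, `strip`, `unstrip` and the `\u007fREF-n\u007f` marker format are hypothetical names for illustration only. They are not MediaWiki's `StripState` implementation or an XOWA API. The point is that once the ref body is hidden behind an opaque marker, a `|` inside the ref can no longer be mistaken for a template argument separator during the first pass.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the strip-marker idea; NOT MediaWiki's StripState and NOT an XOWA API.
class StripStateSketch {
    // Matches <ref ...>...</ref> lazily, across line breaks (DOTALL).
    // The (?<!/) lookbehind skips self-closing <ref name=x/> tags, which are left untouched.
    private static final Pattern REF =
            Pattern.compile("<ref\\b[^>]*?(?<!/)>.*?</ref>", Pattern.DOTALL);

    // Original <ref> text, indexed by the number embedded in its marker.
    private final List<String> stripped = new ArrayList<>();

    // First pass: replace every <ref>...</ref> with an opaque marker and remember the original.
    String strip(String wikitext) {
        Matcher m = REF.matcher(wikitext);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String marker = "\u007fREF-" + stripped.size() + "\u007f";
            stripped.add(m.group());
            m.appendReplacement(out, Matcher.quoteReplacement(marker));
        }
        m.appendTail(out);
        return out.toString();
    }

    // After template expansion: put the original <ref> text back in place of each marker.
    String unstrip(String text) {
        for (int i = 0; i < stripped.size(); i++) {
            text = text.replace("\u007fREF-" + i + "\u007f", stripped.get(i));
        }
        return text;
    }

    public static void main(String[] args) {
        StripStateSketch s = new StripStateSketch();
        String src = "{{{1|<ref>a|b</ref>}}}";
        String first = s.strip(src);
        // Only the parameter's own '|' remains visible to the template pass.
        System.out.println(first.indexOf('|') == first.lastIndexOf('|')); // true
        System.out.println(s.unstrip(first).equals(src));                // true
    }
}
```

Nested refs are not handled; a real implementation would need a proper tokenizer rather than a regex.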
I'll triage this a little more later, but it could be a while. Are you seeing a lot of these errors?
I am building enwiki 2019-06-01 at the moment and was looking at some of the error output when I stumbled across this page, en.wikipedia.org/wiki/Kay_Musical_Instrument_Company. The wikitext of interest is quite a long way down the page (search for `k573>`).
A snippet (showing only the first image of the gallery):

The current processing (in `Gallery_parser.java`) assumes each gallery entry fits on a single line, so only the line

is processed.
Strictly, the inner sections should be processed first. That is, if the `<ref>`s were parsed first (or at least tokenised), this would appear as one line.
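A rough Java sketch of the inner-first ordering, assuming a gallery body is split into items on newlines: `naiveItems` mimics the one-line-per-entry behaviour, while `innerFirstItems` hides `<ref>...</ref>` spans behind markers before splitting and restores them afterwards. The class and method names and the sample gallery text are made up for illustration. This is not how `Gallery_parser.java` is actually structured.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; names are illustrative and not XOWA's Gallery_parser API.
class GalleryLinesSketch {
    // Non-self-closing <ref ...>...</ref>, allowed to span several lines.
    private static final Pattern REF =
            Pattern.compile("<ref\\b[^>]*?(?<!/)>.*?</ref>", Pattern.DOTALL);

    // Naive approach: every non-blank physical line is treated as one gallery entry.
    static List<String> naiveItems(String body) {
        List<String> items = new ArrayList<>();
        for (String line : body.split("\n")) {
            if (!line.trim().isEmpty()) {
                items.add(line);
            }
        }
        return items;
    }

    // Inner-first approach: tokenise <ref>s into markers, split on lines, then restore each entry.
    static List<String> innerFirstItems(String body) {
        List<String> refs = new ArrayList<>();
        Matcher m = REF.matcher(body);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String marker = "\u007fREF-" + refs.size() + "\u007f";
            refs.add(m.group());
            m.appendReplacement(sb, Matcher.quoteReplacement(marker));
        }
        m.appendTail(sb);

        List<String> items = new ArrayList<>();
        for (String line : naiveItems(sb.toString())) {
            for (int i = 0; i < refs.size(); i++) {
                line = line.replace("\u007fREF-" + i + "\u007f", refs.get(i));
            }
            items.add(line);
        }
        return items;
    }

    public static void main(String[] args) {
        // Invented input: the first entry's caption contains a <ref> spanning two lines.
        String body = "File:Example.jpg|Caption<ref name=x>Line one\nline two</ref>\n"
                + "File:Other.jpg|Second";
        System.out.println(naiveItems(body).size());      // 3
        System.out.println(innerFirstItems(body).size()); // 2
    }
}
```

With the naive split, the second half of the ref (`line two</ref>`) becomes a bogus gallery entry; with the refs tokenised first, each entry, ref included, is seen as one line.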