Use different logic for lists processing

gnosygnu / xowa

xowa offline wiki application

Use different logic for lists processing #492

Open desb42 opened 5 years ago

desb42 commented 5 years ago

As part of #417 I mentioned that the list logic did not seem right. I have been looking at mediawiki\includes\parser\BlockLevelPass.php to see how it's done, and have a suggested change in my branch lists_new. My idea was to make the tokeniser just count the number of list elements (hence no limit on how many elements) and get the HTML generator to do the work. As xowa has tokenised the various HTML elements, the complexity in the PHP code (with all the regexes) is unnecessary.
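To illustrate the counting idea, here is a minimal, self-contained sketch. It is not the code from the lists_new branch: the class `ListPrefixCounter` and its methods are hypothetical names, and it compares against plain character literals instead of XOWA's `Byte_ascii` constants.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not part of XOWA): counts list markers at the start of a line. */
final class ListPrefixCounter {
    private ListPrefixCounter() {}

    /** True for the four wikitext list markers: '*', '#', ';' and ':'. */
    static boolean isListMarker(byte b) {
        return b == '*' || b == '#' || b == ';' || b == ':';
    }

    /** Returns how many consecutive list markers start at linePos; there is no upper limit. */
    static int countPrefix(byte[] src, int linePos) {
        int curPos = linePos;
        while (curPos < src.length && isListMarker(src[curPos])) {
            curPos++;
        }
        return curPos - linePos;
    }

    /** Convenience overload for a single line held as a String. */
    static int countPrefix(String line) {
        return countPrefix(line.getBytes(StandardCharsets.UTF_8), 0);
    }
}
```

With this sketch, `ListPrefixCounter.countPrefix("**# item")` returns 3 and a line without markers returns 0. The tokeniser only records the count, and deciding which tags to open or close is left to the HTML generator.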

gnosygnu commented 5 years ago

Thanks for the commit. I took a quick look at it now, but GitHub clobbered the diff: https://github.com/desb42/xowa/commit/af19cf83209ba765fa6be0157e23118856a8ac70

The main part seems to be

    // Multiple prefixes may abut each other for nested lists.
    while (cur_pos < src_len) {
        byte b = src[cur_pos];
        if (b == Byte_ascii.Star || b == Byte_ascii.Hash || b == Byte_ascii.Semic || b == Byte_ascii.Colon) {
            cur_pos++;
        } else {
            break;
        }
    }

Let me look at it a little more later

desb42 commented 5 years ago

It's slightly more than that. There is a big section in Xoh_html_wtr.java delimited by

// -------------------------------

where most of the work is done

gnosygnu commented 5 years ago

Ah, missed that. It looks like you ported all the code in https://github.com/wikimedia/mediawiki/blob/master/includes/parser/BlockLevelPass.php#L190

Which is pretty cool. That's what I was planning to do, and will ultimately be the direction of all XOWA parser code (abandon the custom DOM structure and replicate what MediaWiki does, only in Java)
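For readers who have not looked at that part of BlockLevelPass.php, the core idea, as I understand it, is to compare each line's list prefix with the previous line's prefix, close the levels that no longer match, and open the new ones. The sketch below is a simplified illustration only. `ListBlockSketch` is a hypothetical class, not the MediaWiki or XOWA code. It handles only `*` and `#`, and it leaves out `;`/`:` definition lists and paragraph handling.

```java
import java.util.List;

/** Hypothetical, simplified illustration of prefix-diff list rendering ('*' and '#' only). */
final class ListBlockSketch {
    private ListBlockSketch() {}

    static String render(List<String> lines) {
        StringBuilder out = new StringBuilder();
        String lastPrefix = "";
        for (String line : lines) {
            int prefixLen = prefixLength(line);
            String prefix = line.substring(0, prefixLen);
            String text = line.substring(prefixLen).trim();
            int common = commonPrefixLength(lastPrefix, prefix);

            // Close every level of the previous prefix that the new prefix does not share.
            for (int i = lastPrefix.length() - 1; i >= common; i--) {
                out.append(closeList(lastPrefix.charAt(i)));
            }
            // The new prefix is fully contained in the old one: continue that list with a new item.
            if (prefixLen > 0 && common == prefixLen) {
                out.append("</li><li>");
            }
            // Open every level that is new relative to the previous prefix.
            for (int i = common; i < prefixLen; i++) {
                out.append(openList(prefix.charAt(i)));
            }
            out.append(text);
            lastPrefix = prefix;
        }
        // Close whatever is still open at the end of input.
        for (int i = lastPrefix.length() - 1; i >= 0; i--) {
            out.append(closeList(lastPrefix.charAt(i)));
        }
        return out.toString();
    }

    /** Number of leading '*' or '#' characters. */
    private static int prefixLength(String line) {
        int i = 0;
        while (i < line.length() && (line.charAt(i) == '*' || line.charAt(i) == '#')) {
            i++;
        }
        return i;
    }

    /** Length of the shared leading run of two prefixes. */
    private static int commonPrefixLength(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return i;
    }

    private static String openList(char marker) {
        return marker == '*' ? "<ul><li>" : "<ol><li>";
    }

    private static String closeList(char marker) {
        return marker == '*' ? "</li></ul>" : "</li></ol>";
    }
}
```

Because the nesting depth is simply the prefix length, nothing in this approach caps the number of list levels. That matches the goal of letting the tokeniser count markers and leaving the tag decisions to the HTML writer.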

gnosygnu commented 5 years ago

[Sorry, premature comment]

I actually tried reproducing a lot of the code. The above part is here already: https://github.com/gnosygnu/xowa/blob/master/gplx.xowa.mediawiki/src/gplx/xowa/mediawiki/includes/parsers/XomwBlockLevelPass.java#L195 . There are a bunch of similar parallel code blocks in gplx.xowa.mediawiki. I just haven't integrated them yet into the main XOWA project

This is something I'd like to do, but I'm still a little wary about changing too much at the moment. Let me think about doing some incremental replacements and seeing if I can co-opt some parts.

Thanks.