bryonjacob / wikimodel

Automatically exported from code.google.com/p/wikimodel

XHTML Parser problems parsing lists #22

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
Hi,

I have several problems with the XHTML parser and lists. Take the following
input:

<html><ol><li>Item 1<ol><li>Item 2<ul class="star"><li>Item
3</li></ul></li><li>Item 4</li></ol></li><li>Item 5</li></ol><ul
class="star"><li>Item 1<ul class="star"><li>Item 2<ul class="star"><li>Item
3</li></ul></li><li>Item 4</li></ul></li><li>Item 5</li><li>Item
6</li></ul></html>

It generates the following events:

beginList: [BULLETED]
beginListItem
onSpace
onWord: [ ]
onWord: [1]
beginList: [BULLETED]
beginListItem
onSpace
onWord: [ ]
onWord: [2]
beginList: [BULLETED]
beginListItem
onSpace
onWord: [ ]
onWord: [3]
endListItem
endList: [BULLETED]
endListItem
beginListItem
onSpace
onWord: [ ]
onWord: [4]
endListItem
endList: [BULLETED]
endListItem
beginListItem
onSpace
onWord: [ ]
onWord: [5]
endListItem
endList: [BULLETED]
beginList: [BULLETED]
beginListItem
onSpace
onWord: [ ]
onWord: [1]
beginList: [BULLETED]
beginListItem
onSpace
onWord: [ ]
onWord: [2]
beginList: [BULLETED]
beginListItem
onSpace
onWord: [ ]
onWord: [3]
endListItem
endList: [BULLETED]
endListItem
beginListItem
onSpace
onWord: [ ]
onWord: [4]
endListItem
endList: [BULLETED]
endListItem
beginListItem
onSpace
onWord: [ ]
onWord: [5]
endListItem
beginListItem
onSpace
onWord: [ ]
onWord: [6]
endListItem
endList: [BULLETED]

However, this is not correct. It should be:

beginList: [NUMBERED]
beginListItem
onWord: [Item]
onSpace
onWord: [1]
beginList: [NUMBERED]
beginListItem
onWord: [Item]
onSpace
onWord: [2]
beginList: [BULLETED]
beginListItem
onWord: [Item]
onSpace
onWord: [3]
endListItem
endList: [BULLETED]
endListItem
beginListItem
onWord: [Item]
onSpace
onWord: [4]
endListItem
endList: [NUMBERED]
endListItem
beginListItem
onWord: [Item]
onSpace
onWord: [5]
endListItem
endList: [NUMBERED]
beginList: [BULLETED]
beginListItem
onWord: [Item]
onSpace
onWord: [1]
beginList: [BULLETED]
beginListItem
onWord: [Item]
onSpace
onWord: [2]
beginList: [BULLETED]
beginListItem
onWord: [Item]
onSpace
onWord: [3]
endListItem
endList: [BULLETED]
endListItem
beginListItem
onWord: [Item]
onSpace
onWord: [4]
endListItem
endList: [BULLETED]
endListItem
beginListItem
onWord: [Item]
onSpace
onWord: [5]
endListItem
beginListItem
onWord: [Item]
onSpace
onWord: [6]
endListItem
endList: [BULLETED]

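There are 2 problems:
* the "Item" word is swallowed: each item emits onSpace and onWord: [ ] instead of onWord: [Item] followed by onSpace
* ordered lists are not supported: every <ol> is reported as BULLETED instead of NUMBERED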
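
The two problems above can be illustrated with a minimal sketch of the expected behaviour. This is not WikiModel's code or API: it uses Python's standard html.parser, and the names ListEventParser and list_events are hypothetical, chosen only for this illustration. It maps <ol> to NUMBERED and <ul> to BULLETED, and keeps every word of the item text.

```python
from html.parser import HTMLParser
import re


class ListEventParser(HTMLParser):
    """Hypothetical listener that records WikiModel-style list event names."""

    # Assumed mapping from XHTML list tags to the list types in the report.
    LIST_TYPES = {"ol": "NUMBERED", "ul": "BULLETED"}

    def __init__(self):
        super().__init__()
        self.events = []
        self.open_lists = []  # stack of the list types currently open

    def handle_starttag(self, tag, attrs):
        if tag in self.LIST_TYPES:
            kind = self.LIST_TYPES[tag]
            self.open_lists.append(kind)
            self.events.append(f"beginList: [{kind}]")
        elif tag == "li":
            self.events.append("beginListItem")

    def handle_endtag(self, tag):
        if tag in self.LIST_TYPES:
            # Close with the type that was opened, not a fixed default.
            self.events.append(f"endList: [{self.open_lists.pop()}]")
        elif tag == "li":
            self.events.append("endListItem")

    def handle_data(self, data):
        if not self.open_lists:
            return  # ignore text outside of lists in this sketch
        # Split into whitespace runs and words so that no word is dropped.
        for token in re.findall(r"\s+|\S+", data):
            if token.isspace():
                self.events.append("onSpace")
            else:
                self.events.append(f"onWord: [{token}]")


def list_events(xhtml):
    """Return the list-related events for an XHTML string."""
    parser = ListEventParser()
    parser.feed(xhtml)
    parser.close()
    return parser.events
```

For example, list_events("<ol><li>Item 1</li></ol>") returns beginList: [NUMBERED], beginListItem, onWord: [Item], onSpace, onWord: [1], endListItem, endList: [NUMBERED], which is the shape of the expected output listed above.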

Thanks

Original issue reported on code.google.com by vmas...@gmail.com on 20 Jun 2008 at 11:55

GoogleCodeExporter commented 8 years ago

Original comment by mikhail....@gmail.com on 29 Jun 2008 at 11:19

GoogleCodeExporter commented 8 years ago
Here's a patch that fixes the problem (also needs the patch from issue 21, I think).

Original comment by vmas...@gmail.com on 19 Jul 2008 at 5:32

Attachments:

GoogleCodeExporter commented 8 years ago
Note that the patch fixes the first problem only. Support for ordered lists is
still not fixed.

In addition, there's a 3rd issue: I've executed the test again using wikimodel from
trunk, and I now get the following events:

beginDocument
beginList: [NUMBERED]
beginListItem
onSpace
onWord: [ ]
onWord: [1]
endListItem
endList: [NUMBERED]
beginList: [NUMBERED]
beginListItem
onSpace
onWord: [ ]
onWord: [2]
endListItem
endList: [NUMBERED]
beginList: [BULLETED]
beginListItem
onSpace
onWord: [ ]
onWord: [3]
endListItem
endList: [BULLETED]
beginList: [NUMBERED]
beginListItem
onSpace
onWord: [ ]
onWord: [4]
endListItem
endList: [NUMBERED]
beginList: [NUMBERED]
beginListItem
onSpace
onWord: [ ]
onWord: [5]
endListItem
endList: [NUMBERED]
beginList: [BULLETED]
beginListItem
onSpace
onWord: [ ]
onWord: [1]
endListItem
endList: [BULLETED]
beginList: [BULLETED]
beginListItem
onSpace
onWord: [ ]
onWord: [2]
endListItem
endList: [BULLETED]
beginList: [BULLETED]
beginListItem
onSpace
onWord: [ ]
onWord: [3]
endListItem
endList: [BULLETED]
beginList: [BULLETED]
beginListItem
onSpace
onWord: [ ]
onWord: [4]
endListItem
endList: [BULLETED]
beginList: [BULLETED]
beginListItem
onSpace
onWord: [ ]
onWord: [5]
endListItem
beginListItem
onSpace
onWord: [ ]
onWord: [6]
endListItem
endList: [BULLETED]
endDocument

As you can see, the list items and lists are closed when they shouldn't be.
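
One way to check for this kind of flattening is to measure how deeply the lists nest in an event stream. The helper below is a hypothetical sketch and not part of WikiModel; it assumes the events are available as strings in the format shown in this issue.

```python
def max_list_depth(events):
    """Return the deepest list nesting level found in a sequence of event strings."""
    depth = 0
    deepest = 0
    for event in events:
        # The trailing colon keeps beginListItem/endListItem from matching.
        if event.startswith("beginList:"):
            depth += 1
            deepest = max(deepest, depth)
        elif event.startswith("endList:"):
            depth -= 1
    return deepest
```

Applied to the expected events from the original report, this gives a depth of 3, while the trunk output above never goes deeper than 1.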

Original comment by mas...@gmail.com on 20 Jul 2008 at 7:26

GoogleCodeExporter commented 8 years ago

Original comment by vmas...@gmail.com on 27 Aug 2008 at 12:40

GoogleCodeExporter commented 8 years ago
Fixed

Original comment by vmas...@gmail.com on 18 Sep 2008 at 11:55