anujkalal / wikixmlj

Automatically exported from code.google.com/p/wikixmlj

Page ID is truncated during parsing if the value crosses the buffer's 2048-character boundary. #16

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago

In SAXPageCallbackHandler.characters(), the logic marked with the following comment ...

// TODO: To avoid looking at the revision ID, only the first ID is taken.
// I'm not sure how big the block size is in each call to characters(),
// so this may be unsafe.

... is indeed unsafe across buffer boundaries.
If the page ID is split across two buffers, it is truncated.
For example, with the ID 450123, if "450" starts at position 2045 of ch[] in one call and "123" starts at position 0 of ch[] in the next call, the result is "450".
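
One possible way to avoid the truncation is to accumulate text across characters() calls and read the ID only in endElement(). The following is a minimal sketch, not the actual SAXPageCallbackHandler code: the class name BufferedIdHandler, its fields, and demoSplitId() are hypothetical. It assumes the dump nests revision IDs inside a <revision> element, so the first <id> seen outside <revision> is the page ID.

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler for illustration; not the wikixmlj source.
class BufferedIdHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private boolean insideRevision = false; // assumption: revision IDs live under <revision>
    private String pageId = null;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if ("revision".equals(qName)) {
            insideRevision = true;
        }
        text.setLength(0); // start collecting text for the new element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // SAX may deliver one text node in several chunks, so append
        // instead of treating a single call as the complete value.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("revision".equals(qName)) {
            insideRevision = false;
        } else if ("id".equals(qName) && !insideRevision && pageId == null) {
            pageId = text.toString().trim(); // the value is complete only here
        }
        text.setLength(0);
    }

    String getPageId() {
        return pageId;
    }

    // Simulates the reported case: "450" at the end of one buffer,
    // "123" at position 0 of the next.
    static String demoSplitId() {
        BufferedIdHandler handler = new BufferedIdHandler();
        handler.startElement("", "id", "id", null);
        char[] first = "xx450".toCharArray();
        handler.characters(first, 2, 3);
        char[] second = "123".toCharArray();
        handler.characters(second, 0, 3);
        handler.endElement("", "id", "id");
        return handler.getPageId();
    }
}
```

With this approach the chunk size chosen by the parser no longer matters, because the value is assembled from every chunk before it is used.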

Original issue reported on code.google.com by tjs...@gmail.com on 18 Nov 2012 at 1:55