Open mzarozinski opened 8 years ago
This appears to be an issue with the transformation from rawtei to toktei. In the DjVu and rawtei files, "page" 56 (really an image offset) contains "o X". The actual page (https://archive.org/stream/cu31924020438929#page/n56/mode/1up) is an illustration, and the next page is blank.
The Phokas program pulled the contents of that "page" into the previous page, which removed offset 56. Offset 57 then follows and is (correctly) blank.
The page index was built using toktei (the list of documents for that index is on sydney at /mnt/nfs/work3/michaelz/data/caribbean-via-grep.list). Proteus expects to see "o X" for offset 56 (the illustration), but that offset does not exist, resulting in the off-by-one error.
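To make the failure mode concrete, here is a minimal Python sketch of how dropping one offset can shift every later page by one. It is only an illustration under assumptions: `merge_short_pages` is a hypothetical stand-in for whatever Phokas actually does, the page texts are placeholders, and the idea that the index looks pages up by position rather than by stored offset is a guess at the cause, not a confirmed description of Proteus.

```python
def merge_short_pages(pages, min_chars=4):
    """Hypothetical stand-in for the rawtei -> toktei step (NOT the real
    Phokas logic): fold a very short, non-empty page into the previous
    page and drop its offset. Empty pages are kept as they are."""
    merged = []
    for offset, text in pages:
        stripped = text.strip()
        if merged and stripped and len(stripped) < min_chars:
            prev_offset, prev_text = merged[-1]
            merged[-1] = (prev_offset, prev_text + " " + stripped)
        else:
            merged.append((offset, text))
    return merged


def missing_offsets(raw_pages, tok_pages):
    """Return offsets present in the raw pages but absent after merging."""
    tok_offsets = {offset for offset, _ in tok_pages}
    return [offset for offset, _ in raw_pages if offset not in tok_offsets]


# Placeholder page texts; only the offsets and "o X" come from the report.
raw_pages = [
    (55, "text of book page 32"),
    (56, "o X"),                   # illustration, OCR noise only
    (57, ""),                      # blank page
    (58, "text of book page 33"),
    (59, "text of book page 34"),
]

tok_pages = merge_short_pages(raw_pages)
dropped = missing_offsets(raw_pages, tok_pages)
print("dropped offsets:", dropped)  # dropped offsets: [56]

# Assumed positional lookup: the 4th entry is taken to be offset 58.
position = 58 - raw_pages[0][0]
print("raw:", raw_pages[position])  # raw: (58, 'text of book page 33')
print("tok:", tok_pages[position])  # tok: (59, 'text of book page 34')
```

Under these assumptions the positional lookup for offset 58 returns the text of book page 34, which matches the reported symptom. A check like `missing_offsets` could also be run over a document list to find other books affected by the same merge.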
Attached are the rawtei and toktei files. Search for "
Ultimately the solution is to either fix Phokas or build the index using rawtei files. My experience has been that building from the rawtei files is the best way to proceed.
See document cu31924020438929
The actual book has: page 32, a full-page image, a blank page, page 33. In Proteus, book page number 33 is associated with the text for page 34.
https://archive.org/stream/cu31924020438929#page/n55/mode/2up http://laguna.cs.umass.edu:2333/view.html?kind=ia-pages&action=view&id=cu31924020438929_55