allenai / mmda

multimodal document analysis
Apache License 2.0

Parsing fails on pages without chars #258

Open juncaofish opened 1 year ago

juncaofish commented 1 year ago

If a page has no chars (for example, a book cover), the script throws an exception during parsing and in the processing that follows. The cause is an array index out of bounds: `doc.pages` does not include pages with no chars, so it ends up shorter than the actual page count.
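A minimal sketch of the failure mode described above, in plain Python. It does not use the mmda API: the helper names and the dict layout are hypothetical stand-ins, and the only assumption taken from the report is that pages with no chars are left out of the page list.

```python
from typing import Dict, List


def build_pages_skipping_empty(chars_per_page: List[List[str]]) -> List[Dict]:
    """Hypothetical parser step that drops pages with no chars (the reported behavior)."""
    return [
        {"page_index": i, "chars": chars}
        for i, chars in enumerate(chars_per_page)
        if chars
    ]


def build_pages_keeping_empty(chars_per_page: List[List[str]]) -> List[Dict]:
    """Hypothetical alternative: keep an empty entry so list positions match page numbers."""
    return [{"page_index": i, "chars": chars} for i, chars in enumerate(chars_per_page)]


def page_lookup_fails(pages: List[Dict], page_index: int) -> bool:
    """Return True if indexing the page list by page number raises IndexError."""
    try:
        pages[page_index]
        return False
    except IndexError:
        return True


# A three-page document whose first page (e.g. a cover) has no chars.
chars_per_page = [[], ["H", "i"], ["o", "k"]]

skipped = build_pages_skipping_empty(chars_per_page)
kept = build_pages_keeping_empty(chars_per_page)

# Indexing by the real page number overruns the shortened list.
print(page_lookup_fails(skipped, 2))
print(page_lookup_fails(kept, 2))
```

With the empty page dropped, any later step that indexes the page list by the original page number either runs past the end of the list or silently reads the wrong page. Keeping an empty placeholder entry is one way to keep the positions aligned; whether that fits mmda's data model is for the maintainers to judge.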