Many books have their first few pages completely blank (e.g. https://archive.org/stream/terrestrialmagn00survgoog#page/n0/mode/2up).
When we index the pages of this book with a field such as publication date and then search on the date field alone, we often get "no results found". This happens because the top K results all have a length of zero, and RankedDocumentModel skips them.
Should we even index zero-length docs? If we do, we'll need to ignore them during scoring so they cannot fill the top K.
Note that some zero-length docs may sit in the middle of a book and contain an image, which may be valuable to the user.
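
To make the "ignore them during scoring" option concrete, here is a minimal sketch in Python. It is not the project's code: `ScoredDoc`, `top_k_nonempty`, and the page ids are hypothetical names used only for illustration, and it assumes each candidate carries its indexed length. The idea is to drop zero-length candidates before the top K is selected, so that non-empty pages can fill the result list.

```python
import heapq
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ScoredDoc:
    """Hypothetical candidate record; not the project's actual type."""
    doc_id: str
    score: float
    length: int  # number of indexed terms on the page


def top_k_nonempty(candidates: Iterable[ScoredDoc], k: int) -> List[ScoredDoc]:
    """Return up to k highest-scoring candidates whose length is non-zero."""
    # Filter first, then select, so empty pages never occupy a top-K slot.
    nonempty = (doc for doc in candidates if doc.length > 0)
    return heapq.nlargest(k, nonempty, key=lambda doc: doc.score)


# Two blank leading pages outscore the pages that actually contain text.
candidates = [
    ScoredDoc("page_n0", 1.0, 0),
    ScoredDoc("page_n1", 1.0, 0),
    ScoredDoc("page_n7", 0.9, 312),
    ScoredDoc("page_n8", 0.8, 287),
]

results = top_k_nonempty(candidates, k=2)
print([doc.doc_id for doc in results])  # ['page_n7', 'page_n8']
```

A filter like this would also drop the image-only pages mentioned above, so a real fix would probably need some extra signal (for example, a flag marking pages that contain an image) before excluding a page outright.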