aspiers / book-indices

Indices for music books

Page numbers are based on pdf versions? #19

Closed infojunkie closed 2 years ago

infojunkie commented 6 years ago

For example: "Autumn Leaves" in "The New Real Book" is marked as page 25, whereas the page and the book's paper index indicate page 12 for this tune: https://github.com/aspiers/book-indices/blob/master/NewReal1.csv#L11

It would seem the pages reported correspond to the page number in the pdf scans which are being shared on the internet. But that is not so useful, either for those using the actual paper book, or even for those using the pdf, because most pdf readers honor logical page numbers which mirror the actual book.

infojunkie commented 6 years ago

index

aspiers commented 6 years ago

@infojunkie commented on 23 Sep 2018, 07:51 BST:

For example: "Autumn Leaves" in "The New Real Book" is marked as page 25, whereas the page and the book's paper index indicate page 12 for this tune: https://github.com/aspiers/book-indices/blob/master/NewReal1.csv#L11

It would seem the pages reported correspond to the page number in the pdf scans which are being shared on the internet.

Yes, that's correct.

But that is not so useful, either for those using the actual paper book, or even for those using the pdf, because most pdf readers honor logical page numbers which mirror the actual book.

Yes, that's a good point. It would be much nicer to work with logical page numbers rather than the strictly contiguous physical ones in the PDF. Any suggestions on how to handle this are welcome. Maybe we need a second file which maps between logical and physical page numbers? Then the indices could be changed to use logical ones.

infojunkie commented 6 years ago

Thanks. How to proceed depends on your goal for this repo, I guess. From my pov, here are the options:

Either of these transformations may be automated with a bash script if there's a constant offset between physical and logical page numbers (within each book, of course).
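As a rough illustration of the constant-offset case, here is a minimal sketch (in Python rather than bash). It assumes a hypothetical two-column layout of title and PDF page, which may not match the repo's actual CSV columns, and the helper name `shift_pages` is made up. The offset of 13 comes from the "Autumn Leaves" example above (PDF page 25, printed page 12).

```python
import csv
import io


def shift_pages(rows, offset):
    """Return (title, page) rows with `offset` subtracted from each page number."""
    return [(title, int(page) - offset) for title, page in rows]


# Assumed layout: title,pdf_page. The real index files may have other columns.
sample = io.StringIO("Autumn Leaves,25\n")
rows = list(csv.reader(sample))

# Offset 13 = PDF page 25 minus printed page 12, per the example in this issue.
print(shift_pages(rows, 13))  # [('Autumn Leaves', 12)]
```

This only works while the offset really is constant for the whole book.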

aspiers commented 6 years ago

Well, the offset won't necessarily be constant, and certainly not once you take into account that many books have physical pages near the beginning with no corresponding logical page numbers; instead, those pages are often marked with letters or Roman numerals. So the mapping would need to describe those too. It shouldn't be hard to design a system to cope with this. However, an extra column for logical page numbers/letters is probably the simplest solution, and the easiest to understand and use.
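For what it's worth, a mapping that copes with unnumbered and Roman-numbered front matter could be sketched as a short list of segments, each giving the physical page where a numbering run starts. Everything here is hypothetical: the segment boundaries are invented for illustration (only the main-body offset of 13 is taken from the "Autumn Leaves" example), and `logical_label` is not an existing function in this repo.

```python
from bisect import bisect_right


def to_roman(n):
    """Convert a positive integer to a lowercase Roman numeral."""
    pairs = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
             (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
             (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i")]
    out = []
    for value, symbol in pairs:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


# Hypothetical segments: (first physical page, first logical number, style).
# A style of None marks pages that carry no printed number at all.
SEGMENTS = [
    (1, None, None),     # cover and blank pages
    (3, 1, "roman"),     # front matter: i, ii, iii, ...
    (14, 1, "arabic"),   # main body: 1, 2, 3, ...
]


def logical_label(physical, segments=SEGMENTS):
    """Return the printed label for a physical (PDF) page, or '' if unnumbered."""
    starts = [seg[0] for seg in segments]
    i = bisect_right(starts, physical) - 1  # last segment starting at or before this page
    if i < 0:
        raise ValueError("page precedes the first segment")
    first_physical, first_logical, style = segments[i]
    if style is None:
        return ""
    n = first_logical + (physical - first_physical)
    return to_roman(n) if style == "roman" else str(n)


print(logical_label(25))  # 12
print(logical_label(6))   # iv
```

The extra-column approach stores the result of this lookup directly in each index row, which is simpler to read but repeats the information for every tune.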

Unfortunately I am extremely busy with other stuff at the moment, so I can't offer to move this forward, but I will try to review any pull requests submitted. Thanks a lot for the feedback!

wrwetzel commented 2 years ago

I worked on this problem 11-12 years ago but never completed a solution to the point that I could share it. Two columns are needed: one for the number printed on the page, the other for the page number in the PDF file. The offset between the two is not constant and may even vary with the source of the scan. I concluded that manual intervention is required, and built a tool that shows each page in a PDF and proposes the appropriate printed page number. The only manual step is to confirm or update the proposal. If it is updated, the new offset is applied to subsequent pages. Since updates are infrequent, the operation is mostly confirmations, which go quickly. To speed things up further, the tool included a way to access pages at random. Since it is highly unlikely that the offset would go both positive and negative within a group of pages, sampling at large page intervals quickly shows whether the pages are aligned. One only needs to sample an interval more finely, to locate the point of the shift, if there is a misalignment. Someday I may get back to it again.
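The sampling idea can be sketched as a recursive bisection. This is not the tool described above, only an illustration of the same reasoning: if the offset moves in one direction only, equal offsets at both ends of an interval mean nothing shifted in between, so only mismatched intervals need a closer look. The names `find_shifts` and `simulated_offset` are made up, and the simulated offsets are invented test data standing in for a person reading the printed number off each sampled page.

```python
def find_shifts(offset_at, lo, hi):
    """Return the physical pages in (lo, hi] where the offset changes.

    Assumes the offset only ever moves in one direction, so equal offsets
    at `lo` and `hi` imply no change anywhere between them.
    """
    if offset_at(lo) == offset_at(hi):
        return []
    if hi - lo == 1:
        return [hi]  # the offset changes exactly at page `hi`
    mid = (lo + hi) // 2
    return find_shifts(offset_at, lo, mid) + find_shifts(offset_at, mid, hi)


def simulated_offset(physical):
    """Invented data: offset 13 up to page 39, 14 up to page 99, 16 afterwards."""
    if physical < 40:
        return 13
    if physical < 100:
        return 14
    return 16


print(find_shifts(simulated_offset, 1, 200))  # [40, 100]
```

In a real tool, `offset_at` would show the page and ask for confirmation, so caching its answers would avoid asking about the same page twice.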