Closed (infojunkie closed this issue 2 years ago)
@infojunkie commented on 23 Sep 2018, 07:51 BST:
For example: "Autumn Leaves" in "The New Real Book" is marked as page 25, whereas the page and the book's paper index indicate page 12 for this tune: https://github.com/aspiers/book-indices/blob/master/NewReal1.csv#L11
It would seem the pages reported correspond to the page number in the pdf scans which are being shared on the internet.
Yes, that's correct.
But that is not so useful, either for those using the actual paper book, or even for those using the pdf, because most pdf readers honor logical page numbers which mirror the actual book.
Yes, that's a good point. It would be much nicer to work with logical page numbers rather than the strictly contiguous physical ones in the PDF. Any suggestions on how to handle this are welcome. Maybe we need a second file which maps between logical and physical page numbers? Then the indices could be changed to use logical ones.
Thanks. How to proceed depends on your goal for this repo, I guess. From my pov, here are the options:
Either of these transformations may be automated with a bash script if there's a constant offset between physical and logical page numbers (in each respective book, of course).
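For the constant-offset case, a minimal sketch (in Python rather than bash, purely for illustration) could look like the following. The two-column layout (title, page) and the `shift_pages` helper are assumptions, not the repo's actual format. The offset of -13 comes from the "Autumn Leaves" example above (physical page 25, printed page 12).

```python
import csv
import io


def shift_pages(rows, page_col, offset):
    """Return copies of the rows with the integer in page_col shifted by offset."""
    shifted = []
    for row in rows:
        new_row = list(row)
        new_row[page_col] = str(int(row[page_col]) + offset)
        shifted.append(new_row)
    return shifted


# Hypothetical layout: title in column 0, physical (PDF) page in column 1.
# The real CSV files may use different columns.
sample = "Autumn Leaves,25\n"
rows = list(csv.reader(io.StringIO(sample)))

# Physical page 25 corresponds to printed page 12, so the offset is -13.
logical = shift_pages(rows, page_col=1, offset=-13)
print(logical)
```

This only works while the offset really is constant for the whole book.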
Well, the offset won't necessarily be constant, and certainly not if you take into account that many books have physical pages near the beginning with no corresponding logical page numbers. Instead, those pages are often marked with letters or Roman numerals, so the mapping would need to describe those too. It shouldn't be hard to design a system to cope with this. However, an extra column for logical page numbers/letters is probably the simplest solution, and the easiest to understand and use.
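To make the mapping idea concrete, here is a rough sketch of one possible design: a short list of segments, each saying where a numbering style starts in the PDF. The segment boundaries below are invented for illustration. Only the main-body offset is chosen to match the "Autumn Leaves" example (physical 25, printed 12). The names `SEGMENTS`, `to_roman` and `logical_label` are hypothetical.

```python
from bisect import bisect_right


def to_roman(n):
    """Convert a positive integer to a lowercase Roman numeral."""
    pairs = [
        (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
        (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
        (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
    ]
    out = []
    for value, symbol in pairs:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


# Each segment: (first physical page, numbering style, first logical number).
# Hypothetical values; a real book would need its own list.
SEGMENTS = [
    (1, "none", None),    # cover and blank pages: no printed label
    (3, "roman", 1),      # front matter: i, ii, iii, ...
    (14, "arabic", 1),    # main body: 1, 2, 3, ...
]


def logical_label(physical, segments=SEGMENTS):
    """Return the printed label for a physical (PDF) page, or '' if it has none."""
    starts = [seg[0] for seg in segments]
    idx = bisect_right(starts, physical) - 1
    if idx < 0:
        raise ValueError(f"physical page {physical} is before the first segment")
    start, style, first = segments[idx]
    if style == "none":
        return ""
    number = first + (physical - start)
    return to_roman(number) if style == "roman" else str(number)


print(logical_label(25))
```

The output of something like this could be used once to fill in the extra column, so people reading the CSV would never need to see the segment list.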
Unfortunately I am extremely busy with other stuff at the moment, so I can't offer to move this forward, but I will try to review any pull requests submitted. Thanks a lot for the feedback!
I worked on this problem 11-12 years ago but never completed a solution to the point that I could share it. Two columns are needed: one for the number printed on the page, the other for the page number in the PDF file. The offset between the two is not constant and may even vary by the source of the scan. I concluded that manual intervention is required and built a tool that shows each page in a PDF and proposes the appropriate printed page number. The only manual step is to confirm or update. If updated, the new offset is then applied to subsequent pages. Since updates are infrequent, the operation is mostly confirms, which goes quickly. To speed things up even further, the tool included a way to access the pages at random. Since it is highly unlikely that the offset would go both positive and negative within a group of pages, sampling at large page intervals would quickly show alignment. One would need to sample the interval more finely, to locate the point of the shift, only if there was misalignment. Someday I may get back to it again.
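The sampling idea can be sketched roughly as below. This is not the tool described above, just one way the interval refinement might work. It assumes that if two sampled pages show the same offset, every page between them does too. The `offset_at` callback stands in for the manual confirm/update step, and the demo offsets are invented.

```python
def find_shifts(lo, hi, offset_at):
    """Return the physical pages in (lo, hi] where the offset changes.

    offset_at(page) returns the confirmed offset (physical minus printed)
    for a page. Equal offsets at both ends are taken to mean the whole
    interval is aligned, so it is not sampled any further.
    """
    if offset_at(lo) == offset_at(hi):
        return []
    if hi - lo == 1:
        return [hi]  # the shift happens exactly at page hi
    mid = (lo + hi) // 2
    return find_shifts(lo, mid, offset_at) + find_shifts(mid, hi, offset_at)


# Invented demo: offset 13 up to physical page 99, then 15 from page 100 on
# (for example, two unnumbered pages inserted in the scan).
def demo_offset(page):
    return 13 if page < 100 else 15


shifts = find_shifts(1, 200, demo_offset)
print(shifts)
```

In a real tool each `offset_at` call would be a page shown to the user, so the results would need to be cached to avoid asking about the same page twice.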