aspiers / book-indices

Indices for music books

Add NewReal1-Bb index #42

strk closed this 2 months ago

strk commented 2 months ago

It took me the whole morning to extract from:

99f2d4fb6e25fca77a753e86430098a1 TheNewRealBookVol1-Bb.pdf (md5sum)

joaodriessen commented 2 months ago

Thank you for this! You mentioned it took all morning; I would have expected it to take much longer. Could you explain how you did it? Any tricks for getting it done in a timely fashion? I could have a pop at one of the other unindexed books.

strk commented 2 months ago

I'm pretty fast with the keyboard and the "vi" editor :) Here's how I did it:

  1. Started from the C version index (which seems good)
  2. Ran an extractor (the nodejs one)
  3. Iteratively checked which piece was the first with a non-matching title, then shifted all the lines from there onward by the offset needed to reach the correct page number

To help with the iterations I wrote a short Perl script, "offset.pl", that takes a single argument (a signed offset) and adds it to both the starting and ending pages.
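For anyone who wants to try the same approach, here is a minimal Python sketch of the offsetting step. It is not the original offset.pl. It assumes each index entry is a row of the form `[title, start_page, end_page]`, which may not match the repo's actual file layout, and the function name `shift_pages` is made up for illustration.

```python
def shift_pages(rows, offset):
    """Return a copy of rows with start and end pages shifted by a signed offset.

    Assumption: each row is [title, start_page, end_page], with the page
    numbers given as strings (as they would be when read from a text file).
    """
    shifted = []
    for row in rows:
        title = row[0]
        start = int(row[1]) + offset
        end = int(row[2]) + offset
        shifted.append([title, str(start), str(end)])
    return shifted
```

Applied only to the rows from the first mismatching title onward, a call such as `shift_pages(rows[i:], -2)` would move that tail of the index back by two pages while leaving the earlier rows untouched.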

Common reasons for page mismatches:

  1. Pages with photos of musicians were not in my PDF
  2. Some pieces were a single page instead of multiple pages

Please include the md5sum of your source PDF when you contribute the updated index.
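On systems without the `md5sum` command, the same checksum can be computed with Python's standard `hashlib` module. This is only a sketch, and the helper name `md5_of_file` is made up for illustration.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at path, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks so a large PDF is never loaded into memory whole.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned string should be the same hex digest that `md5sum` prints in front of the filename.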

aspiers commented 2 months ago

Awesome! Great idea about checksums too. We should incorporate checksums into the repo, allowing for the fact that the same real book could have multiple PDFs.

strk commented 2 months ago

And maybe the number of pages too, since changes in resolution or reflow would change the checksum while leaving the number of pages unchanged.
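One way the checksum and page-count ideas might fit together is sketched below in Python. Nothing like this exists in the repo as far as this thread shows. `KnownSource`, `match_source`, and the three result labels are hypothetical names, and the sketch assumes the caller already knows the PDF's checksum and page count.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class KnownSource:
    """One PDF that an index is known to match (hypothetical record)."""
    md5: str
    pages: int


def match_source(md5: str, pages: int, known: Iterable[KnownSource]) -> str:
    """Classify a PDF against the sources recorded for one book.

    Returns "exact" if the checksum matches a recorded source,
    "pages-only" if only the page count matches (for example, a rescan),
    and "unknown" otherwise.
    """
    known = list(known)  # allow any iterable, since we scan it twice
    if any(src.md5 == md5 for src in known):
        return "exact"
    if any(src.pages == pages for src in known):
        return "pages-only"
    return "unknown"
```

Keeping a list of sources per book allows for the same real book having several PDFs, and the "pages-only" result is a weaker hint that the index probably still lines up.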


aspiers commented 2 months ago

I'm travelling right now, any chance you could file an issue for those ideas?

strk commented 2 months ago

https://github.com/aspiers/book-indices/issues/46