Git-Lit / git-lit

Scripts to create git repositories for ALTO XML texts, like those from the British Library's scanned documents.

Stray metadata files in zip #21

Open tfmorris opened 8 years ago

tfmorris commented 8 years ago

I'm not sure what's going on here, but the zip for 000000037 contains the metadata file 000000218_metadata.xml as well as the correct 000000037_metadata.xml.

If this error was introduced at the BL, it's something that we'll need to watch out for when processing.

JonathanReeve commented 8 years ago

Crazy. I'll look into it. While I'm at it, I'll get a bunch more samples and add them to this repo.

tfmorris commented 8 years ago

Actually, it's not just the metadata file. I didn't notice before, but the ALTO directory has all the pages for the 000000218 volume as well. It's basically two entire volumes merged into a single zip file.

We can code for it if it's something that happens regularly, but if there's a 000000218_ zip file that has the right content, the easiest thing would be to just ignore the stray files (which is what I currently do).
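
For reference, ignoring the stray files could look roughly like the sketch below. It is not the code either of us actually runs. It assumes that every file in a volume's zip starts with the nine-digit volume ID followed by an underscore (as 000000037_metadata.xml does), including the pages under ALTO/. The names `split_members` and `split_zip` are made up for illustration.

```python
import re
import zipfile
from pathlib import PurePosixPath

# Assumption: member file names begin with a nine-digit volume ID and "_".
ID_PREFIX = re.compile(r"^(\d{9})_")


def split_members(names, expected_id):
    """Partition zip member names into (kept, stray) lists.

    A member is stray when its file name carries a volume ID prefix that
    differs from expected_id. Members without an ID prefix are kept.
    """
    kept, stray = [], []
    for name in names:
        if name.endswith("/"):
            continue  # skip directory entries such as "ALTO/"
        match = ID_PREFIX.match(PurePosixPath(name).name)
        if match and match.group(1) != expected_id:
            stray.append(name)
        else:
            kept.append(name)
    return kept, stray


def split_zip(zip_path, expected_id):
    """Apply split_members to the member list of a zip file on disk."""
    with zipfile.ZipFile(zip_path) as archive:
        return split_members(archive.namelist(), expected_id)
```

Logging the stray list rather than silently dropping it would also tell us how often this happens across the whole collection, which is what decides whether it needs real handling.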