aparrish / gutenberg-dammit

I wanted all of plaintext Project Gutenberg in an easy-to-use format, so I made this

update metadata? #5

Open aparrish opened 6 years ago

aparrish commented 6 years ago

In some cases, the metadata from the GutenTag dump (itself based on the DVD ISO) appears to be out of date compared with the live Project Gutenberg site. For example, Coleridge's Complete Poetical Works has a subject tag on the live site, but that tag is missing from the GutenTag HTML metadata (and thus from the metadata in the Gutenberg, dammit archive). Fixing this might depend on a fix for #3, but perhaps it could also be fixed by just using the most up-to-date RDFs from the catalog data.
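
To make the RDF idea concrete, here is a rough sketch of pulling subject tags out of one catalog RDF record and merging them into an existing metadata entry. It rests on several assumptions: that each record nests subjects as `dcterms:subject` / `rdf:Description` / `rdf:value`, that the archive's metadata stores subjects as a list under a `"Subject"` key, and that the inline `SAMPLE_RDF` (with made-up subject strings) resembles a real record. The names `extract_subjects` and `refresh_subjects` are hypothetical helpers, not part of this repo.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for RDF and Dublin Core terms.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCTERMS_NS = "http://purl.org/dc/terms/"

# Made-up record for illustration only; real catalog files contain many
# more fields, and the layout here is an assumption.
SAMPLE_RDF = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcterms="http://purl.org/dc/terms/">
  <rdf:Description rdf:about="ebooks/0">
    <dcterms:subject>
      <rdf:Description><rdf:value>Example subject A</rdf:value></rdf:Description>
    </dcterms:subject>
    <dcterms:subject>
      <rdf:Description><rdf:value>Example subject B</rdf:value></rdf:Description>
    </dcterms:subject>
  </rdf:Description>
</rdf:RDF>
"""


def extract_subjects(rdf_xml):
    """Return the distinct subject strings in an RDF record, in document order."""
    root = ET.fromstring(rdf_xml)
    subjects = []
    for subject in root.iter("{%s}subject" % DCTERMS_NS):
        for value in subject.iter("{%s}value" % RDF_NS):
            text = (value.text or "").strip()
            if text and text not in subjects:
                subjects.append(text)
    return subjects


def refresh_subjects(record, rdf_xml):
    """Return a copy of `record` with any subjects missing from it appended.

    Assumes subjects live in a list under the "Subject" key (hypothetical
    here; adjust to the archive's actual metadata schema).
    """
    merged = list(record.get("Subject", []))
    for subject in extract_subjects(rdf_xml):
        if subject not in merged:
            merged.append(subject)
    updated = dict(record)
    updated["Subject"] = merged
    return updated


if __name__ == "__main__":
    stale = {"Title": ["Example"], "Subject": ["Example subject A"]}
    print(refresh_subjects(stale, SAMPLE_RDF)["Subject"])
```

Merging rather than overwriting keeps any subjects already in the archive, which seems safer if the two sources disagree. It does not address whatever #3 requires.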