c-w / gutenberg

A simple interface to the Project Gutenberg corpus.
Apache License 2.0

Make it explicit what metadata is available #29

Closed by hugovk 8 years ago

hugovk commented 8 years ago

I know someone who left the database build running overnight and was surprised to find that only title and author metadata were available afterwards.

So perhaps it should be made a bit clearer that there's only title and author available. Perhaps something like this?

c-w commented 8 years ago

Hi Hugo. Thanks for the PR.

Currently the library only supports querying for title and author information because that's all I needed for my use-case when I wrote the original version of the library.

However, it's quite simple to add support for new meta-data queries. For example, the implementation of the title meta-data extractor is two lines of active code.

Could you please let me know what other sorts of meta-data you'd like to query in a new issue?
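To illustrate why a new metadata query can be a small addition, here is a minimal sketch of a registry of per-field extractors. It is not the library's actual code: `EXTRACTORS`, `extractor`, `query_record`, and the plain-dict record shape are all hypothetical names chosen for this example.

```python
# Hypothetical sketch of a pluggable metadata-extractor registry.
# None of these names come from the gutenberg library itself.

EXTRACTORS = {}


def extractor(feature_name):
    """Register a function as the extractor for one metadata feature."""
    def register(func):
        EXTRACTORS[feature_name] = func
        return func
    return register


@extractor("title")
def extract_title(record):
    # The "active code" for a feature can be this short.
    return frozenset([record["title"]]) if "title" in record else frozenset()


@extractor("language")
def extract_language(record):
    # Adding a new feature means adding one more small function.
    return frozenset([record["language"]]) if "language" in record else frozenset()


def query_record(feature_name, record):
    """Look up the extractor for feature_name and apply it to a record."""
    try:
        extract = EXTRACTORS[feature_name]
    except KeyError:
        raise ValueError("unsupported metadata feature: %r" % feature_name)
    return extract(record)
```

With this layout, supporting another field only requires one more decorated function; the lookup code stays unchanged.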

hugovk commented 8 years ago

I'm not sure exactly what the person wanted beyond author and title, but they ended up using this metadata dump, which contains records like the following. I'm not entirely sure where all that data comes from.

{
    "gutenberg_id" : 1,
    "medium" : "Book",
    "language" : "en",
    "title" : "The Declaration of Independence of the United States of America",
    "sort_title" : "Declaration of Independence of the United States of America, The",
    "audience" : "Adult",
    "subjects" : [{
            "audience" : "Adult",
            "identifier" : "Politics",
            "type" : "gutenberg:bookshelf"
        }, {
            "audience" : "Adult",
            "identifier" : "American Revolutionary War",
            "type" : "gutenberg:bookshelf"
        }, {
            "audience" : "Adult",
            "identifier" : "United States Law",
            "type" : "gutenberg:bookshelf"
        }, {
            "identifier" : "United States -- History -- Revolution, 1775-1783 -- Sources",
            "type" : "LCSH"
        }, {
            "audience" : "Adult",
            "identifier" : "JK",
            "type" : "LCC",
            "name" : "Political institutions and public administration"
        }, {
            "identifier" : "United States. Declaration of Independence",
            "type" : "LCSH",
            "name" : "United States. Declaration of Independence"
        }, {
            "audience" : "Adult",
            "identifier" : "E201",
            "type" : "LCC"
        }
    ],
    "authors" : [{
            "family_name" : "Jefferson",
            "viaf" : "41866059",
            "display_name" : "Thomas Jefferson",
            "lc" : "n79089957",
            "sort_name" : "Jefferson, Thomas",
            "wikipedia_name" : "Thomas_Jefferson"
        }
    ],
    "quality" : 0.485,
    "work_id" : "b2899503-29e2-1e02-a0e3-1a6de23dafe3"
}

http://www.crummy.com/software/gutenberg/47000_metadata.json.gz
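For anyone who wants to pull fields out of records shaped like the one above, here is a rough sketch. The field names come from the sample record; the function names `iter_records` and `summarize` are made up for this example, and the assumption that the gzipped file holds one JSON object per line is unverified.

```python
import gzip
import json
from collections import defaultdict


def iter_records(path):
    """Yield one record per non-empty line of a gzipped file.

    Assumption: the dump stores one JSON object per line. If the file is
    a single JSON array instead, use json.load on the opened handle.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def summarize(record):
    """Reduce a record to id, title, language, authors, and subjects by type."""
    subjects = defaultdict(list)
    for subject in record.get("subjects", []):
        subjects[subject["type"]].append(subject["identifier"])
    return {
        "gutenberg_id": record["gutenberg_id"],
        "title": record["title"],
        "language": record.get("language"),
        "authors": [a["display_name"] for a in record.get("authors", [])],
        "subjects": dict(subjects),
    }


# Abbreviated copy of the sample record shown above.
SAMPLE = {
    "gutenberg_id": 1,
    "language": "en",
    "title": "The Declaration of Independence of the United States of America",
    "subjects": [
        {"identifier": "Politics", "type": "gutenberg:bookshelf"},
        {"identifier": "JK", "type": "LCC"},
        {"identifier": "E201", "type": "LCC"},
    ],
    "authors": [{"display_name": "Thomas Jefferson"}],
}

summary = summarize(SAMPLE)
```

Grouping subjects by their `type` keeps the bookshelf, LCSH, and LCC classifications separate, which seems to be how the dump distinguishes them.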

Thanks for the pointers. I'll create a new issue or PR if there's anything specific I'd like.