c-w / gutenberg

A simple interface to the Project Gutenberg corpus.
Apache License 2.0

Avoid downloading the cache? #115

Closed: fickas closed this issue 5 years ago

fickas commented 5 years ago

I'd like to download text 2701 and the metadata that goes with it. Can I get the metadata for 2701 without downloading the entire cache?

Thanks.

hugovk commented 5 years ago

Here's a repo containing metadata generated using this library. Does this help?

https://github.com/hugovk/gutenberg-metadata

c-w commented 5 years ago

In addition to the gutenberg-metadata repo, there's also gutenberg-http, which you can set up as a web service to fetch texts and metadata on demand. It still requires the cache to be created on the server, though.

As for the gutenberg library itself, there currently is no option to access metadata without downloading the cache first. Given that the metadata information is distributed by Project Gutenberg as one large tarball, I doubt that this is something we can optimize.
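For reference, the cache-first workflow described above looks roughly like the sketch below. It uses the calls shown in the library's README (`get_metadata_cache`, `populate`, `get_metadata`); the two wrapper function names are made up for illustration, and the imports are deferred so the file loads even where the third-party `gutenberg` package is not installed.

```python
def build_metadata_cache():
    """One-off step: download and index the full Project Gutenberg catalog.

    This is the expensive part the issue asks about. It has to finish
    before any metadata query can be answered locally.
    """
    # Third-party package; imported lazily (assumes `gutenberg` is installed).
    from gutenberg.acquire import get_metadata_cache

    cache = get_metadata_cache()
    cache.populate()


def titles_for(text_id):
    """Return the set of titles recorded for one etext.

    Assumes build_metadata_cache() has already been run on this machine.
    """
    from gutenberg.query import get_metadata

    return get_metadata("title", text_id)
```

With the cache in place, `titles_for(2701)` would return the title set for etext 2701; without it, the query has nothing to read from, which is why the download cannot be skipped.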

Resolving. Feel free to re-open if you have any further questions.

c-w commented 5 years ago

@fickas I've re-instated a hosted version of gutenberg-http which makes it easier to pull down the metadata without having to install the gutenberg library. For example, to fetch the metadata for etext 2701, you can execute this:

curl https://gutenberg.justamouse.com/texts/2701
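The same request can be made from Python with only the standard library. This is a minimal sketch, assuming the service answers with a JSON body; the URL comes from the curl command above, while the function names and the `opener` parameter are illustrative and not part of gutenberg-http.

```python
import json
from urllib.request import urlopen

# Hosted instance from the curl example; point this at your own deployment
# if you run gutenberg-http yourself.
BASE_URL = "https://gutenberg.justamouse.com"


def metadata_url(text_id, base_url=BASE_URL):
    """Build the /texts/<id> URL used in the curl example."""
    return f"{base_url.rstrip('/')}/texts/{int(text_id)}"


def fetch_metadata(text_id, base_url=BASE_URL, opener=urlopen, timeout=30):
    """Fetch and decode the response for one etext.

    Assumes the body is JSON. `opener` defaults to urllib's urlopen and
    can be swapped for a stub so the function can be exercised offline.
    """
    with opener(metadata_url(text_id, base_url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_metadata(2701)` issues the same GET request as the curl command and returns the decoded body. Passing `base_url` lets the same code target a self-hosted instance.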

There's also a demo-page available to explore the API.

Note, though, that for now the service is hosted on a single VM, so I'm not making any promises about performance or uptime. If you wish to use this in production, I'd strongly recommend spinning up your own deployment of gutenberg-http.