c-w / gutenberg

A simple interface to the Project Gutenberg corpus.
Apache License 2.0

Change mirrors and make it easier for end users to switch #82

Closed MasterOdin closed 7 years ago

MasterOdin commented 7 years ago

Closes #81

General usage for allowing the end user to specify a mirror to use:

$ python3 -m gutenberg.acquire.text 2701 moby-raw.txt
INFO:rdflib:RDFLib Version: 4.2.2
http://aleph.gutenberg.org/2/7/0/2701/2701.txt

$ python3 -m gutenberg.acquire.text --mirror http://mirrors.xmission.com/gutenberg/ 2701 moby-raw.txt
INFO:rdflib:RDFLib Version: 4.2.2
http://mirrors.xmission.com/gutenberg/2/7/0/2701/2701.txt

$ GUTENBERG_MIRROR=http://mirrors.xmission.com/gutenberg/ python3 -m gutenberg.acquire.text 2701 moby-raw.txt
INFO:rdflib:RDFLib Version: 4.2.2
http://mirrors.xmission.com/gutenberg/2/7/0/2701/2701.txt

$ GUTENBERG_MIRROR=http://mirrors.xmission.com/gutenberg/ python3 -m gutenberg.acquire.text --mirror http://eremita.di.uminho.pt/gutenberg/ 2701 moby-raw.txt
INFO:rdflib:RDFLib Version: 4.2.2
http://eremita.di.uminho.pt/gutenberg/2/7/0/2701/2701.txt

(The URI is printed here only for the example; it won't be printed in real usage.)
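
The four runs above suggest a precedence of `--mirror` flag, then `GUTENBERG_MIRROR`, then a default. A minimal sketch of that lookup follows; `resolve_mirror` and `text_uri` are hypothetical names used only for illustration, not functions from this PR, and the path layout is inferred solely from the `2701` example above (it is not meant to cover single-digit ids).

```python
import os

# Default taken from the first example run in this thread.
DEFAULT_MIRROR = "http://aleph.gutenberg.org"


def resolve_mirror(cli_mirror=None, environ=None):
    """Hypothetical helper: flag beats environment variable beats default."""
    environ = os.environ if environ is None else environ
    mirror = cli_mirror or environ.get("GUTENBERG_MIRROR") or DEFAULT_MIRROR
    # Drop a trailing slash so joining path segments never yields "//".
    return mirror.rstrip("/")


def text_uri(mirror, etextno):
    """Hypothetical helper: build the URI layout seen in the example output."""
    digits = str(etextno)
    # 2701 -> "2/7/0": every digit except the last becomes a directory.
    prefix = "/".join(digits[:-1])
    return "{}/{}/{}/{}.txt".format(mirror, prefix, digits, digits)
```

Passing the environment in as a plain dict keeps the lookup easy to exercise without touching the real process environment.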

I've also added an exception for when an invalid mirror is used (i.e., the URI doesn't exist, is forbidden, etc.):

$ python3 -m gutenberg.acquire.text --mirror http://www.gutenberg.lib.md.us 2701 moby-raw.txt
INFO:rdflib:RDFLib Version: 4.2.2
usage: text.py [-h] [--mirror MIRROR] etextno outfile
text.py: error: Could not reach Gutenberg mirror 'http://www.gutenberg.lib.md.us'. Try setting a different mirror (https://www.gutenberg.org/MIRRORS.ALL) for --mirror flag or GUTENBERG_MIRROR environment variable.

And if you use a good mirror but the etext does not exist (the current version returns an empty message in this case):

$ python3 -m gutenberg.acquire.text 270111111111 moby-raw.txt
INFO:rdflib:RDFLib Version: 4.2.2
usage: text.py [-h] [--mirror MIRROR] etextno outfile
text.py: error: Failed to find 270111111111 on http://aleph.gutenberg.org.
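
The `usage:` line followed by `text.py: error: ...` is the shape of message that `argparse` prints from `ArgumentParser.error`. A rough sketch of how both failures could be routed through it is below. The exception classes, `main`, and the injected `fetch` callable are all assumptions made for illustration and are not the names used in this PR.

```python
import argparse


class InvalidMirrorError(Exception):
    """Hypothetical: the mirror itself could not be reached."""


class UnknownTextError(Exception):
    """Hypothetical: the mirror answered but has no such etext."""


def build_parser():
    # Arguments mirror the usage line shown in the output above.
    parser = argparse.ArgumentParser(prog="text.py")
    parser.add_argument("--mirror")
    parser.add_argument("etextno", type=int)
    parser.add_argument("outfile")
    return parser


def main(argv, fetch):
    """Parse argv, call fetch(etextno, mirror), and report failures via argparse."""
    parser = build_parser()
    args = parser.parse_args(argv)
    try:
        return fetch(args.etextno, args.mirror)
    except (InvalidMirrorError, UnknownTextError) as error:
        # Prints the usage line plus "text.py: error: <message>" to stderr,
        # then exits with status 2.
        parser.error(str(error))
```

Injecting `fetch` keeps the command-line handling separate from the download itself, so each error path can be triggered without a network.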

coveralls commented 7 years ago

Coverage Status

Coverage decreased (-0.6%) to 95.852% when pulling ea6369a82130eadce79082a8c6e15517978b5fa0 on issue_81 into 4fd78daf0c0872bafe45002b3db0a009ac5084f7 on master.

MasterOdin commented 7 years ago

Not sure the best way to add tests for this, open to suggestions.

hugovk commented 7 years ago

You could try fetching something by literally using example.com as a mirror.

MasterOdin commented 7 years ago

That would test the case where the mirror exists but the text itself couldn't be found. I couldn't seem to get a good test of "the mirror doesn't exist, so it fails", though I guess that might just be something screwy with my network settings affecting Python.

hugovk commented 7 years ago

Perhaps something like thismirrordoesntexist.com?

MasterOdin commented 7 years ago

Weirdly:

>>> import requests
>>> response = requests.head("http://thismirrordoesntexist.com")
>>> response.ok
True
>>> response.status_code
301

I'm going to have to figure out what's going on with my network stack, so I cannot write tests at this time.
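
For context, `Response.ok` in `requests` is true for any status code below 400, and `requests.head` does not follow redirects by default, so a 301 is reported as-is and still counts as "ok". Why this network answers for a nonexistent domain is a separate question. One way to test without depending on the network at all is to inject the HEAD call; the sketch below does that with a hypothetical `mirror_is_usable` function and a stand-in response class, neither of which is part of this PR.

```python
class FakeResponse:
    """Stand-in for requests.Response carrying only the field used here."""

    def __init__(self, status_code):
        self.status_code = status_code


def mirror_is_usable(mirror, head):
    """Hypothetical check: accept only a direct 2xx answer from the mirror.

    `head` is any callable taking a URL and returning an object with a
    `status_code` attribute, for example `requests.head` in real use.
    """
    try:
        response = head(mirror)
    except OSError:
        # requests.RequestException derives from IOError (OSError on Python 3),
        # so connection and DNS failures raised by requests land here too.
        return False
    # Stricter than `response.ok`: a 301 from a catch-all host is rejected.
    return 200 <= response.status_code < 300
```

Rejecting redirects is a design choice in this sketch; a mirror that legitimately redirects would need `allow_redirects=True` on the real HEAD call or a looser status check.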

coveralls commented 7 years ago

Coverage Status

Coverage increased (+0.3%) to 96.767% when pulling 8304af2f526fa5ae1b65c9fba38e5c14993ce795 on issue_81 into 4fd78daf0c0872bafe45002b3db0a009ac5084f7 on master.

coveralls commented 7 years ago

Coverage Status

Coverage decreased (-0.03%) to 96.391% when pulling a5ff33b60223435d2867ae1c010576c5addd7b81 on issue_81 into 4fd78daf0c0872bafe45002b3db0a009ac5084f7 on master.

coveralls commented 7 years ago

Coverage Status

Coverage increased (+0.3%) to 96.767% when pulling fce957aeaa6f02592bb840499f0f6d0fd1ca33f0 on issue_81 into 4fd78daf0c0872bafe45002b3db0a009ac5084f7 on master.

coveralls commented 7 years ago

Coverage Status

Changes Unknown when pulling 8aa9a261e6168a66a21c7498797a5eeaef9e9bfa on issue_81 into on master.

hugovk commented 7 years ago

I think this @coveralls comment spam could be turned off, especially when it's so useless ("changes unknown"?) but keep the status API:

https://coveralls.io/github/c-w/Gutenberg/settings

c-w commented 7 years ago

@hugovk Sounds great, please do turn off the spam :)

hugovk commented 7 years ago

@c-w I don't have the rights for this repo, please can you do it?

c-w commented 7 years ago

@hugovk Should be done now.