c-w / gutenberg

A simple interface to the Project Gutenberg corpus.
Apache License 2.0

Use bsddb3 for Python 2.7 #104

Closed MasterOdin closed 4 years ago

MasterOdin commented 6 years ago

While Python 2.7 does come with a builtin bsddb module, it was deprecated in 2.6. It seems to me that we should just migrate wholly to using bsddb3 for all supported versions, which would also simplify the setup_requires for the project and allow us to add a Pipfile.

Thoughts @c-w, @hugovk?

c-w commented 6 years ago

I'd agree that simplification is a good thing. However, installing BSD-DB is a source of errors for some users (e.g. see #101) so it's nice to have the very simple Python 2.7 setup which just works out of the box.

Maybe a good compromise could be to drop support for the built-in BSD-DB module and at the same time add a Dockerfile that does all the required setup? This way we keep a very easy-to-use version (Docker) while also simplifying the code (single BSD-DB library). Thoughts?

hugovk commented 6 years ago

I've also had problems setting up BSD-DB on another machine (not this one). This newer machine was fine to set up.

A Dockerfile might make it easier for some, but isn't necessarily easy for beginners.

How about deciding when to ditch 2.7, and keeping 2.7 as it is until then (but don't let that prevent adding a Dockerfile)?

MasterOdin commented 6 years ago

I'll look into making the Dockerfile, as it shouldn't be too hard. I just tested Python 2.7 via pyenv on my Mac and it's now broken (missing _bsddb module), so be aware that it doesn't work out of the box across all well-used paths. I do agree that it'd be nice if the whole thing were easier to do. It also doesn't help that bsddb3 doesn't install if you don't have berkeley-db (or don't have the path set in a way it can discover), which makes things even messier.

I'd almost argue that by default, out of the box, it should only use SqliteCache (with the warning to use BSD). We'd then add an extras_require block that also installs bsddb3 (which assumes you've installed Berkeley-DB on your OS, and we'd include some brief notes on this in the README). That seems like the most error-proof approach, and I'm not sure it'd make things much more complicated for the casual user, as all the instructions for Python 2.7 and 3 would be nicely unified (looking at the README, it seems to indicate you only need to install BerkeleyDB if you're using Python 3+).
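A minimal sketch of the "SQLite by default, BSD-DB as an opt-in extra" idea, to make the proposal concrete. The names here (EXTRAS_REQUIRE, the "bsddb" extra, choose_cache_backend, and the "sqlite"/"bsddb" labels) are hypothetical and not taken from the project's code; only the bsddb3 package name comes from the discussion.

```python
import importlib
import warnings

# Hypothetical packaging metadata: the dict one might pass to
# setuptools.setup(extras_require=...). The extra name "bsddb" is an assumption.
EXTRAS_REQUIRE = {"bsddb": ["bsddb3"]}


def choose_cache_backend(import_module=importlib.import_module):
    """Return "bsddb" if bsddb3 can be imported, otherwise "sqlite".

    The import function is injectable so the fallback can be exercised
    without Berkeley DB being installed.
    """
    try:
        import_module("bsddb3")
    except ImportError:
        # Fall back to the always-available backend, but tell the user
        # how to opt in to BSD-DB.
        warnings.warn(
            "bsddb3 is not available; falling back to the SQLite cache. "
            "Install Berkeley DB and the optional 'bsddb' extra to use BSD-DB.",
            RuntimeWarning,
        )
        return "sqlite"
    return "bsddb"
```

Under this assumed layout, a plain install would need no system libraries, and users who have Berkeley DB on their OS could opt in with something like pip install gutenberg[bsddb].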

I'm also in favor of dropping Python 2.7 immediately so that we can just use the nice Python 3 libraries and not have to mess around with future and six. I've done this for all the personal projects I actively maintain.

hugovk commented 6 years ago

By the way, here are the pip installs for gutenberg from PyPI for April 2018:

| python_version | percent | download_count |
| -------------- | ------: | -------------: |
| 3.6            |  40.91% |             36 |
| 2.7            |  34.09% |             30 |
| 3.5            |  23.86% |             21 |
| 3.4            |   1.14% |              1 |
| Total          |         |             88 |

Source: pypinfo --start-date 2018-04-01 --end-date 2018-04-30 --percent --markdown gutenberg pyversion

Good to see about two-thirds are already Python 3.
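For reference, the Python 3 share can be recomputed from the download counts above. This is just a quick arithmetic check on the posted numbers; the variable and function names are made up for illustration.

```python
# Download counts per Python version, copied from the April 2018 figures above.
downloads = {"3.6": 36, "2.7": 30, "3.5": 21, "3.4": 1}


def python3_share(counts):
    """Fraction of downloads whose version string starts with "3"."""
    total = sum(counts.values())
    py3 = sum(n for version, n in counts.items() if version.startswith("3"))
    return py3 / total


share = python3_share(downloads)
print(f"{share:.1%}")  # 58 of 88 downloads -> 65.9%
```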

hugovk commented 4 years ago

Is it time to drop Python 2?

c-w commented 4 years ago

Given that there isn't a ton of work happening on this repo, I would suggest keeping the current setup. As such, I'm closing this issue. Feel free to reopen if you feel strongly otherwise.