Closed MasterOdin closed 4 years ago
I'd agree that simplification is a good thing. However, installing BSD-DB is a source of errors for some users (e.g. see #101) so it's nice to have the very simple Python 2.7 setup which just works out of the box.
Maybe a good compromise could be to drop support for the built-in BSD-DB module and at the same time also add a Dockerfile that does all the required setup? In this way we keep a very easy to use version (Docker) while also simplifying the code (single BSD-DB library). Thoughts?
I've also had problems setting up BSD-DB on another machine (not this one). This newer machine was fine to set up.
A Dockerfile might make it easier for some, but isn't necessarily easy for beginners.
It's not that long until Python 2.7 is EOL: https://pythonclock.org.
Some projects have committed to dropping 2.7 on or before 2020-01-01: http://python3statement.org.
How about deciding when to ditch 2.7, and keeping 2.7 as it is until then (but don't let that prevent adding a Dockerfile)?
I'll look into making the Dockerfile, as it shouldn't be too hard. I just tested Python 2.7 via pyenv on my Mac and that's now broken (missing _bsddb module), so be aware that it doesn't work out of the box across all well-used paths. I do agree that it'd be nice if the whole thing was easier to do. It also doesn't help that bsddb3 doesn't install if you don't have berkeley-db (or don't have the path set in a way it can discover), which makes things even messier. I'd almost argue that, by default and out of the box, it should only use SqliteCache (with a warning recommending BSD-DB). We would then add an extras_require block that also installs bsddb3 (which assumes you've installed Berkeley-DB on your OS; we'd include some brief notes on this in the README). That seems like the most error-proof approach, and I'm not sure it'd really make things much more complicated for the casual user, as all the instructions for Python 2.7 and 3 would be nicely unified (looking at the README, it currently seems to indicate you only need to install BerkeleyDB if you're using Python 3+).
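A minimal sketch of what that proposal could look like, not the project's actual code: the extra name `bsddb`, the constant `EXTRAS_REQUIRE`, and the helper `pick_cache_backend` are all hypothetical names made up for illustration. The idea is that SQLite is the default, and bsddb3 is only used when the optional dependency imports cleanly.

```python
import importlib
import warnings

# Hypothetical setup.py fragment: the extra name "bsddb" is an assumption.
# With it, users who want Berkeley DB would run: pip install gutenberg[bsddb]
EXTRAS_REQUIRE = {"bsddb": ["bsddb3"]}


def pick_cache_backend(module_name="bsddb3"):
    """Return "bsddb" if the optional module imports, otherwise "sqlite".

    The module name is a parameter only so the fallback path can be
    exercised without having bsddb3 installed.
    """
    try:
        importlib.import_module(module_name)
    except ImportError:
        # Missing package or missing Berkeley DB system library:
        # warn and fall back to the SQLite-backed cache.
        warnings.warn(
            "%s is not available; falling back to the SQLite cache. "
            "Consider installing the optional BSD-DB extra." % module_name
        )
        return "sqlite"
    return "bsddb"
```

Calling `pick_cache_backend()` on a machine without bsddb3 would emit the warning and return `"sqlite"`, so the default install path never depends on Berkeley DB being present.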
I'm also in favor of dropping Python 2.7 immediately so that we can just use the nice Python3 libraries and not have to mess around with future and six, and have done this for all personal projects I actively maintain.
By the way, here's the pip installs for gutenberg from PyPI for April 2018:
python_version | percent | download_count |
---|---|---|
3.6 | 40.91% | 36 |
2.7 | 34.09% | 30 |
3.5 | 23.86% | 21 |
3.4 | 1.14% | 1 |
Total | | 88 |
Source: `pypinfo --start-date 2018-04-01 --end-date 2018-04-30 --percent --markdown gutenberg pyversion`
Good to see two thirds are already Python 3.
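As a quick check of the "two thirds" figure, the download counts from the table can be summed directly. This is only arithmetic on the numbers quoted above; the variable names are arbitrary.

```python
# Download counts copied from the April 2018 pypinfo table above.
downloads = {"3.6": 36, "2.7": 30, "3.5": 21, "3.4": 1}

total = sum(downloads.values())
py3 = sum(count for version, count in downloads.items() if version.startswith("3"))
py3_share = 100.0 * py3 / total

print("Python 3: %d of %d downloads (%.1f%%)" % (py3, total, py3_share))
# prints: Python 3: 58 of 88 downloads (65.9%)
```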
Is it time to drop Python 2?
Given that there isn't a ton of work happening on this repo, I would suggest keeping the current setup. As such, I'm closing this issue. Feel free to reopen if you feel strongly otherwise.
While Python 2.7 does come with a builtin bsddb module, it was deprecated in 2.6. It seems to me that we should just migrate wholly to using bsddb3 for all supported versions, which would also simplify the setup_requires for the project and allow us to add a Pipfile.
Thoughts @c-w, @hugovk?