NLeSC / python-template

Netherlands eScience Center Python Template
https://research-software-directory.org/software/nlesc-python-template
Apache License 2.0

Remove Python 3.7: end-of-life #343

Closed. egpbos closed this issue 8 months ago

egpbos commented 1 year ago

Python 3.7 will stop being supported in July. We should remove support for it. PRs welcome!

LourensVeen commented 1 year ago

CentOS 7 is supported until July 2024. It has Python 3.6...

egpbos commented 1 year ago

Hm ok... sure, we could also consider running on EoL versions like 3.6. I'm not so sure if this is good practice though. Why would we want to support running on unsupported versions, risking security issues? Maybe deploying to or running on CentOS 7's default Python is just not good practice either for security-sensitive software. Is it not possible to install a newer Python version with conda? It can become a pain to maintain everything manually when you have to build all kinds of stuff with a common toolchain, sure, but such is life on ancient university clusters :P

By the way, while we're at it, I had another thought too: to save CI usage, we could also run the template tests only on the outer limits of our supported range. I'm not sure how much running 3.8, 3.9 and 3.10 really adds to just running 3.7 and 3.11. Obviously, there are always going to be exotic edge cases, but this also goes for certain patch versions, which we don't all run either... And if it's an actual detected issue in someone's package, they can always add a version to the matrix.

We could do the same for the generated package's matrix, or maybe even only put the latest Python version in there as a default. Again, people can always add versions they support, but maybe just one version by default is fine for most cases.
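As a rough illustration of the "outer limits only" idea, here is a minimal Python sketch that reduces a list of supported versions to its oldest and newest entries. The function name `matrix_endpoints` is hypothetical and not part of the template; the version list is the one mentioned in this comment.

```python
def matrix_endpoints(versions):
    """Return only the oldest and newest version from a list of 'X.Y' strings."""
    # Sort numerically: a plain string sort would place "3.10" before "3.7".
    ordered = sorted(set(versions), key=lambda v: tuple(int(p) for p in v.split(".")))
    if len(ordered) <= 2:
        return ordered
    return [ordered[0], ordered[-1]]


supported = ["3.7", "3.8", "3.9", "3.10", "3.11"]
ci_versions = matrix_endpoints(supported)
```

The resulting two entries would then be what goes into the CI matrix; anyone who hits a problem on an intermediate version can add it back.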

bouweandela commented 1 year ago

We could also consider adopting NEP-29 as a best practice:

> This NEP recommends that all projects across the Scientific Python ecosystem adopt a common “time window-based” policy for support of Python and NumPy versions.

I see this used a lot in projects. One of the advantages is that you can actually start using newer Python features such as type hints.

LourensVeen commented 1 year ago

I've run CI on patch versions of 3.5 and 3.6 before for YAtiML, because there were some really big changes in typing in the early days of those and YAtiML had all sorts of work-arounds for the differences. But for the template that's not so relevant.

MUSCLE3 will be dropping 3.6 in the next release, which will come out soon, not so much because there's no upstream support anymore, but because it's rapidly becoming impossible to install anything on 3.6. Many packages have dropped support but do not contain correct metadata, so pip install anything is quite likely to drag in a new version of a dependency that doesn't actually run on 3.6, breaking everything. I am still supporting 3.7 for now.

And yes, if you install MUSCLE3 0.6.0 (the current release) on Python 3.6, then you get an older version of NumPy that has a security issue. GitHub keeps nagging me about it, but it's the best we can do. If you upgrade your Python, you get the newer NumPy and all is well.

I mentioned CentOS 7 because I got a support request from someone last week who's running on a CentOS 7 server. The problem there was that there was an ancient version of GoogleTest lying around which tripped up the MessagePack build system, so not a Python issue. He also said that he could easily use a newer Python, although that doesn't seem to be supported by CentOS; you have to use Conda or build your own. (It does have a mechanism for installing newer GCC versions.)

In the end, I think what makes sense to support depends on where your software is being run. If it's mainly run on Windows or macOS desktops, where Python is usually installed by the user as a separate package and frequently updated, then the 42-month window of NEP-29 sounds quite reasonable. That means anything released now would require Python 3.9 or later.

However, if you have users on Ubuntu, NEP-29 leaves anyone still running Ubuntu 20.04 LTS unsupported, even though their OS has standard support until 2025 and extended support until 2030. If your software is run on HPC or compute servers, then it's quite likely that your users will want to use it on the in-house cluster they managed to get some money for about a decade ago, but don't have the time, money or the expertise to do OS upgrades on. So in that case you need a (much) longer window.

Our template of course is just a template. If the user wants to support older versions in their project, they can add them easily enough. So the question here is what a reasonable default is. If it's reasonable to assume that most users of the template make software for the category of people for whom NEP-29 works, then it makes sense to have that as the default for the template. I have no idea if that's the case, but it seems likely?
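As a rough illustration of the 42-month window discussed above, here is a minimal Python sketch of the NEP-29 rule for Python versions: support every minor version released within the last 42 months, and at least the two latest minor versions. The function names `add_months` and `nep29_supported` are hypothetical, and the table only covers the versions mentioned in this thread.

```python
import calendar
from datetime import date

# Release dates of the Python minor versions discussed in this thread.
RELEASES = {
    "3.7": date(2018, 6, 27),
    "3.8": date(2019, 10, 14),
    "3.9": date(2020, 10, 5),
    "3.10": date(2021, 10, 4),
    "3.11": date(2022, 10, 24),
}
WINDOW_MONTHS = 42  # NEP-29 support window for Python versions


def add_months(day, months):
    """Shift a date by whole months, clamping the day to the month's length."""
    total = day.year * 12 + (day.month - 1) + months
    year, month0 = divmod(total, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(day.day, last_day))


def nep29_supported(on, releases=RELEASES):
    """List versions a project released on date `on` would support, oldest first."""
    released = sorted((rel, ver) for ver, rel in releases.items() if rel <= on)
    in_window = {ver for rel, ver in released if add_months(rel, WINDOW_MONTHS) > on}
    latest_two = {ver for _, ver in released[-2:]}
    return [ver for _, ver in released if ver in in_window or ver in latest_two]


may_2023 = nep29_supported(date(2023, 5, 1))
```

With these dates, a release made in May 2023 would support 3.9 and later, which matches the "3.9 or later" estimate above. A project with HPC users could pass a larger window instead of 42 months.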

egpbos commented 1 year ago

> Our template of course is just a template. If the user wants to support older versions in their project, they can add them easily enough. So the question here is what a reasonable default is. If it's reasonable to assume that most users of the template make software for the category of people for whom NEP-29 works, then it makes sense to have that as the default for the template. I have no idea if that's the case, but it seems likely?

Yeah, I think it's likely. And even if it isn't, it's indeed easy enough to change things. Aiming to support all our broad use cases means the template will get bloated and that everybody will first have to remove a bunch of stuff to make the template work for them. If the choice then is between "minimal template + add stuff" vs "maximal template + remove stuff", I would go for minimal template to ease the maintenance burden.

NEP-29 sounds good to me too, at least for Python versions; I wouldn't bother with NumPy versions (not all projects need NumPy). Do they keep track of the timelines?

bouweandela commented 1 year ago

> Do they keep track of the timelines?

Yes, here: https://numpy.org/neps/nep-0029-deprecation_policy.html#drop-schedule

> However, if you have users on Ubuntu, NEP-29 leaves anyone still running Ubuntu 20.04 LTS unsupported, even though their OS has standard support until 2025 and extended support until 2030. If your software is run on HPC or compute servers, then it's quite likely that your users will want to use it on the in-house cluster they managed to get some money for about a decade ago, but don't have the time, money or the expertise to do OS upgrades on.

For Linux personal computer users, there is indeed conda for a separate Python installation, which is a really good idea anyway because it avoids all kinds of trouble with compiled dependencies coming from pip installs. For HPC, solutions like conda, spack, and containers are used quite often. There are some limitations on how old your kernel can be for containers and how old your libc can be for conda, but that's about all the trouble I've seen with running newer versions of Python on old operating systems over the past few years.

LourensVeen commented 1 year ago

Do note the recent Teams conversation in which it was mentioned that SURF recommends against using Conda on HPC. You'll get generic binaries that may not be able to use all the features of the hardware, and it creates a ton of tiny files, which is murder on network file systems.

It's better to module load newer software, but of course that only works if those modules are actually available. They are on big, well-maintained machines, but not necessarily on the random things people have in their closet.

egpbos commented 1 year ago

Definitely true. Those systems will also typically have modules for newer Python versions.

bouweandela commented 1 year ago

> You'll get generic binaries that may not be able to use all the features of the hardware

If this is a concern, spack is the solution I usually see used, but it is most useful if you're going to run lots of jobs, because compiling everything may also take a lot of time.

> it creates a ton of tiny files which is murder on network file systems.

If this is a concern, containers are a good solution because they can be created as a single file.

> It's better to module load newer software, but of course that only works if those modules are actually available.

On the HPC systems I've seen, the modules are usually created using either spack or conda.