RNAcentral / rnacentral-import-pipeline

RNAcentral data import pipeline
Apache License 2.0

Python update & enable poetry #163

Closed: afg1 closed this 1 year ago

afg1 commented 2 years ago

This PR updates Python in the pipeline container to the latest 3.11 release and makes the Dockerfile changes needed to use Poetry as the build system. This should close #156.

The unit tests relevant to these changes pass, and the Dockerfile builds a functioning container.

The largest set of changes is in the code that used ratelimiter. The project appears to be abandoned and is not compatible with Python 3.11. I've rewritten the code that depended on it to use the throttler library instead; however, because throttler works with async coroutines, the changes had to be fairly extensive.
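The rewritten call sites aren't shown here, so the following is only a sketch of the general shape of such a change. throttler's `Throttler` (for example `Throttler(rate_limit=5, period=0.1)`) is used as an async context manager (`async with throttler:`), which is why every caller has to become a coroutine. To keep the example self-contained it does not import throttler: `SimpleThrottler` is a hypothetical, simplified stand-in with the same usage pattern (at most `rate_limit` entries per `period` seconds), and `fetch` and `fetch_all` are hypothetical names.

```python
import asyncio
import time
from collections import deque


class SimpleThrottler:
    """Hypothetical stand-in for throttler.Throttler: allow at most
    `rate_limit` entries into the `async with` block per `period` seconds."""

    def __init__(self, rate_limit: int, period: float = 1.0) -> None:
        self.rate_limit = rate_limit
        self.period = period
        self._entries: deque[float] = deque()  # timestamps of recent entries
        self._lock = asyncio.Lock()  # waiters queue up in FIFO order

    async def __aenter__(self) -> "SimpleThrottler":
        async with self._lock:
            while True:
                now = time.monotonic()
                # Drop timestamps that have left the sliding window.
                while self._entries and now - self._entries[0] >= self.period:
                    self._entries.popleft()
                if len(self._entries) < self.rate_limit:
                    self._entries.append(now)
                    return self
                # Sleep until the oldest entry leaves the window, then re-check.
                await asyncio.sleep(self.period - (now - self._entries[0]))

    async def __aexit__(self, exc_type, exc, tb) -> None:
        return None  # do not suppress exceptions raised inside the block


async def fetch(item: int, throttler: SimpleThrottler) -> int:
    # Hypothetical request; real code would call a web service here.
    async with throttler:
        await asyncio.sleep(0)
        return item * 2


async def fetch_all(items: list[int]) -> list[int]:
    throttler = SimpleThrottler(rate_limit=5, period=0.1)
    # Every call now has to be awaited, so the caller is a coroutine too.
    return await asyncio.gather(*(fetch(i, throttler) for i in items))
```

Calling `asyncio.run(fetch_all(list(range(12))))` returns the doubled values in input order; with five entries allowed per 0.1 seconds, the twelve calls are spread over roughly 0.2 seconds. The async requirement propagates upward through every caller, which is why swapping a synchronous decorator-style limiter for this pattern touches so much code.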

I'm not particularly happy with the changes to the PDB code: we've gone from generators to passing around lists, which is not ideal. However, it passes the unit tests and, as far as I can tell, doesn't lose much performance.
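A minimal sketch of why an async rewrite like this tends to turn generators into lists, assuming the synchronous pipeline code drives the coroutines with `asyncio.run` and `asyncio.gather` (the actual PDB code isn't shown here; `fetch_entry`, `fetch_entries`, `entries` and `iter_entries` are hypothetical names). `gather` only completes once every coroutine has finished, and it returns a list, so the synchronous wrapper hands back a fully built list rather than a lazy generator.

```python
import asyncio
from collections.abc import Iterable, Iterator
from itertools import islice


async def fetch_entry(pdb_id: str) -> dict[str, str]:
    # Hypothetical stand-in for a throttled request for one PDB entry.
    await asyncio.sleep(0)
    return {"pdb_id": pdb_id}


async def fetch_entries(pdb_ids: Iterable[str]) -> list[dict[str, str]]:
    # gather() waits for every coroutine and returns their results as a
    # list, in the same order as the input.
    return await asyncio.gather(*(fetch_entry(i) for i in pdb_ids))


def entries(pdb_ids: Iterable[str]) -> list[dict[str, str]]:
    # Synchronous entry point: asyncio.run() blocks until every request has
    # finished, so the caller gets a complete list, not a generator.
    return asyncio.run(fetch_entries(pdb_ids))


def iter_entries(pdb_ids: Iterable[str], chunk_size: int = 100) -> Iterator[dict[str, str]]:
    # Possible compromise: keep a generator interface and only hold one
    # chunk of results in memory at a time (one event loop per chunk).
    ids = iter(pdb_ids)
    while chunk := list(islice(ids, chunk_size)):
        yield from entries(chunk)
```

`iter_entries` is only one option if memory use becomes a concern: it restores a generator interface for callers, but each chunk still has to finish completely before any of its results are yielded, and a new event loop is started per chunk.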

The other potentially breaking change is in the Poetry pre-commit hook, where I set the language version to 3.11. This worked for me because I had already upgraded my base virtualenv to 3.11, but everyone else will need to check it. I think it should be fine: pre-commit creates a separate virtualenv for the hook, so it should pick up the right version as long as a Python 3.11 interpreter is installed.

The Dockerfile changes are fairly light. I've configured Poetry to behave like the old pip installation and install dependencies into the 'system' Python, so they are available in the container with no extra steps. That should mean everything will Just Work with the new configuration.
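One way to confirm inside the built image that the dependencies really landed in the system interpreter is a small Python check like the one below. It is a hypothetical helper, not part of this PR: it reports the interpreter version, whether `sys.prefix` differs from `sys.base_prefix` (which is the case inside a venv-style virtualenv), and which of the given top-level modules cannot be found.

```python
import importlib.util
import sys


def describe_environment(required: list[str]) -> dict[str, object]:
    """Report the interpreter version, whether it is a virtualenv, and
    which of the `required` top-level modules are not importable."""
    # venv and modern virtualenv set sys.prefix to the environment directory
    # while sys.base_prefix still points at the base installation.
    in_virtualenv = sys.prefix != sys.base_prefix
    # find_spec() returns None for a top-level module that cannot be found.
    missing = [name for name in required if importlib.util.find_spec(name) is None]
    return {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "in_virtualenv": in_virtualenv,
        "missing_modules": missing,
    }
```

Run in the container with the pipeline's real dependency names (for example `describe_environment(["throttler"])`), the intended configuration should report `"python": "3.11"`, `in_virtualenv` as `False`, and an empty `missing_modules` list.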

blakesweeney commented 2 years ago

Basically passes a quick smell test.