SamEdwardes / samedwardes.com

My personal blog at samedwardes.com
https://samedwardes.com
MIT License

Blog post: poetry #20

Closed SamEdwardes closed 1 month ago

SamEdwardes commented 1 year ago

Poetry

!!! warning

DRAFT

Poetry is a tool for Python packaging and dependency management. Check out the poetry docs here: https://python-poetry.org. As a data scientist, you can use poetry to create reproducible Python environments for you and your team. The key features are:

Usage

Installation

To install poetry run the following command:

Terminal

$ curl -sSL https://install.python-poetry.org | python3 -

Follow the instructions from the terminal output to configure poetry. For example, if you are using bash and the installer placed poetry in ~/.local/bin (its usual location on Linux and macOS), you will need to add the following line to your ~/.bashrc file:

export PATH="$HOME/.local/bin:$PATH"

Restart your shell, and verify that poetry is working by checking the version:

Terminal

$ poetry --version

Create a new project

Create a new empty directory for the project.

Terminal

$ mkdir ~/my_app
$ cd ~/my_app 

Use the poetry init command to set up poetry:

Terminal

$ poetry init \
    --no-interaction \
    --name my_app \
    --author "YourName <yourname@gmail.com>" \
    --description "A hello world poetry example"

The poetry init command creates a pyproject.toml file. After running the above command, your project will contain a single file:

.
└── pyproject.toml

The generated pyproject.toml contains:

[tool.poetry]
name = "my_app"
version = "0.1.0"
description = "A hello world poetry example"
authors = ["YourName <yourname@gmail.com>"]

[tool.poetry.dependencies]
python = "^3.9"

[tool.poetry.dev-dependencies]

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"

pyproject.toml is a standard file that poetry uses to store project configuration data. It is not specific to poetry: other tools can also store their settings in pyproject.toml (read PEP 518 and PEP 621 to learn more). The tool.poetry section of the pyproject.toml file is where the poetry-specific metadata is stored (https://python-poetry.org/docs/pyproject/). As you will learn in the upcoming sections, poetry updates pyproject.toml automatically as we add and remove dependencies.
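
Because pyproject.toml is plain TOML, any tool can read the tool.poetry section. The sketch below is only an illustration of that idea, not something poetry requires: it assumes Python 3.11+ (where tomllib is in the standard library) or the third-party tomli package on older versions, and the helper name poetry_settings is made up for this example.

```python
try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # assumption: tomli is installed on older Pythons

# The file generated by `poetry init` above, embedded so the example is self-contained.
PYPROJECT_TEXT = """
[tool.poetry]
name = "my_app"
version = "0.1.0"
description = "A hello world poetry example"
authors = ["YourName <yourname@gmail.com>"]

[tool.poetry.dependencies]
python = "^3.9"

[tool.poetry.dev-dependencies]

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
"""


def poetry_settings(pyproject_text: str) -> dict:
    """Return the [tool.poetry] table, or an empty dict if it is absent."""
    data = tomllib.loads(pyproject_text)
    return data.get("tool", {}).get("poetry", {})


settings = poetry_settings(PYPROJECT_TEXT)
print(settings["name"])
print(settings["dependencies"])
```

In a real project you would read the text from the pyproject.toml file on disk instead of an embedded string.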

Manage dependencies

Poetry comes with a suite of commands that you can use to manage your dependencies without ever touching pyproject.toml by hand. The main commands include:

Add a dependency

To add a dependency you can use the poetry add command. For example we can add requests to our project.

Terminal

$ poetry add requests

Running poetry add <PACKAGE_NAME> will achieve several things:

!!! tip You can think of poetry add <PACKAGE_NAME> as roughly equivalent to pip install <PACKAGE_NAME>. One benefit of poetry add <PACKAGE_NAME> is that the requirement is recorded in our pyproject.toml, whereas pip install does not record the requirement in any configuration file.

We also want to use a code formatter to ensure that our code looks good and that the development team does not waste time arguing about tabs vs spaces. So let's install black!

Terminal

$ poetry add --dev black

The output will look very similar, but note the use of the --dev flag. This tells poetry that this is a "development only" dependency. This means that the app does not need black to work, but we want all of the developers who are working on the app to have black installed so that code is formatted consistently.

We have now installed two packages: requests and black. Let's take a look at how poetry has updated our configuration files. You will see that our project now has two files:

.
├── poetry.lock
└── pyproject.toml

pyproject.toml

pyproject.toml has been updated:

[tool.poetry]
name = "my_app"
version = "0.1.0"
description = "A hello world poetry example"
authors = ["YourName <yourname@gmail.com>"]

[tool.poetry.dependencies]
python = "^3.9"
requests = "^2.27.1"

[tool.poetry.dev-dependencies]
black = "^22.3.0"

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
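
The ^ prefix in entries such as requests = "^2.27.1" is poetry's caret constraint: it allows updates that do not change the left-most non-zero version component. As a rough sketch of that rule (the function name caret_range is hypothetical, and this handles only plain numeric versions, not pre-releases), the constraint can be expanded into an explicit range like this:

```python
def caret_range(constraint: str) -> str:
    """Expand a caret constraint such as "^2.27.1" into an explicit range."""
    if not constraint.startswith("^"):
        raise ValueError(f"not a caret constraint: {constraint!r}")
    parts = [int(p) for p in constraint[1:].split(".")]

    # Position of the left-most non-zero component; if every component
    # is zero, fall back to the last component that was written.
    bump_at = next((i for i, p in enumerate(parts) if p != 0), len(parts) - 1)

    lower = parts + [0] * (3 - len(parts))
    upper = parts[:bump_at] + [parts[bump_at] + 1]
    upper = upper + [0] * (3 - len(upper))

    def fmt(version: list) -> str:
        return ".".join(str(n) for n in version)

    return f">={fmt(lower)},<{fmt(upper)}"


print(caret_range("^2.27.1"))  # the requests constraint above
print(caret_range("^22.3.0"))  # the black constraint above
print(caret_range("^3.9"))     # the python constraint above
```

So ^2.27.1 accepts any 2.x release from 2.27.1 onward, but not 3.0.0.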

poetry.lock

poetry.lock was created after we installed requests. The lock file includes the dependencies we have declared via the CLI, plus all of the dependencies of those packages, each pinned to an exact version. The file can be large, so only the first 12 lines are shown here, with the full file in the preview below.

{! docs/environments/python/python-dependency-management/assets/poetry.lock.example[ln:1-12] !}
...
👀 Full file preview

```toml title="poetry.lock" linenums="1"
{! docs/environments/python/python-dependency-management/assets/poetry.lock.example !}
```
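
poetry.lock is also a TOML file, with one [[package]] entry per locked package. As a hedged illustration, the snippet below lists the pinned versions from a lock file. The excerpt is trimmed and the versions are illustrative only (a real lock file has more fields and more packages), the helper name locked_packages is made up, and tomllib requires Python 3.11+ (tomli is assumed as a fallback).

```python
try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # assumption: tomli is installed on older Pythons

# Trimmed, illustrative excerpt; not the real contents of the example file.
LOCK_TEXT = """
[[package]]
name = "certifi"
version = "2021.10.8"
category = "main"

[[package]]
name = "requests"
version = "2.27.1"
category = "main"

[[package]]
name = "urllib3"
version = "1.26.9"
category = "main"
"""


def locked_packages(lock_text: str) -> dict:
    """Map each locked package name to its pinned version."""
    data = tomllib.loads(lock_text)
    return {pkg["name"]: pkg["version"] for pkg in data.get("package", [])}


for name, version in sorted(locked_packages(LOCK_TEXT).items()):
    print(f"{name}=={version}")
```

Note that certifi and urllib3 appear even though only requests was added by hand: they are dependencies of requests.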

Remove a dependency

After a few weeks of development the team has decided that they do not want to use the black code formatter anymore. Instead, everyone has agreed on autopep8.

First we need to remove black:

Terminal

$ poetry remove --dev black

This command will remove black, and it will also remove all of black's dependencies that we no longer need. Then, let's add autopep8 as a dependency:

Terminal

$ poetry add --dev autopep8

pyproject.toml has been updated. black is no longer a dev dependency and autopep8 is.

[tool.poetry]
name = "my_app"
version = "0.1.0"
description = "A hello world poetry example"
authors = ["YourName <yourname@gmail.com>"]

[tool.poetry.dependencies]
python = "^3.9"
requests = "^2.27.1"

[tool.poetry.dev-dependencies]
autopep8 = "^1.6.0"

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"

Run your code

Under the hood, poetry uses virtual environments to isolate your project's dependencies. Every time we call poetry add or poetry remove we are modifying that virtual environment. To run a command inside the virtual environment we use the command poetry run.

If you invoke python as you normally would, it uses the default Python interpreter for your system.

Terminal

$ which python
/Users/samedwardes/.pyenv/shims/python

In order to use the virtual environment created by poetry you need to prefix your commands with poetry run.

Terminal

$ poetry run which python
/Users/samedwardes/Library/Caches/pypoetry/virtualenvs/my-app-SAojqYOg-py3.9/bin/python

Note the difference above. When we prefix our command with poetry run we run our command inside the virtual environment. When we do not prefix the command with poetry run the virtual environment is not used.
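
The same difference can be observed from inside Python. The following is a small sketch (the file name check_env.py is only a suggestion) that reports which interpreter is running and whether it belongs to a virtual environment. It relies on sys.prefix differing from sys.base_prefix, which holds for environments created by venv and by recent versions of virtualenv.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    return sys.prefix != sys.base_prefix


print("interpreter:", sys.executable)
print("inside a virtual environment:", in_virtualenv())
```

Saved as check_env.py, running it with python check_env.py and then with poetry run python check_env.py should print different interpreter paths.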

!!! tip If you want to avoid prefixing all of your commands with poetry run you can use the poetry shell command. Check out the poetry docs for more details: https://python-poetry.org/docs/cli/#shell.

Collaboration

So far you have created a new project, and used poetry to document your dependencies. Your app is getting a lot of traction and you want to implement some new features. To help with the backlog you will need to on-board a new colleague.

How can you ensure that you and your colleague are using identical environments?

There are two key things your colleague will need:

  1. pyproject.toml
  2. poetry.lock

With these two files anyone will be able to reproduce your environment.

!!! tip Both poetry.lock and pyproject.toml should be checked into version control (e.g. GitHub).
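
Before handing the project over, it can be worth confirming that both files are actually present. This is a minimal sketch with a hypothetical helper name, missing_project_files. It only checks that the files exist and does not check that they are committed or in sync with each other.

```python
from pathlib import Path

# The two files a colleague needs to reproduce the environment.
REQUIRED_FILES = ("pyproject.toml", "poetry.lock")


def missing_project_files(project_dir: str) -> list:
    """Return the required files that are not present in project_dir."""
    root = Path(project_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


missing = missing_project_files(".")
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("Both pyproject.toml and poetry.lock are present.")
```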

When your colleague is ready to start working on the project here is what they will need to do:

  1. Obtain the source code. Assuming you are using git based version control this would be done with git clone.
  2. Install poetry (see the installation section).
  3. Install the project dependencies using poetry install.

That is it 🎉! Your colleague will now be able to run the code using the poetry run command. They can also make changes to the environment with poetry add and poetry remove!

FAQ

How do I publish to RStudio Connect?

See the poetry section in the Publishing to RStudio Connect page.

How do I specify the source of my packages?

By default, poetry is configured to use the PyPI repository (https://pypi.org). However, poetry also supports alternate repositories. Let's add RStudio Package Manager as an alternate. To do this you need to update pyproject.toml by hand:

[[tool.poetry.source]]
name = "rspm"
url = "https://colorado.rstudio.com/rspm/pypi/latest/simple"

Now when you run poetry add or poetry install, poetry will check both RStudio Package Manager and PyPI.

!!! tip

Any custom repository will have precedence over PyPI. If you still want PyPI to be your primary source for your packages you can declare custom repositories as secondary.

``` toml
[[tool.poetry.source]]
name = "rspm"
url = "https://colorado.rstudio.com/rspm/pypi/latest/simple"
secondary = true
```

If you want to disable *PyPI* so that only *RStudio Package Manager* is used you can use the `default` keyword.

``` toml
[[tool.poetry.source]]
name = "rspm"
url = "https://colorado.rstudio.com/rspm/pypi/latest/simple"
default = true
```

How do I update dependencies?

Over time you may want to update your dependencies. For example, one day pandas version 2.0 may be released and you will want to update to the latest and greatest.

To update only pandas, run:

Terminal

$ poetry update pandas

To update all dependencies in your project run:

Terminal

$ poetry update

How do I specify which version of a package I want to use?

You can pin an exact package version in poetry add by using the == operator. For example:

Terminal

$ poetry add urllib3==1.26.0

Read more about constraining package versions here: https://python-poetry.org/docs/cli/#add.

How do I switch the version of Python I want to use?

By default, poetry will create a virtual environment using the Python interpreter that is currently active. You can change which version of Python poetry is using with the poetry env use command.

Here is an example of how I would use Python version 3.10.1:

Terminal

$ poetry env use ~/.pyenv/versions/3.10.1/bin/python

We can validate that this worked by checking the python version:

Terminal

$ poetry run python --version
Python 3.10.1

I can always change my Python version later by running the command again. For example, let's downgrade to Python 3.9.10:

Terminal

$ poetry env use ~/.pyenv/versions/3.9.10/bin/python
$ poetry run python --version
Python 3.9.10

What should I check into version control?

There are two files you should check into version control:

  1. pyproject.toml
  2. poetry.lock

How do I create a requirements.txt file?

Use the poetry export command:

Terminal

$ poetry export --without-hashes --output requirements.txt
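
The exported file pins each package with ==, optionally followed by an environment marker after a semicolon. To give a feel for the format, here is a small sketch that reads those pins back into a dictionary. The sample text is illustrative rather than real poetry output, and the helper name pinned_versions is an assumption made for this example.

```python
# Illustrative sample in the style of an exported requirements.txt.
REQUIREMENTS_TEXT = """\
certifi==2021.10.8; python_version >= "3.9" and python_version < "4.0"
requests==2.27.1; python_version >= "3.9" and python_version < "4.0"
urllib3==1.26.9; python_version >= "3.9" and python_version < "4.0"
"""


def pinned_versions(requirements_text: str) -> dict:
    """Map package names to their pinned versions, ignoring markers and comments."""
    pins = {}
    for line in requirements_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Drop the environment marker that follows the semicolon, if any.
        requirement = line.split(";", 1)[0].strip()
        if "==" not in requirement:
            continue  # skip options such as --index-url and unpinned entries
        name, version = requirement.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


print(pinned_versions(REQUIREMENTS_TEXT))
```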

To be moved...

Poetry

For a full overview of poetry see the poetry section. To get started first initialize a new poetry project.

Project setup

Terminal

$ poetry init \
    --no-interaction \
    --name my_streamlit_app \
    --author "YourName <yourname@gmail.com>" \
    --python ">3.9,<3.10" \
    --description "A hello world streamlit app"

!!! tip Note that we constrained python to ">3.9,<3.10". This is because I know that my RStudio Connect server has a version of Python 3.9 installed.

This will create a file pyproject.toml.

[tool.poetry]
name = "my_streamlit_app"
version = "0.1.0"
description = "A hello world streamlit app"
authors = ["YourName <yourname@gmail.com>"]

[tool.poetry.dependencies]
python = ">3.9,<3.10"

[tool.poetry.dev-dependencies]

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"

Next, let's add streamlit as a dependency.

Terminal

$ poetry add streamlit

Now make sure that your app is working.

Terminal

$ poetry run streamlit run app.py

Publish to Connect

In a poetry project the complete list of dependencies and the versions used are documented in the poetry.lock file. One of the ways RStudio Connect discovers an app's dependencies is by inspecting a project's requirements.txt. RStudio Connect does not know how to inspect pyproject.toml or poetry.lock.

Luckily for us there is a simple workaround. We can use the poetry export command to document all of the dependencies in a requirements.txt format.

Terminal

$ poetry export --without-hashes --output requirements.txt

Now you can publish to RStudio Connect as you normally would. First, let's install rsconnect-python as a development dependency:

Terminal

$ poetry add --dev rsconnect-python

!!! tip You do not need to rerun poetry export --without-hashes --output requirements.txt after adding rsconnect-python. This is because we added it as a development dependency. poetry export only includes non-development dependencies unless we add the --dev flag.

Then, we can publish our content!

Terminal

$ poetry run rsconnect deploy streamlit \
  --python $(poetry run which python) \
  --exclude "poetry.lock" \
  --exclude "pyproject.toml" \
  --entrypoint app \
  --new \
  .

SamEdwardes commented 1 month ago

Duplicate of #21