Paddle is a fresh, extensible, and IDE-friendly build system for Python. It provides a declarative way of managing project dependencies, configuring the execution environment, running tasks, and much more.
venv/pytest/pylint/twine): you already know how to use Paddle!
venv), Paddle downloads and installs Python packages to the internal cache repository, and then creates symbolic links from these files to your local project environments. This allows Paddle to save a significant amount of hard drive space, especially in the case of a multi-project build with several environments targeting the same Python package with different versions.
To run Paddle, you need:
To be able to load and install various versions of Python interpreters, please follow the instructions for your platform given in the pyenv wiki (https://github.com/pyenv/pyenv/wiki#suggested-build-environment).
Experimental: the Paddle CLI is compiled as a native image using GraalVM and is available for Linux and macOS. You can still use the plain paddle-$version-all.jar build with Java 8 (or higher).
The preferred way to install Paddle is to download the PyCharm plugin from the JetBrains Marketplace.
The plugin already contains a bootstrapped Paddle build system (so you don't have to install anything else manually) and supports a number of features:
requirements.txt to Paddle YAML configurations;
If you want to use the native binary image of the CLI tool, you can download it with the following command:
curl -s 'https://raw.githubusercontent.com/JetBrains-Research/paddle/master/scripts/install.sh' -o ./install.sh && chmod +x install.sh && ./install.sh && rm ./install.sh
The Paddle CLI wrapper will automatically detect your system and download the necessary binary.
Since native binaries are currently not available for all OS types and platforms, you can also download the JVM version of the tool directly:
curl -s 'https://raw.githubusercontent.com/JetBrains-Research/paddle/master/scripts/install.sh' -o ./install.sh && chmod +x install.sh && ./install.sh jar && rm ./install.sh
Note: it requires a JRE to run.
You can verify your installation by running:
./paddle --help
Note: Paddle CLI generally assumes that it is called from the root directory of the current Paddle project.
For a quick start, you can simply create a new project in the PyCharm IDE and
choose File - New - Paddle YAML
from the top menu.
This will generate a template paddle.yaml
build configuration file in the root directory of your
project.
Then, press the Load Paddle project
button on the pop-up in the bottom-right corner of your screen
and wait until Paddle finishes building the project's model and configuring the execution environment.
You can check the build status on the Build
tool window tab.
That's it, you are now ready to go!
If you are using the CLI, create a new paddle.yaml file in the root directory of your project and paste the following script:
project: example
metadata:
  version: 0.1.0
plugins:
  enabled:
    - python
# Prerequisites: https://github.com/pyenv/pyenv/wiki#suggested-build-environment
environment:
  path: .venv
  python: 3.9
requirements:
  dev:
    - name: pytest
      version: ==7.1.2
    - name: pylint
      version: ==2.14.4
    - name: mypy
      version: ==0.961
    - name: twine
      version: ==4.0.1
    - name: wheel
      version: ==0.37.1
Then, you can run the following command:
paddle install
It will prepare your environment, find or download the Python interpreter, and install the specified dev requirements.
paddle.yaml (the name matters), which must be stored in the project's root directory. A project can have subprojects that are declared in the paddle.yaml file and can later be referenced as its own local dependencies.
paddle.yaml file) in the root directory of your working environment.
clean or install). Tasks can also have dependencies that ensure some other tasks are completed before the current task runs (e.g., resolveRepositories <- resolveRequirements <- install <- lock).
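To illustrate how such a chain is ordered, here is a minimal sketch that models task dependencies as a plain dictionary and computes an execution order with a depth-first traversal, so that every task runs after the tasks it depends on. The dictionary and the execution_order function are hypothetical illustrations, not Paddle's internal data structures.

```python
from typing import Dict, List, Set

# Hypothetical model of the documented chain
# resolveRepositories <- resolveRequirements <- install <- lock:
# each task maps to the tasks that must complete before it runs.
TASK_DEPENDENCIES: Dict[str, List[str]] = {
    "resolveRepositories": [],
    "resolveRequirements": ["resolveRepositories"],
    "install": ["resolveRequirements"],
    "lock": ["install"],
}


def execution_order(target: str, deps: Dict[str, List[str]]) -> List[str]:
    """Return the tasks needed for `target`, dependencies first (depth-first search)."""
    order: List[str] = []
    done: Set[str] = set()
    in_progress: Set[str] = set()

    def visit(task: str) -> None:
        if task in done:
            return
        if task in in_progress:
            raise ValueError(f"cyclic dependency involving task {task!r}")
        in_progress.add(task)
        for dependency in deps.get(task, []):
            visit(dependency)
        in_progress.remove(task)
        done.add(task)
        order.append(task)

    visit(target)
    return order


print(execution_order("lock", TASK_DEPENDENCIES))
```

Running it prints ['resolveRepositories', 'resolveRequirements', 'install', 'lock']: lock only runs once the whole chain before it has completed.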
-P flag, e.g., -PextraArgs="arg1 arg2". Note: additional arguments are not part of the task's input, so changing them will not force the task to run again.
plugins section of the build paddle.yaml file.
python plugin out-of-the-box.
.jars. The documentation about the development of custom plugins is coming soon.
Build configuration of the Paddle project is specified in the paddle.yaml
file. This file is
semantically split into sections, where some of them are built-in, and some of them are added by the external or
bundled plugins.
If you are using the PyCharm plugin, it will help you with the schema of the paddle.yaml
automatically. Use the Ctrl + Shift + Space
shortcut (by default) to look through the completion
variants when writing the YAML configuration.
All these sections are available in every Paddle project.
project
is a unique name of the given Paddle project. If you are also using
the python plugin to build Python wheels, this name will be used as the package name.
Note: in Python, packages should be named using underscore_case, while names of Paddle projects can generally use any case.
However, if you are planning to build your own Python packages (.whl-distributions), make sure you are using underscores when naming packages under the source root of the Paddle project.
project: example
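Because a directory name such as ml-common cannot appear in an import statement, Python's own identifier rules are a quick way to check the naming advice above. The sketch below is a hypothetical helper (not part of Paddle) that reports package directories under a source root whose names are not importable.

```python
import keyword
from pathlib import Path
from typing import List


def is_importable_name(name: str) -> bool:
    """True if `name` can be used as a top-level package name in an import statement."""
    return name.isidentifier() and not keyword.iskeyword(name)


def non_importable_packages(source_root: Path) -> List[str]:
    """Package directories (with __init__.py) under `source_root` that cannot be imported."""
    if not source_root.is_dir():
        return []
    return sorted(
        child.name
        for child in source_root.iterdir()
        if child.is_dir()
        and (child / "__init__.py").is_file()
        and not is_importable_name(child.name)
    )


print(is_importable_name("ml_common"))  # True
print(is_importable_name("ml-common"))  # False: "-" is not allowed in identifiers
```

The Paddle project itself may still be called ml-common; only the package directory under its source root needs an importable name such as common or ml_common.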
subprojects
is a list of names of the subprojects for the
current project. There are no restrictions on where these subprojects are placed relative to each other, but they all
have to be stored somewhere under the root directory of the root Paddle project.
subprojects:
- subproject-one
- subproject-two
- some-other-subproject
main-project/
├──subproject-one/
│ │ ...
│ └──paddle.yaml
│
├──subproject-two/
│ ├──some-other-subproject/
│ │ │ ...
│ │ └──paddle.yaml
│ │ ...
│ └──paddle.yaml
│
└──paddle.yaml
roots
is a key-value map of the "root"-folders of the project.
roots:
  sources: src/main
  tests: src/test
  resources: src/resources
  testsResources: test/resources
  dist: build
sources: the path to the directory with all the source files (src/ by default). If you have several Python packages within a single Paddle project, please store all of them under this folder. Generally speaking, this is not encouraged: the preferred way is "one Python package == one Paddle project".
tests: the path to the directory with tests (tests/ by default).
resources: the path to the directory with the project's resources (src/resources/ by default).
testsResources: the path to the directory with the project's test resources (tests/resources/ by default).
dist: the path to the directory where the distribution files (e.g., .whl) are built and stored (dist/ by default).
plugins is a list of plugins available in the current Paddle project. Use the enabled subsection to specify bundled/built-in plugins, or jars to include paths to your own custom plugins.
plugins:
  enabled:
    - python
  jars:
    - plugins/test-plugin-0.1.0.jar
The following sections are added by the python
plugin, so make sure you have enabled it
in your project.
metadata is a key-value map containing the Python package metadata.
Paddle will use it when building a wheel distribution.
metadata:
  version: 0.1.0
  description: Short description of the project.
  author: Your Name
  authorEmail: your.email@example.com
  url: your.homepage.com
  keywords: "key word example"
  classifiers:
    - "Programming Language :: Python :: 3"
    - "Topic :: Scientific/Engineering :: Artificial Intelligence"
    - "Intended Audience :: Developers"
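The text right after this example states that version and author are required for the build, are inherited from the parent project when missing, and otherwise cause an error. Those rules can be sketched as follows. The function name, the MetadataError class, and the plain-dict representation of metadata are hypothetical, and Paddle's actual inference logic may differ.

```python
from typing import Dict, Optional

# Fields the documentation lists as required for the build.
REQUIRED_FIELDS = ("version", "author")


class MetadataError(Exception):
    """Raised when a required metadata field cannot be determined."""


def resolve_metadata(
    own: Dict[str, str], parent: Optional[Dict[str, str]] = None
) -> Dict[str, str]:
    """Fill in missing required fields from the parent project's metadata, or fail."""
    resolved = dict(own)
    for field in REQUIRED_FIELDS:
        if field in resolved:
            continue  # explicitly specified in this project's paddle.yaml
        if parent is not None and field in parent:
            resolved[field] = parent[field]  # inherited from the parent project
        else:
            raise MetadataError(f"metadata field {field!r} is required for the build")
    return resolved


print(resolve_metadata({"version": "0.1.0"}, {"version": "1.0.0", "author": "Your Name"}))
```

Here the project keeps its own version, inherits only the missing author, and the call prints {'version': '0.1.0', 'author': 'Your Name'}.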
long-description will be parsed from the README (or README.md) file in the root directory of the project.
build task, the fields version and author are required. If they are not specified, they will be inferred from the parent project (if it exists); if the inference fails, the build will fail with an error.
environment is a key-value specification of the Python virtual environment to be used in the Paddle project.
environment:
  path: .venv # the value is the same by default
  python: 3.9
path: a relative path to the directory where the virtual environment will be created.
pip to install new packages, venv to create/manage virtual environments, and pip-autoremove to remove packages with their dependencies.
python: the version of the Python interpreter to be used.
~/.paddle/interpreters folder.
noIndex (optional): if True, Paddle ignores the PyPI index and resolves packages only from the URLs listed in the findLinks section. The flag is set to False by default.
repositories is a list of the available PyPI repositories.
repositories:
  - name: pypi
    url: https://pypi.org
    uploadUrl: https://upload.pypi.org/legacy/
    default: True
    secondary: False
Note: a standard PyPI repository (shown in the example above) is included in the list of repositories for every Paddle project by default, so you don't need to add it manually every time.
name: a unique name of the PyPI repository used in Paddle. It is used to reference the particular repository in the build system, e.g., in the authentication file paddle.auth.yaml (see below).
url: a URL of the PyPI repository.
uploadUrl (optional): a URL of the PyPI repository to be used by twine for publishing packages with the publish Paddle task.
default (optional): if True, this disables the default PyPI repository and makes this particular private repository the default fallback source when looking up a package. The flag is set to False by default.
secondary (optional): by default, any custom repository from the repositories section takes precedence over PyPI. If you still want PyPI to be the primary source for your packages, you can set this flag to True for your custom repositories (False by default).
Note: the repository list is configured for the current Paddle project only. If you have a multi-project Paddle build with nested projects, you should either specify the repositories in each paddle.yaml file, or use a topmost all section to wrap the repositories section:
all:
  repositories:
    ...
This way, the list of repositories will be available in every subproject of the current Paddle project.
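The default and secondary flags described above combine into a lookup order. The sketch below is one interpretation of those rules, not Paddle's code. It assumes that custom primary repositories come first, then PyPI (replaced by the repository marked default, if any), and finally the secondary repositories. The exact position of a default repository relative to secondary ones is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Repository:
    name: str
    url: str
    default: bool = False
    secondary: bool = False


# The standard PyPI repository that every Paddle project gets implicitly.
PYPI = Repository(name="pypi", url="https://pypi.org")


def lookup_order(custom: List[Repository]) -> List[str]:
    """Return repository names in the order they would be queried (assumed model)."""
    defaults = [r for r in custom if r.default]
    primary = [r for r in custom if not r.default and not r.secondary]
    secondary = [r for r in custom if not r.default and r.secondary]
    # Assumption: a repository marked default disables PyPI and takes its place.
    base = defaults[:1] if defaults else [PYPI]
    return [r.name for r in primary + base + secondary]


private = Repository(name="private-repo", url="https://private.pypi.repo.org/simple")
print(lookup_order([private]))  # ['private-repo', 'pypi']
```

Marking private-repo as secondary=True would instead give ['pypi', 'private-repo'], which matches the documented way of keeping PyPI as the primary source.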
Paddle provides several ways to configure authentication for your PyPI repository:
The preferred way is to create a paddle.auth.yaml
file and place it in the root directory
of your Paddle project. Please note that if you have a multi-project build, you need
to create only a single instance of this file and place it in the topmost root project
directory!
If you are using the PyCharm plugin, you can create such a file by choosing File - New - Paddle Auth YAML.
The schema of the paddle.auth.yaml
is the following:
repositories:
  - name: private-repo-name
    type: netrc | keyring | profile | none
    username: your-username
repositories: a list of PyPI repository references with their authentication methods.
name: the name of the PyPI repository as specified in the paddle.yaml configuration.
type: the type of the authentication provider to be used. It can be one of four values:
netrc: use credentials from your local .netrc file.
keyring: use credentials from the available keyring backend.
profile: use credentials from the profiles.yaml file. The idea of Paddle profiles is similar (in a certain sense) to the idea of AWS CLI profiles: you can have a single file on your local machine where you specify credentials for your different profiles, and then simply reference them in the build files. This file should be stored in the root of the ~/.paddle/ directory (also referenced as $PADDLE_HOME). The expected YAML file structure is the following:
profiles:
  - name: <your-username-1>
    token: <your-private-token-1>
  - name: <your-username-2>
    token: <your-private-token-2>
none: do not use authentication for this repository at all.
username: a username to look for in the chosen authentication provider (required only for netrc, keyring, and profile).
Note: if there are several authentication providers specified for a single repository, Paddle will use the first available one from the list.
Sometimes, you need to specify the credentials for your private PyPI repository in a more
explicit way, e.g., when the build is running in CI. For such purposes, Paddle also provides a
good old way for authentication by using environment variables. To specify the variable
names containing username and token (e.g., password) for the particular PyPI repo, you can add
the following authEnv
property directly to the repository configuration in the repositories
section of the paddle.yaml
file:
repositories:
  - name: private-repo
    url: https://private.pypi.repo.org/simple
    authEnv:
      username: CLIENT_ID
      password: CLIENT_SECRET
Note: if there are any available authentication providers specified for this repository
in the paddle.auth.yaml
file as well, the first of them will have precedence over this
authEnv
provider. In other words, Paddle will just add this provider to the end
of the authentication providers list.
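The resolution order described above works like this: providers from paddle.auth.yaml are tried in the listed order, the authEnv provider is appended at the end, and the first available one wins. It can be sketched as a chain of callables. The provider functions are hypothetical stand-ins. Only the environment-variable provider does real work, and the profile provider is modeled with an in-memory dictionary instead of reading $PADDLE_HOME/profiles.yaml.

```python
import os
from typing import Callable, Dict, List, Optional, Tuple

Credentials = Tuple[str, str]  # (username, password or token)
Provider = Callable[[], Optional[Credentials]]


def env_provider(username_var: str, password_var: str) -> Provider:
    """authEnv-like provider: read credentials from the named environment variables."""
    def provide() -> Optional[Credentials]:
        username = os.environ.get(username_var)
        password = os.environ.get(password_var)
        if username and password:
            return username, password
        return None  # unavailable: let the next provider try
    return provide


def profile_provider(profiles: Dict[str, str], username: str) -> Provider:
    """Profile-like provider; `profiles` is an in-memory stand-in for profiles.yaml."""
    def provide() -> Optional[Credentials]:
        token = profiles.get(username)
        return (username, token) if token else None
    return provide


def resolve_credentials(providers: List[Provider]) -> Optional[Credentials]:
    """Return credentials from the first provider that can supply them."""
    for provider in providers:
        credentials = provider()
        if credentials is not None:
            return credentials
    return None
```

For the private-repo example above, the chain would be [profile_provider(...), env_provider("CLIENT_ID", "CLIENT_SECRET")]. The environment variables are only consulted when no earlier provider can supply credentials.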
requirements
is a list of the Paddle project requirements (e.g., external dependencies). The
list should be split into two sections: main
for the general project requirements to be
included in the requirements list of the Python packages later, and dev
for development
requirements (such as test frameworks, linters, type checkers, etc.)
requirements:
  main:
    - version: ==4.1.2
      name: redis
    - name: numpy
      version: <=1.22.4
    - name: pandas
    - name: lxml
      noBinary: true
  dev:
    - name: pytest
    - name: twine
      version: 4.0.1
Each requirement must have a name to look for in the PyPI repository, and can optionally specify the version and noBinary properties. If the version is not specified, Paddle will try to resolve it by itself when running the resolveRequirements task.
The version identifier can be specified as a number with some relation (e.g., by using the prefixes <=, >=, <, >, ==, !=, ~=, ===), or just as a plain version number (the same as with the == prefix).
noBinary specifies how to choose a package's distribution format. If the option is not set or is set to false, Paddle prefers a binary wheel; otherwise, Paddle uses the source distribution.
Note: for now, only this format of requirement specification is available. Specifying requirements by URL/URI will be added in an upcoming Paddle release, stay tuned!
Tip: if you are using the PyCharm plugin and migrating from the old requirements.txt
file, try
to copy-paste the file's contents into the paddle.yaml
file as is, and Paddle will
convert it to its own format.
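The conversion mentioned in the tip can be approximated for simple files. The sketch below handles only lines of the form name, optionally followed by an operator and a version. It skips comments and blank lines and rejects everything else: extras, environment markers, URLs, and pip options such as -r are deliberately unsupported. The parse_requirements_txt name is hypothetical, and its output simply mirrors the name/version entries shown above.

```python
import re
from typing import Dict, List

# A package name, then an optional operator and version, e.g. "numpy<=1.22.4" or "redis == 4.1.2".
LINE_RE = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)\s*((?:===|==|!=|~=|<=|>=|<|>)\s*\S+)?$")


def parse_requirements_txt(text: str) -> List[Dict[str, str]]:
    """Convert simple requirements.txt content into name/version entries."""
    entries: List[Dict[str, str]] = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        match = LINE_RE.match(line)
        if match is None:
            raise ValueError(f"unsupported requirement line: {raw!r}")
        entry = {"name": match.group(1)}
        if match.group(2):
            entry["version"] = match.group(2).replace(" ", "")
        entries.append(entry)
    return entries
```

For example, the content "redis==4.1.2\nnumpy <= 1.22.4\npandas" becomes three entries matching the main requirements in the example above.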
findLinks is a list of URLs or paths to external non-indexed packages (e.g., a locally built package). This is similar to pip's --find-links option.
If a local path or a URL starting with file:// points to a directory, pip will look for archives in that directory.
If a path or URL points to an HTML file, pip will look for links to archives such as sdists (.tar.gz) or wheels (.whl).
findLinks:
- /home/example/sample-wheel/dist
- https://example.com/python_packages.html
- file:///usr/share/packages/sample-wheel.whl
NB: VCS links (e.g. git://
) are not supported.
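To show how noIndex, findLinks, and noBinary relate to pip, the sketch below assembles a pip install command line from those settings. The flags --no-index, --find-links, and --no-binary are real pip options, and pip accepts --no-binary multiple times. The function itself is hypothetical and is not necessarily how Paddle invokes pip.

```python
from typing import List, Sequence


def pip_install_args(
    specs: Sequence[str],
    no_binary: Sequence[str] = (),
    find_links: Sequence[str] = (),
    no_index: bool = False,
) -> List[str]:
    """Assemble a `pip install` command line from Paddle-like settings (illustrative)."""
    args = ["python", "-m", "pip", "install"]
    if no_index:
        args.append("--no-index")  # noIndex: ignore PyPI, use find-links only
    for link in find_links:
        args += ["--find-links", link]  # findLinks: extra locations to search for archives
    for name in no_binary:
        args += ["--no-binary", name]  # noBinary: use the source distribution
    args += list(specs)  # requirement specifiers, e.g. "numpy<=1.22.4"
    return args


print(pip_install_args(["lxml", "numpy<=1.22.4"], no_binary=["lxml"]))
```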
The tasks
section consists of several subsections that provide run configurations for
different Python executors.
tasks:
  run: ...
  test: ...
  publish: ...
run
: a section to add entrypoints for running any Python
scripts and (or) modules.
run:
  - id: main
    entrypoint: main.py
  - id: main_as_module
    entrypoint: main
    args: arg1 arg2
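The entrypoint and args fields described right below can be translated into a Python command line roughly as follows. This is a sketch under several assumptions: the run_command name, the use of shlex to split args, and the choice of src (the documented default sources root) as the script path prefix are all illustrative.

```python
import shlex
from pathlib import PurePosixPath
from typing import List


def run_command(entrypoint: str, args: str = "", sources_root: str = "src") -> List[str]:
    """Translate a run task's entrypoint/args into a command line (illustrative)."""
    extra = shlex.split(args)  # "arg1 arg2" -> ["arg1", "arg2"]
    if entrypoint.endswith(".py"):
        # A script: its path is relative to the sources root.
        return ["python", str(PurePosixPath(sources_root) / entrypoint), *extra]
    # No .py extension: run it as a module, like `python -m <entrypoint>`.
    return ["python", "-m", entrypoint, *extra]


print(run_command("main.py"))  # ['python', 'src/main.py']
print(run_command("main", "arg1 arg2"))  # ['python', '-m', 'main', 'arg1', 'arg2']
```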
id: a unique identifier of the task, so that the entrypoint can be referenced as run$<id>.
entrypoint: a relative path (from the sources root) to the particular Python script to be executed. If the .py extension of the Python script is not specified, the entrypoint is treated as a module and called like python -m <entrypoint> when running the task.
args: extra arguments that will be provided on startup, e.g., python <entrypoint> arg1 arg2.
test: a section to add configurations for the test frameworks. For now, only pytest is supported.
test:
  pytest:
    - id: example_tests
      targets:
        - bar/test_app.py::TestFoo::test_that
        - test_example.py
      keywords: "not this"
      parameters: ""
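The id, targets, keywords, and parameters fields described below correspond closely to pytest's command line. Targets become positional arguments (falling back to the tests root), keywords goes to pytest's -k option, and parameters are split into extra flags. The sketch assumes this straightforward mapping and uses a hypothetical function name. Paddle may assemble the command differently.

```python
import shlex
from typing import List, Sequence


def pytest_command(
    targets: Sequence[str] = (),
    keywords: str = "",
    parameters: str = "",
    tests_root: str = "tests",
) -> List[str]:
    """Build a pytest invocation from a Paddle-like test task configuration."""
    cmd = ["python", "-m", "pytest"]
    cmd += list(targets) if targets else [tests_root]  # default: the whole tests root
    if keywords:
        cmd += ["-k", keywords]  # keyword expression, e.g. "not this"
    cmd += shlex.split(parameters)  # any other flags, passed through as is
    return cmd


print(pytest_command(["bar/test_app.py::TestFoo::test_that", "test_example.py"], keywords="not this"))
```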
id: a unique identifier of the test task, so that it can be referenced as pytest$<id>.
targets: a list of pytest targets to be executed when running the task (Python module, directory, or node id). If targets are not provided, Paddle runs all the tests from the tests root.
keywords (optional): a string with keyword expressions used by the framework to select tests.
parameters (optional): a string with all the other options/parameters/flags to pass to the pytest CLI command.
publish: a section to add configuration for the Twine utility to publish Python packages.
publish:
  repo: pypi
  twine:
    skipExisting: True
    verbose: True
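To illustrate what this publish configuration amounts to, the sketch below builds a twine upload command. The repository's uploadUrl goes to --repository-url, the boolean flags become --skip-existing and --verbose, and the distribution files are collected by expanding glob patterns such as dist/*. The twine options are real, but both functions are hypothetical helpers, not Paddle's implementation.

```python
from pathlib import Path
from typing import List, Sequence


def collect_distributions(project_dir: Path, targets: Sequence[str] = ("dist/*",)) -> List[str]:
    """Expand glob patterns (relative to `project_dir`) into a sorted list of files."""
    return sorted({str(p) for pattern in targets for p in project_dir.glob(pattern) if p.is_file()})


def twine_upload_command(
    upload_url: str,
    files: Sequence[str],
    skip_existing: bool = False,
    verbose: bool = False,
) -> List[str]:
    """Build a `twine upload` command line for the given distribution files."""
    if not files:
        raise ValueError("nothing to publish: no distribution files found")
    cmd = ["twine", "upload", "--repository-url", upload_url]
    if skip_existing:
        cmd.append("--skip-existing")  # continue if a file already exists in the index
    if verbose:
        cmd.append("--verbose")
    return cmd + list(files)
```

A typical call would be twine_upload_command("https://upload.pypi.org/legacy/", collect_distributions(Path(".")), skip_existing=True, verbose=True). It raises ValueError if the wheel task has not produced anything in dist/ yet.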
repo: the name of the PyPI repository to be used for publishing packages (Paddle will use its uploadUrl endpoint).
twine: a key-value map containing configuration for Twine: skipExisting and verbose are boolean flags (see the twine upload docs for details).
targets: a list of file paths to be published relative to the dist root. Its default value is dist/*.
There are several optional Paddle-wide options in the python section of $PADDLE_HOME/registry.yaml:
noCacheDir (optional): if true, appends pip's --no-cache-dir option. Set to false by default.
autoRemove (optional): replaces a locally cached wheel with a verified wheel of the same version from PyPI.
These options can be edited in Paddle's IDEA settings (Tools -> Paddle).
To be added soon.
Here is a reference for all the built-in Paddle tasks available at the moment.
clean: cleans up the ignored directories of the Paddle project. By default, only the local .paddle project folder (containing incremental caches) is included, but the Python plugin also adds some other targets if enabled (e.g., .venv, .pylint_cache, etc.).
cleanAll: the same as clean, but it also calls the cleanAll task for ALL the subprojects of the given Paddle project.
resolveInterpreter: finds or downloads a suitable Python interpreter.
resolveRepositories: runs indexing (or retrieves cached indexes) of the specified PyPI repositories (this is needed for package auto-completion in PyCharm).
resolveRequirements: runs pip's resolver to resolve the given set of requirements.
venv: creates a local virtual environment in the Paddle project.
install: installs the resolved set of requirements.
lock: creates a paddle-lock.json lockfile in the root directory of the Paddle project.
ci: installs the snapshot versions of the packages specified in the paddle-lock.json lockfile.
wheel: builds a Python wheel from the sources of the Paddle project and saves it in the dist root.
setup.cfg and pyproject.toml files for the Paddle project if they do not exist yet. You can always tweak them manually and re-run the task if needed.
find_packages(), and then builds a single .whl-distribution using the name of the project. However, to import these packages afterwards in Python code, the top-level Python package names should be used (i.e., the names of the corresponding directories under the source root). See the next section for more details.
python -m build CLI command.
twine: publishes a wheel distribution to the specified PyPI repository.
tasks.publish subsection.
run$<id>: runs a Python script or module.
tasks.run subsection.
-PextraArgs=<args> option. For example: paddle run$pep8 -PextraArgs="--first outparse.py"
pytest$<id>: runs all the test targets by using the Pytest framework.
tasks.test subsection.
mypy: runs the Mypy type checker on the sources of the Paddle project.
pylint: runs the Pylint linter on the sources of the Paddle project.
requirements: generates requirements.txt in the root directory of every project.
requirements.txt does not represent the actual structure of the Paddle sources: it only lists the dependencies of a project.
Let's consider the following example of a Paddle multi-project build: the parent project in the monorepo does not contain any source code and just serves as a container for the subprojects (let's say, different ML models). Also, the models share some common code (e.g., utils). The directory structure could then be the following:
main-project/
│
├──ml-model-bert/
│ ├──.paddle/
│ ├──.venv/
│ ├──src/
│ │ └──bert/
│ │ ├──__init__.py
│ │ ├──main.py
│ │ └──...
│ └──paddle.yaml
│
├──ml-model-gpt/
│ ├──.paddle/
│ ├──.venv/
│ ├──src/
│ │ └──gpt/
│ │ ├──__init__.py
│ │ ├──main.py
│ │ └──...
│ └──paddle.yaml
│
├──ml-common/
│ ├──.paddle/
│ ├──.venv/
│ ├──src/
│ │ └──common/
│ │ ├──__init__.py
│ │ ├──main.py
│ │ └──...
│ └──paddle.yaml
│
├──paddle.auth.yaml
└──paddle.yaml
# main-project/paddle.yaml
project: main-project
subprojects:
  - ml-model-bert
  - ml-model-gpt
  - ml-common

# main-project/ml-model-bert/paddle.yaml
project: ml-model-bert
subprojects:
  - ml-common
plugins:
  enabled:
    - python
environment:
  path: .venv
  python: 3.9
# ...

# main-project/ml-common/paddle.yaml
project: ml-common
plugins:
  enabled:
    - python
environment:
  path: .venv
  python: 3.9
# ...
It is generally encouraged to place Python packages (with __init__.py files) under the source root of the corresponding Paddle project. Then, if this Paddle project is listed as a dependency in the subprojects section of another Paddle project, you will be able to import the Python package by just specifying its name relative to the source root:
# main-project/ml-model-gpt/src/gpt/main.py
import common.main  # "common" is the package under the source root of ml-common
Paddle YAML item in the drop-down menu list, or none of the notifications (such as Load Paddle project) appear, please make sure you have installed the Paddle plugin in your PyCharm IDE (which should be 2022.1+, starting from build number 221.5080). If everything is correct, try restarting your IDE.
.idea folder from your project and rebuilding it from scratch.
~/.paddle/packages/ folder. The cache might be corrupted when a task execution is cancelled, so make sure that you have cleaned up the environment and caches before starting a clean Paddle run again.
.paddle-folders) by running the cleanAll task from the root project.
If the problem still exists, don't hesitate to open an issue or contact us directly.
If you have found a bug or have a feature suggestion, please don't hesitate to open an issue on GitHub or contact the developers personally: