JetBrains-Research / paddle

Young and dynamic build system for Python
MIT License

Paddle


Paddle is a fresh, extensible, and IDE-friendly build system for Python. It provides a declarative way for managing project dependencies, configuring execution environment, running tasks, and much more.


Why should I use Paddle?

Getting started

Prerequisites

To run Paddle, you need:

To be able to load and install various versions of Python interpreters, please follow the pyenv suggested build environment instructions for your platform: https://github.com/pyenv/pyenv/wiki#suggested-build-environment

Experimental: the Paddle CLI is compiled into a native image using GraalVM and is available for Linux and macOS. You can still use the plain paddle-$version-all.jar build with Java 8 (or higher).

Plugin Installation

The preferred way to install Paddle is to download the PyCharm plugin from the JetBrains Marketplace.


The plugin already contains a bootstrapped Paddle build system (so you don't have to install anything else manually) and supports a number of features:

CLI

If you want to use the native binary image of the CLI tool, you can download it with the following simple commands:

curl -s 'https://raw.githubusercontent.com/JetBrains-Research/paddle/master/scripts/install.sh' -o ./install.sh && chmod +x install.sh && ./install.sh && rm ./install.sh

The Paddle CLI wrapper automatically detects your system and downloads the necessary binary.

Since native binaries are not yet available for all operating systems and platforms, you can also download the JVM version of the tool directly:

curl -s 'https://raw.githubusercontent.com/JetBrains-Research/paddle/master/scripts/install.sh' -o ./install.sh && chmod +x install.sh && ./install.sh jar && rm ./install.sh

Note: the JVM version requires a JRE to run.

You can verify your installation by running:

./paddle --help

Note: Paddle CLI generally assumes that it is called from the root directory of the current Paddle project.

Quick start

For a quick start, you can simply create a new project in the PyCharm IDE and choose File - New - Paddle YAML from the top menu. This will generate a template paddle.yaml build configuration file in the root directory of your project. Then, press the Load Paddle project button on the pop-up in the bottom-right corner of your screen and wait until Paddle finishes building the project's model and configuring the execution environment. You can check the build status on the Build tool window tab. That's it, you are now ready to go!

If you are using the CLI, create a new paddle.yaml file in the root directory of your project and paste the following configuration:

project: example

metadata:
  version: 0.1.0

plugins:
  enabled:
    - python

# Prerequisites: https://github.com/pyenv/pyenv/wiki#suggested-build-environment
environment:
  path: .venv
  python: 3.9

requirements:
  dev:
    - name: pytest
      version: ==7.1.2
    - name: pylint
      version: ==2.14.4
    - name: mypy
      version: ==0.961
    - name: twine
      version: ==4.0.1
    - name: wheel
      version: ==0.37.1

Then, you can run the following command:

paddle install

It will prepare your environment, find or download the Python interpreter, and install the specified dev requirements.

Key concepts

YAML Configuration

The build configuration of a Paddle project is specified in the paddle.yaml file. This file is semantically split into sections: some of them are built-in, and others are added by external or bundled plugins.

If you are using the PyCharm plugin, it automatically helps you with the paddle.yaml schema. Use the Ctrl + Shift + Space shortcut (by default) to browse the completion variants when writing the YAML configuration.

Core sections

All these sections are available in every Paddle project.

Project

project is the unique name of the Paddle project. If you are also using the Python plugin to build Python wheels, this name will be used as the package name.

Note: in Python, packages should be named using underscore_case, while Paddle project names can generally use any case. However, if you are planning to build your own Python packages (.whl distributions), make sure you use underscores when naming the packages under the source root of the Paddle project.

project: example
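The naming rule above can be checked mechanically. The sketch below uses hypothetical helpers (they are not part of Paddle) that suggest an underscore_case package name for a Paddle project name and verify that the result is a valid Python identifier; lowercasing the name is an extra assumption that follows PEP 8 conventions.

```python
import keyword
import re


def to_package_name(project_name: str) -> str:
    """Suggest an underscore_case package name (hypothetical helper, not a Paddle API)."""
    # Replace every run of characters that are not allowed in identifiers with "_".
    candidate = re.sub(r"[^0-9A-Za-z_]+", "_", project_name).strip("_").lower()
    # Identifiers cannot start with a digit, so prefix such names with "_".
    if candidate[:1].isdigit():
        candidate = "_" + candidate
    return candidate


def is_valid_package_name(name: str) -> bool:
    """A package directory must be named with a non-keyword Python identifier."""
    return name.isidentifier() and not keyword.iskeyword(name)


for project in ["example", "ml-model-bert", "My.Cool Project"]:
    package = to_package_name(project)
    print(f"{project!r} -> {package!r}, valid: {is_valid_package_name(package)}")
```

A project called ml-model-bert, for example, would keep its code in a package directory named ml_model_bert under the source root.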

Subprojects

subprojects is a list of names of the subprojects of the current project. There are no restrictions on where these subprojects are placed relative to each other, but they all have to be stored somewhere under the root directory of the root Paddle project.

subprojects:
  - subproject-one
  - subproject-two
  - some-other-subproject
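Because subprojects are referenced by name rather than by path, they presumably have to be located somewhere below the root directory. The sketch below illustrates that idea under stated assumptions: it walks the root directory, reads the top-level `project:` key of every paddle.yaml with a naive line scan (no YAML library), and maps project names to directories. It is not Paddle's actual lookup algorithm, and `find_projects` and `read_project_name` are hypothetical names.

```python
import tempfile
from pathlib import Path
from typing import Dict, Optional


def read_project_name(config: Path) -> Optional[str]:
    """Return the top-level `project:` value using a naive line scan (no YAML parser)."""
    for line in config.read_text().splitlines():
        if line.startswith("project:"):
            return line.split(":", 1)[1].strip()
    return None


def find_projects(root: Path) -> Dict[str, Path]:
    """Map every project name declared under `root` (at any depth) to its directory."""
    projects: Dict[str, Path] = {}
    for config in sorted(root.rglob("paddle.yaml")):
        name = read_project_name(config)
        if name is not None:
            projects[name] = config.parent
    return projects


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Subprojects may sit at any depth, as long as they are under the root.
    layout = {
        "paddle.yaml": "project: main-project\nsubprojects:\n  - subproject-one\n  - subproject-two\n",
        "subproject-one/paddle.yaml": "project: subproject-one\n",
        "nested/deeper/subproject-two/paddle.yaml": "project: subproject-two\n",
    }
    for relative, text in layout.items():
        path = root / relative
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text)

    projects = find_projects(root)
    for name, directory in projects.items():
        print(name, "->", directory.relative_to(root).as_posix())
```

A directory outside the root is simply never visited by such a search, which is one way to read the "under the root directory" requirement.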

Roots

roots is a key-value map of the "root"-folders of the project.

roots:
  sources: src/main
  tests: src/test
  resources: src/resources
  testsResources: test/resources
  dist: build

Plugins

plugins lists the plugins available in the current Paddle project. Use the enabled subsection to specify bundled/built-in plugins, or the jars subsection to include paths to your own custom plugins.

plugins:
  enabled:
    - python
  jars:
    - plugins/test-plugin-0.1.0.jar

Python sections

The following sections are added by the python plugin, so make sure you have enabled it in your project.

Metadata

metadata is a key-value map containing the Python package metadata. Paddle uses it when building a wheel distribution.

metadata:
  version: 0.1.0
  description: Short description of the project.
  author: Your Name
  authorEmail: your.email@example.com
  url: your.homepage.com
  keywords: "key word example"
  classifiers:
    - "Programming Language :: Python :: 3"
    - "Topic :: Scientific/Engineering :: Artificial Intelligence"
    - "Intended Audience :: Developers"
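To make the connection to wheel building concrete, the sketch below renders the example metadata as a PKG-INFO-style block of Python core metadata fields. The key-to-field mapping (for instance, description to Summary and url to Home-page) is an assumption made for illustration; this README does not document how Paddle translates these keys, and `render_pkg_info` is a hypothetical helper.

```python
from typing import Dict, List, Union

MetadataValue = Union[str, List[str]]

# Assumed mapping from paddle.yaml metadata keys to core metadata field names.
FIELD_NAMES = {
    "version": "Version",
    "description": "Summary",
    "author": "Author",
    "authorEmail": "Author-email",
    "url": "Home-page",
    "keywords": "Keywords",
}


def render_pkg_info(project: str, metadata: Dict[str, MetadataValue]) -> str:
    """Render metadata as PKG-INFO-style "Field: value" lines (hypothetical helper)."""
    # The project name doubles as the package name (see the Project section).
    lines = ["Metadata-Version: 2.1", f"Name: {project}"]
    for key, field in FIELD_NAMES.items():
        if key in metadata:
            lines.append(f"{field}: {metadata[key]}")
    # Classifier is a multiple-use field: one line per classifier.
    for classifier in metadata.get("classifiers", []):
        lines.append(f"Classifier: {classifier}")
    return "\n".join(lines) + "\n"


example_metadata: Dict[str, MetadataValue] = {
    "version": "0.1.0",
    "description": "Short description of the project.",
    "author": "Your Name",
    "authorEmail": "your.email@example.com",
    "classifiers": ["Programming Language :: Python :: 3"],
}
print(render_pkg_info("example", example_metadata))
```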

Environment

environment is a key-value specification of the Python virtual environment to be used in the Paddle project.

environment:
  path: .venv # the value is the same by default
  python: 3.9

Repositories

repositories is a list of the available PyPI repositories.

repositories:
  - name: pypi
    url: https://pypi.org
    uploadUrl: https://upload.pypi.org/legacy/
    default: True
    secondary: False

Note: a standard PyPI repository (shown in the example above) is included in the list of repositories for every Paddle project by default, so you don't need to add it manually every time.

Note: the repository list is configured for the current Paddle project only. If you have a multi-project Paddle build with nested projects, you should either specify the repositories in each paddle.yaml file, or use a topmost all section to wrap the section with repositories:

all:
  repositories:
    ...

This way, the list of repositories will be available in every subproject of the current Paddle project.
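The two notes above can be summarized as a small merge function. Everything about the merge order is an assumption for illustration: here, entries inherited from the root project's all section come first, a project's own entry with the same name replaces an inherited one, and the standard PyPI entry is added only when no repository named pypi is configured. `effective_repositories` is a hypothetical name, not a Paddle API.

```python
from typing import Dict, List

Repository = Dict[str, object]

# Copied from the default PyPI entry shown above.
DEFAULT_PYPI: Repository = {
    "name": "pypi",
    "url": "https://pypi.org",
    "uploadUrl": "https://upload.pypi.org/legacy/",
    "default": True,
    "secondary": False,
}


def effective_repositories(
    own: List[Repository], inherited_from_all: List[Repository]
) -> List[Repository]:
    """Repositories visible to one project (hypothetical merge rules)."""
    merged: Dict[str, Repository] = {}
    # Assumption: inherited entries first, same-named own entries replace them.
    for repository in inherited_from_all + own:
        merged[str(repository["name"])] = repository
    # The standard PyPI repository is available without being declared.
    merged.setdefault("pypi", DEFAULT_PYPI)
    return list(merged.values())


root_all: List[Repository] = [
    {"name": "private-repo", "url": "https://private.pypi.repo.org/simple"}
]
subproject_own: List[Repository] = []
names = [repo["name"] for repo in effective_repositories(subproject_own, root_all)]
print(names)
```

With these assumed rules, a subproject that declares no repositories of its own still sees both the private repository from the root all section and the default PyPI entry.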

Authentication

Paddle provides several ways to specify authentication for your PyPI repository:

The preferred way is to create a paddle.auth.yaml file and place it in the root directory of your Paddle project. Please note that if you have a multi-project build, you need only a single instance of this file, placed in the directory of the topmost root project!

If you are using the PyCharm plugin, you can create such a file by choosing File - New - Paddle Auth YAML.

The schema of the paddle.auth.yaml is the following:

  repositories:
    - name: private-repo-name
      type: netrc | keyring | profile | none
      username: your-username

repositories: a list of PyPI repository references, each supplemented with an authentication method.

Note: If there are several authentication providers specified for a single repository, Paddle will use the first available one from the list.

Sometimes you need to specify the credentials for your private PyPI repository more explicitly, e.g., when the build is running in CI. For such purposes, Paddle also supports the good old way of authenticating via environment variables. To specify the names of the variables containing the username and token (i.e., password) for a particular PyPI repository, add the following authEnv property directly to the repository configuration in the repositories section of the paddle.yaml file:

repositories:
  - name: private-repo
    url: https://private.pypi.repo.org/simple
    authEnv:
      username: CLIENT_ID
      password: CLIENT_SECRET

Note: if there are any available authentication providers specified for this repository in the paddle.auth.yaml file as well, the first of them will have precedence over this authEnv provider. In other words, Paddle will just add this provider to the end of the authentication providers list.
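The precedence rules for authentication providers can be modeled as an ordered list in which the first provider that can supply credentials wins, with the authEnv provider appended last. In the sketch below, each provider is just a function that returns a (username, password) pair, or None when it is unavailable; real netrc, keyring, or profile lookups are replaced by stand-ins, and all function names are hypothetical.

```python
import os
from typing import Callable, List, Optional, Tuple

Credentials = Tuple[str, str]
Provider = Callable[[], Optional[Credentials]]


def env_provider(username_var: str, password_var: str) -> Provider:
    """Model of authEnv: read credentials from the named environment variables."""

    def provide() -> Optional[Credentials]:
        username = os.environ.get(username_var)
        password = os.environ.get(password_var)
        if username is None or password is None:
            return None  # unavailable, so the next provider is tried
        return username, password

    return provide


def resolve_credentials(
    auth_yaml_providers: List[Provider], auth_env: Optional[Provider] = None
) -> Optional[Credentials]:
    """Return credentials from the first available provider, or None."""
    providers = list(auth_yaml_providers)
    if auth_env is not None:
        # authEnv goes to the end of the list, after the paddle.auth.yaml providers.
        providers.append(auth_env)
    for provider in providers:
        credentials = provider()
        if credentials is not None:
            return credentials
    return None


def empty_netrc() -> Optional[Credentials]:
    return None  # stand-in for a netrc file without an entry for this repository


def keyring_stub() -> Optional[Credentials]:
    return "your-username", "token-from-keyring"  # stand-in for a keyring lookup


# In CI, these variables would be set by the pipeline instead.
os.environ["CLIENT_ID"] = "ci-bot"
os.environ["CLIENT_SECRET"] = "ci-secret"
from_env = env_provider("CLIENT_ID", "CLIENT_SECRET")
print(resolve_credentials([empty_netrc], from_env))   # netrc unavailable, authEnv used
print(resolve_credentials([keyring_stub], from_env))  # keyring wins over authEnv
```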

Requirements

requirements is a list of the Paddle project's requirements (i.e., external dependencies). The list should be split into two sections: main for the general project requirements, which will later be included in the requirements list of the built Python packages, and dev for development requirements (such as test frameworks, linters, type checkers, etc.).

requirements:
  main:
    - version: ==4.1.2
      name: redis
    - name: numpy
      version: <=1.22.4
    - name: pandas
    - name: lxml
      noBinary: true
  dev:
    - name: pytest
    - name: twine
      version: 4.0.1

Each requirement must specify a name to look up in the PyPI repository, and may optionally specify the version and noBinary properties. If the version is not specified, Paddle will try to resolve it by itself when running the resolveRequirements task.

The version identifier can be specified as a number with some relation (e.g., by using prefixes <=, >=, <, >, ==, !=, ~=, ===), or just a general version number (the same as with == prefix).
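The version rules above amount to splitting an optional comparison prefix from the version number and treating a bare number as ==. A minimal sketch follows; `split_version` and `to_pip_requirement` are hypothetical helpers, and the output format simply mirrors pip's requirement syntax rather than anything Paddle is documented to produce.

```python
from typing import Optional, Tuple

# Longer operators first, so "===" is not mistaken for "==" (or "<=" for "<").
OPERATORS = ("===", "==", "!=", "~=", "<=", ">=", "<", ">")


def split_version(spec: str) -> Tuple[str, str]:
    """Split a version string into (operator, version); a bare number means "=="."""
    spec = spec.strip()
    for operator in OPERATORS:
        if spec.startswith(operator):
            return operator, spec[len(operator):].strip()
    return "==", spec


def to_pip_requirement(name: str, version: Optional[str] = None) -> str:
    """Render a requirement entry in pip's "name<op>version" form."""
    if version is None:
        return name  # no version: left for Paddle's resolveRequirements task
    operator, number = split_version(version)
    return f"{name}{operator}{number}"


print(to_pip_requirement("redis", "==4.1.2"))
print(to_pip_requirement("numpy", "<=1.22.4"))
print(to_pip_requirement("twine", "4.0.1"))
print(to_pip_requirement("pandas"))
```

The operator order matters: checking "<" before "<=" would split "<=1.22.4" into "<" and "=1.22.4".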

noBinary specifies which kind of distribution to choose for a package. If the option is not set, or is set to false, Paddle prefers a binary wheel; otherwise, Paddle uses the source code distribution.
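The noBinary switch can be illustrated with a small selection function over the file names available for one release. This is a deliberate simplification: it ignores wheel compatibility tags, assumes a source distribution is used as a fallback when no wheel exists, and returns None when noBinary is set but no sdist is available. The file names and the `choose_distribution` helper are hypothetical.

```python
from typing import List, Optional

WHEEL_SUFFIX = ".whl"
SDIST_SUFFIXES = (".tar.gz", ".zip")


def choose_distribution(files: List[str], no_binary: bool = False) -> Optional[str]:
    """Pick a file to install according to the noBinary flag (simplified)."""
    wheels = [name for name in files if name.endswith(WHEEL_SUFFIX)]
    sdists = [name for name in files if name.endswith(SDIST_SUFFIXES)]
    if no_binary:
        # noBinary: true -> only a source distribution is acceptable.
        return sdists[0] if sdists else None
    # noBinary unset or false -> prefer a binary wheel, fall back to an sdist.
    if wheels:
        return wheels[0]
    return sdists[0] if sdists else None


release_files = [
    "lxml-4.9.1.tar.gz",
    "lxml-4.9.1-cp39-cp39-manylinux_2_17_x86_64.whl",
]
print(choose_distribution(release_files))                  # the wheel
print(choose_distribution(release_files, no_binary=True))  # the sdist
```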

Note: for now, only this format of requirement specification is available. Specifying requirements by URL/URI will be added in an upcoming Paddle release, stay tuned!

Tip: if you are using the PyCharm plugin and migrating from the old requirements.txt file, try to copy-paste the file's contents into the paddle.yaml file as is, and Paddle will convert it to its own format.
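The conversion mentioned in the tip can be approximated for simple cases. The sketch below is not the plugin's converter: it is a hypothetical function that handles only `name` and `name<op>version` lines, strips comments, and skips anything else (pip options such as -r or -e, extras, URLs, and environment markers).

```python
import re
from typing import List

# A project name, optionally followed by one or more version specifiers.
REQUIREMENT = re.compile(
    r"^([A-Za-z0-9][A-Za-z0-9._-]*)\s*((?:===|==|!=|~=|<=|>=|<|>)\s*\S.*)?$"
)


def convert_requirements(text: str) -> str:
    """Turn simple requirements.txt lines into a Paddle `requirements` section."""
    entries: List[str] = []
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        match = REQUIREMENT.match(line)
        if match is None:
            continue  # blank lines, pip options, extras, URLs, markers, etc.
        name, version = match.groups()
        entries.append(f"    - name: {name}")
        if version:
            entries.append(f"      version: {version.replace(' ', '')}")
    return "requirements:\n  main:\n" + "\n".join(entries) + "\n"


requirements_txt = """\
# Production dependencies
redis==4.1.2
numpy <= 1.22.4
pandas
-e ./local-package
"""
print(convert_requirements(requirements_txt))
```

The output uses the same layout as the requirements example above, so it can be pasted into paddle.yaml and then reviewed by hand.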


Find links

findLinks is a list of URLs or paths to external non-indexed packages (e.g., a locally built package). This is similar to pip's --find-links option.

For a local path or a file:// URL that points to a directory, archives are looked up in that directory.

For a path or URL that points to an HTML file, the file is searched for links to archives such as sdists (.tar.gz) or wheels (.whl).

findLinks:
    - /home/example/sample-wheel/dist
    - https://example.com/python_packages.html
    - file:///usr/share/packages/sample-wheel.whl 

NB: VCS links (e.g. git://) are not supported.
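The two lookup rules for findLinks (a directory is scanned for archives, an HTML page is scanned for links to archives) can be sketched with the standard library alone. This illustrates the rules as described, which match pip's documented --find-links behavior; it is not Paddle's implementation, the helper names are hypothetical, and fetching remote HTML pages is left out so the example runs offline.

```python
import tempfile
from html.parser import HTMLParser
from pathlib import Path
from typing import List, Optional, Tuple
from urllib.parse import urljoin, urlparse
from urllib.request import url2pathname

ARCHIVE_SUFFIXES = (".whl", ".tar.gz")


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag == "a":
            self.hrefs.extend(value for name, value in attrs if name == "href" and value)


def archives_from_html(html: str, base_url: str) -> List[str]:
    """HTML rule: links to sdists or wheels, resolved against base_url."""
    collector = LinkCollector()
    collector.feed(html)
    # Ignore fragments such as "#sha256=..." when checking the file extension.
    return [
        urljoin(base_url, href)
        for href in collector.hrefs
        if href.split("#", 1)[0].endswith(ARCHIVE_SUFFIXES)
    ]


def archives_from_directory(location: str) -> List[str]:
    """Directory rule: accepts a plain path or a file:// URL."""
    if location.startswith("file://"):
        location = url2pathname(urlparse(location).path)
    return sorted(
        entry.name for entry in Path(location).iterdir() if entry.name.endswith(ARCHIVE_SUFFIXES)
    )


page = '<a href="sample_wheel-0.1.0-py3-none-any.whl">wheel</a> <a href="docs.html">docs</a>'
print(archives_from_html(page, "https://example.com/python_packages.html"))

with tempfile.TemporaryDirectory() as tmp:
    for file_name in ["sample_wheel-0.1.0.tar.gz", "README.md"]:
        (Path(tmp) / file_name).write_text("")
    found = archives_from_directory(Path(tmp).as_uri())
print(found)
```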

Tasks section

The tasks section consists of several subsections that provide run configurations for different Python executors.

tasks:
  run: ...
  test: ...
  publish: ...

Registry

There are several optional Paddle-wide options in the python section of $PADDLE_HOME/registry.yaml:

These options can also be edited in Paddle's IDE settings (Tools -> Paddle).

Docker & SSH sections

To be added soon.

Tasks

Here is a reference for all the built-in Paddle tasks available at the moment.

Core tasks

Python tasks

Example: multi-project build

Let's consider the following example of a Paddle multi-project build: the parent project in the monorepo does not contain any source code and just serves as a container for the subprojects (let's say, different ML models). The models also share some common code (e.g., utils). The directory structure could then look like this:

  main-project/
  │
  ├──ml-model-bert/
  │  ├──.paddle/
  │  ├──.venv/
  │  ├──src/
  │  │   └──bert/
  │  │      ├──__init__.py
  │  │      ├──main.py
  │  │      └──...
  │  └──paddle.yaml
  │  
  ├──ml-model-gpt/
  │  ├──.paddle/
  │  ├──.venv/
  │  ├──src/
  │  │   └──gpt/
  │  │      ├──__init__.py
  │  │      ├──main.py
  │  │      └──...
  │  └──paddle.yaml
  │  
  ├──ml-common/
  │  ├──.paddle/
  │  ├──.venv/
  │  ├──src/
  │  │   └──common/
  │  │      ├──__init__.py
  │  │      ├──main.py
  │  │      └──...
  │  └──paddle.yaml
  │
  ├──paddle.auth.yaml
  └──paddle.yaml

# main-project/paddle.yaml

project: main-project

subprojects:
  - ml-model-bert
  - ml-model-gpt
  - ml-common

# main-project/ml-model-bert/paddle.yaml

project: ml-model-bert

subprojects:
  - ml-common

plugins:
  enabled:
    - python

environment:
  path: .venv
  python: 3.9

# ...

# main-project/ml-common/paddle.yaml

project: ml-common

plugins:
  enabled:
    - python

environment:
  path: .venv
  python: 3.9

# ...

It is generally recommended to place Python packages (with __init__.py files) under the source root of the corresponding Paddle project. Then, if this Paddle project is listed as a dependency in the subprojects section of another Paddle project, you will be able to import the Python package by simply specifying its name relative to the source root:

# main-project/ml-model-gpt/src/gpt/main.py

from common import main  # ml-common/src/common/main.py
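This README does not describe how Paddle exposes a dependency's source root to the dependent project. One way to reproduce the same effect with plain Python is to put the source root (not the project directory) on sys.path, which is what the sketch below does for a throwaway copy of ml-common; the greet function is a hypothetical placeholder so that there is something to import.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Recreate the relevant part of the ml-common project from the tree above.
    package_dir = Path(tmp) / "ml-common" / "src" / "common"
    package_dir.mkdir(parents=True)
    (package_dir / "__init__.py").write_text("")
    # Hypothetical helper, just so there is something to import.
    (package_dir / "main.py").write_text("def greet():\n    return 'hello from ml-common'\n")

    # Expose the source root (src/), so "common" is importable by its own name.
    source_root = str(package_dir.parent)
    sys.path.insert(0, source_root)
    importlib.invalidate_caches()
    try:
        import common.main

        message = common.main.greet()
    finally:
        sys.path.remove(source_root)

print(message)
```

Adding the project directory (ml-common/) instead would require importing src.common.main, which is why the package is addressed relative to the source root.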

Troubleshooting

Using PyCharm plugin

Running Paddle tasks

If the problem persists, don't hesitate to open an issue or contact us directly.

Contact us

If you have found a bug or have a feature suggestion, please don't hesitate to open an issue on GitHub or contact the developers personally: