fastai / nbdev

Create delightful software with Jupyter Notebooks
https://nbdev.fast.ai/
Apache License 2.0

Feature Request: add support to convert conda environments to setting.ini requirements? #1344

Open dsm-72 opened 1 year ago

dsm-72 commented 1 year ago

Motivation

I use and love nbdev. It makes developing a python package iteratively as easy as it gets, especially as I generally do this alongside a project which uses said package. Normally when I create a new nbdev project (nbdev_new), I get a settings.ini file and a setup.py file. To keep working on different packages / projects simple, I will immediately create a conda environment file env.yml for the project.

I know that the environment one develops a package in is NOT necessarily the minimum dependencies it requires. Further, a package's dependencies are a subset of those one may need for working on a project utilizing said package. In MY use case it is clear that I am double-dipping! I am developing the package as I use it on a project.

So for the sake of this feature request let's assume that the package dependencies == project dependencies. In other words, the env.yml file contains all of the requirements for the settings.ini file.

nbdev workflow

  1. make new empty repo "current_project" and clone it

  2. cd path/to/current_project

  3. nbdev_new

  4. make env.yml file

  5. create / update environment:

    
```shell
# create conda environment
$ mamba env create -f env.yml

# update conda environment as needed
$ mamba env update -n current_project --file env.yml
$ mamba env update -n current_project --file env.mac.yml
```

  6. activate environment:

```shell
# activate conda environment
$ conda activate current_project
```

  7. install current_project:

```shell
# install for local development
$ pip install -e .
```

Problem formulation

I am developing a package in python using a setup.py file. My package may have requirements (listed under settings.ini with the key requirements) that get automatically imported and used in the setup.py file. While developing my package I have a conda environment which is specified in a yaml file env.yml (see Files > Example Conda File).

I also have some GitHub actions that test my package. I dislike having to update settings.ini manually (especially since it doesn't allow for multiple lines) to get the requirements into setup.py, especially as I have already listed them out nice and neatly in my env.yml file. So my feature request is as follows:

Feature request

Given a conda environment yaml file (e.g. env.yml), add an nbdev command (e.g. nbdev_env_to_ini) that iterates through its contents, converts the dependencies (and their versions) to the correct PyPI names (as required by setup.py), and stores them in settings.ini under the requirements key.

Desired outcome

My desired outcome is that I can store all of my package requirements in the conda environment file env.yml and have them automatically find themselves in the setup.py file under install_requires. Since my workflow is built around reading the requirements in from a settings.ini file (from nbdev), my feature request solves this by taking the values from env.yml and putting them in settings.ini.
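For context, nbdev's generated setup.py reads settings.ini with configparser and splits the single-line requirements value on whitespace to build install_requires. A minimal sketch of that read path (not nbdev's exact code) looks like:

```python
import configparser

def install_requires_from_ini(path: str = 'settings.ini') -> list[str]:
    # settings.ini keeps requirements as one space-separated line,
    # e.g. "torch>=2 pandas>=2 numpy"
    config = configparser.ConfigParser(delimiters=['='])
    config.read(path)
    return config['DEFAULT'].get('requirements', '').split()
```

This is why the requirements key has to stay a single line, and why each entry must use the PyPI name and pip-style version specifier.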

Current Solution

The current solution is the file env_to_ini.py (see Files > env_to_ini.py).

NOTE this solution uses rich and typer to create an informative command line interface (CLI) which will show you which dependencies were added, changed, removed, or unchanged. This should make working with the script easier (especially should there be bugs in it) as it has not been extensively tested.

How to use env_to_ini.py

Assumptions:

This script is provided so that if the env.yml (or env.mac.yml) file changes one can automatically update the dependencies of the current_project package (under settings.ini) to match.

```shell
# default usage
$ python env_to_ini.py

# show packages that didn't change
$ python env_to_ini.py --unchanged

# specify a different environment file
$ python env_to_ini.py --unchanged --file=env.mac.yml
```

Caveats

This is a bit hacky. You can modify it per project as needed. The so-called "hackiness" is primarily located under the two TODOs, which I will now explain. Note that TODO 2 is more important than TODO 1.

TODO 1: exclusion

Search for the following in the provided script:

```python
# TODO: IMPROVEMENT 1: utilize a list of packages to exclude.
#       by default this would include python and pip.
```

Some packages are to be excluded from the conda environment file env.yml, namely python. Under the first TODO, which currently achieves this via an if / elif / else statement, one could change the script to accept an additional argument containing packages to exclude, or read in an additional file containing these.
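One way that improvement could look (a sketch, not part of the provided script): pass the exclusions in as a CLI option and filter them out before writing, with python and pip as the defaults:

```python
# Sketch: filter out excluded packages before writing settings.ini.
# In env_to_ini.py, `exclude` could come from a repeatable typer option, e.g.
#   exclude: list[str] = typer.Option(['python', 'pip'], help="Packages to skip.")
DEFAULT_EXCLUDE = ['python', 'pip']

def filter_excluded(packages: dict, exclude: list[str] = DEFAULT_EXCLUDE) -> dict:
    return {name: version for name, version in packages.items()
            if name not in exclude}
```

This would replace the hard-coded if / elif branches in extract_packages with a single membership test.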

NOTE: It is unclear to me whether modifying the env.yml file to have a section after dependencies called ignore would mess up one's usage with conda, e.g.

```yaml
dependencies:
  - python>=3.10
  # - ...
ignore:
  - python
  - pip
  # ...
```

TODO 2: mapping

Search for the following in the provided script:

```python
# TODO: IMPROVEMENT 2: utilize a map of packages to rename.
#       by default this would include pytorch --> torch.
#       Ideally, this would figure it out automatically.
```

Some packages need to be renamed because their package name on conda is different than it is on PyPI. The example given is pytorch, which is listed as torch on PyPI and pytorch on conda.

Under the function requirements_to_ini this is currently achieved by using an if / elif / else statement. One could change the script to accept an additional argument containing packages to rename, or read in an additional file containing these.
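A sketch of what that could look like (the table name and helper are hypothetical, not part of the provided script):

```python
# Sketch: rename conda packages to their PyPI equivalents via a lookup table.
# The table could be extended per project, or loaded from a CLI option / file;
# pytorch -> torch is the one case the script currently handles.
CONDA_TO_PYPI = {'pytorch': 'torch'}

def to_pypi_name(package: str, mapping: dict = CONDA_TO_PYPI) -> str:
    # fall back to the conda name when no rename is known
    return mapping.get(package, package)
```

requirements_to_ini would then call to_pypi_name(package) instead of branching on specific names.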

NOTE: It is unclear to me whether modifying the env.yml file to have a section after dependencies called rename would mess up one's usage with conda, e.g.

```yaml
dependencies:
  - python>=3.10
  # - ...
rename:
  - pytorch,torch
  # ...
```

NOTE: It is unclear to me how one could determine this automatically per the feature request's desired outcome.

Files

Example conda file

# EXAMPLE YAML FILE

name: current_project
channels:
  - pytorch
  - conda-forge  
  - fastai

dependencies:  
  - python>=3.10

  # Utilities
  # -------------------------------------------------------------------------
  - tqdm
  - rich
  - typer

  # Jupyter Notebook
  # -------------------------------------------------------------------------
  - conda-forge::notebook
  - conda-forge::ipykernel
  - conda-forge::ipywidgets
  - conda-forge::jupyter_contrib_nbextensions

  # nbdev
  # -------------------------------------------------------------------------
  - fastai::nbdev>=2.3.12

  # PyTorch & Deep Learning
  # -------------------------------------------------------------------------
  - pytorch>=2
  # NOTE: add pytorch-cuda if using a CUDA enabled GPU. You will need to 
  #       remove this if you are on Apple Silicon
  # - pytorch::pytorch-cuda
  - conda-forge::pytorch-lightning

  # Plotting
  # -------------------------------------------------------------------------
  - conda-forge::matplotlib
  - conda-forge::seaborn

  # Data Wrangling
  # -------------------------------------------------------------------------
  - conda-forge::scikit-learn
  - pandas>=2
  - numpy
  - scipy    

  # Pip / non-conda packages
  # -------------------------------------------------------------------------
  - pip
  - pip: 
    # PyTorch & Deep Learning
    # -----------------------------------------------------------------------
    - dgl

env_to_ini.py

# env_to_ini.py

import yaml
import configparser
from rich.console import Console
from rich.table import Table

import typer
from typing import Optional, Tuple

app = typer.Typer()
console = Console()

# NOTE: utility function to print colored text
def cprint(style:str, text:str) -> None:
    console.print(f"[{style}]{text}[/{style}]")

def has_channel(requirements_str:str) -> bool:
    return '::' in requirements_str

def extract_channel(requirements_str:str) -> Tuple[Optional[str], str]:
    channel = None    
    if has_channel(requirements_str):
        channel, requirements_str = requirements_str.split('::', 1)        
    return channel, requirements_str

def is_not_valid_package_char(s:str) -> bool:
    return not (s.isalnum() or s in ['-', '_', '.'])

def split_str_at_first_non_alpha(s:str) -> Tuple[str, str]:
    idx = next((
            i for i, char in enumerate(s) 
            if is_not_valid_package_char(char)
        ), len(s))
    return s[:idx], s[idx:]

def split_package_version(s:str) -> Tuple[str, str]:
    # NOTE: alias for split_str_at_first_non_alpha
    return split_str_at_first_non_alpha(s)

# NOTE: this parses requirements from the settings.ini file. Thus there is one line and each package is separated by a space.
def parse_requirements(requirements_str):
    requirements = {}
    for req in requirements_str.split():
        package, version = split_package_version(req)
        requirements[package] = version
    return requirements

# NOTE: this parses dependencies from the env.yml file.
def extract_packages(dependencies):
    packages = {}
    for dep in dependencies:

        if isinstance(dep, str):
            channel, package_version = extract_channel(dep)
            package, version = split_package_version(package_version)

            # TODO: IMPROVEMENT 1: utilize a list of packages to exclude.
            #       by default this would include python and pip.

            # NOTE: we do not need to add python to the requirements
            if package == 'python':
                continue

            # NOTE: likewise we do not need pip
            elif package == 'pip':
                continue

            packages[package] = version

        elif isinstance(dep, dict):
            for key, values in dep.items():                
                if key == 'pip':
                    for pip_dep in values:                        
                        package, version = split_package_version(pip_dep)                        
                        packages[package] = version                        
    return packages

# NOTE: check if the dependencies in the env.yml file vary from the ones in the settings.ini file.
def compare_requirements(old, new):
    added = {k: v for k, v in new.items() if k not in old}
    removed = {k: v for k, v in old.items() if k not in new}
    changed = {k: (old[k], new[k]) for k in old if k in new and old[k] != new[k]}
    remained = {k: (old[k], new[k]) for k in old if k in new and old[k] == new[k]}
    return added, removed, changed, remained

# NOTE: I like pretty terminals
def print_changes(added, removed, changed, remained):
    table = Table(title="Changes")
    table.add_column("Package", style="cyan")
    table.add_column("Old Version", style="magenta")
    table.add_column("New Version", style="green")
    table.add_column("Status", style="yellow")

    for package, version in added.items():
        table.add_row(f':package: {package}', "", version, "Added")
    for package, version in removed.items():
        table.add_row(f':package: {package}', version, "", "Removed")        
    for package, versions in changed.items():
        table.add_row(f':package: {package}', versions[0], versions[1], "Changed")
    for package, versions in remained.items():
        table.add_row(f':package: {package}', versions[0], versions[1], "Unchanged")

    console.print(table)

def requirements_to_ini(requirements:dict) -> str:
    ini = ''
    for package, version in requirements.items():
        # TODO: IMPROVEMENT 2: utilize a map of packages to rename.
        #       by default this would include pytorch --> torch.
        #       Ideally, this would figure it out automatically.

        # NOTE: this is a hack to make the env.yml file compatible with the settings.ini file
        #       since the env.yml file uses pytorch and the settings.ini file uses torch.
        #       Add more elif statements if you need to change other package names.
        if package == 'pytorch':
            package = 'torch'

        if version:
            ini += f"{package}{version} "
        else:
            ini += f"{package} "
    return ini

@app.command()
def update_requirements(
    file: Optional[str] = typer.Option(
        'env.mac.yml', 
        help="YAML file to extract the new requirements from.",
    ),
    unchanged: Optional[bool] = typer.Option(
        False,
        help="Whether to print all packages, including the ones whose versions haven't changed.",
    ),
):
    # NOTE: notice that file is `env.mac.yml` and not `env.yml`. Now with Apple Silicon I have 
    #       one env file for more common CUDA versions and one for Apple Silicon.

    cprint("bold cyan", f"Loading environment yaml file {file}...")
    with open(file, 'r') as f:
        env = yaml.safe_load(f)

    # NOTE: read in the current dependencies from the conda env.yml file
    cprint("bold cyan", "Extracting packages and their versions...")
    new_requirements = extract_packages(env['dependencies'])

    # NOTE: read in the previous requirements from the settings.ini file
    cprint("bold cyan", "Loading settings.ini file...")
    config = configparser.ConfigParser()
    config.read('settings.ini')

    cprint("bold cyan", "Comparing the old and new requirements...")
    old_requirements = parse_requirements(config['DEFAULT']['requirements'])

    # NOTE: check for changes
    added, removed, changed, remained = compare_requirements(old_requirements, new_requirements)

    # If --unchanged option is given, print unchanged packages as well
    if unchanged:
        print_changes(added, removed, changed, remained)
    else:
        print_changes(added, removed, changed, {})

    # NOTE: update the requirements in the settings.ini file
    cprint("bold cyan", "Updating the requirements...")
    config['DEFAULT']['requirements'] = requirements_to_ini(new_requirements)

    cprint("bold cyan", "Saving the updated settings.ini file...")
    with open('settings.ini', 'w') as f:
        config.write(f)

    cprint("bold green", "Successfully updated the requirements in settings.ini!")

if __name__ == "__main__":
    app()