I use and love nbdev. It makes developing a Python package iteratively as easy as it gets, especially as I generally do this alongside a project which uses said package. Normally when I create a new nbdev project (nbdev_new), I get a settings.ini file and a setup.py file. To keep working on different packages / projects simple, I immediately create a conda environment file env.yml for the project.
I know that the environment one develops a package in does NOT necessarily reflect the minimum dependencies the package requires. Further, a package's dependencies are a subset of those one may need for working on a project utilizing said package. In MY use case it is clear that I am double-dipping: I am developing the package as I use it on a project.
So for the sake of this feature request let's assume that the package dependencies == project dependencies. In other words, the env.yml file contains all of the requirements for the settings.ini file.
nbdev workflow
1. make a new empty repo "current_project" and clone it
2. cd path/to/current_project
3. nbdev_new
4. make the env.yml file
5. create / update the environment:
# update conda environment as needed
$ mamba env update -n current_project --file env.yml
$ mamba env update -n current_project --file env.mac.yml
6. install current_project for local development:
# install for local development
$ pip install -e .
Problem formulation
I am developing a package in Python using a setup.py file. My package may have requirements (listed under settings.ini with the key requirements) that get automatically imported and used in the setup.py file. While developing my package I have a conda environment which is specified in a yaml file env.yml (see Files > Example Conda File).
I also have some GitHub Actions that test my package. I dislike having to update settings.ini manually (especially since it doesn't allow for multiple lines) to get the requirements into setup.py, especially as I have already listed them out nice and neatly in my env.yml file. So my feature request is as follows:
Feature request
Given a conda environment yaml file (e.g. env.yml), add an nbdev command (e.g. nbdev_env_to_ini) that iterates through its contents, converts the dependencies (and their versions) to the correct PyPI names (required by setup.py), and stores them in settings.ini under the keyword requirements.
Desired outcome
My desired outcome is that I can store all of my package requirements in the conda environment file env.yml and have them automatically end up in the setup.py file under install_requires. Since my workflow is built around reading the requirements in from a settings.ini file (from nbdev), my feature request solves this by taking the values from env.yml and putting them in settings.ini.
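For context, nbdev's generated setup.py reads that key and splits it into install_requires. A simplified sketch (not the actual nbdev code) of how the single-line requirements value gets consumed:

```python
import configparser

# Minimal settings.ini fragment as nbdev writes it: all requirements on one
# line, separated by spaces (which is why editing it by hand is awkward).
SETTINGS = """
[DEFAULT]
lib_name = current_project
requirements = torch>=2 pandas>=2 numpy scipy
"""

config = configparser.ConfigParser()
config.read_string(SETTINGS)

# setup.py turns the single space-separated line into the install_requires list
install_requires = config['DEFAULT']['requirements'].split()
print(install_requires)  # ['torch>=2', 'pandas>=2', 'numpy', 'scipy']
```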
Current Solution
The current solution is the file env_to_ini.py (see Files > env_to_ini.py).
NOTE: this solution uses rich and typer to create an informative command line interface (CLI) which shows you which dependencies were added, changed, removed, or unchanged. This should make working with the script easier, especially should there be bugs in it, as it has not been extensively tested.
How to use env_to_ini.py
Assumptions:
env.yml or env.mac.yml under project root
settings.ini under project root
env_to_ini.py under project root
This script is provided so that if the env.yml (or env.mac.yml) file changes one can automatically update the dependencies of the current_project package (under settings.ini) to match.
# default usage
$ python env_to_ini.py
# show packages that didn't change
$ python env_to_ini.py --unchanged
# specify a different environment file
$ python env_to_ini.py --unchanged --file=env.mac.yml
Caveats
This is a bit hacky. You can modify it per project as needed. The so-called "hackiness" is primarily located under the two TODOs, which I will now explain. Note that TODO 2 is more important than TODO 1.
TODO 1: exclusion
Search for the following in the provided script:
# TODO: IMPROVEMENT 1: utilize a list of packages to exclude.
# by default this would include python and pip.
Some packages are to be excluded from the conda environment file env.yml, namely python. Under the first TODO, which currently achieves this via an if / elif / else statement, one could change the script to accept an additional argument containing packages to exclude, or read in an additional file containing these.
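One way this could look (a sketch only; the function and parameter names are assumptions, not part of the provided script): pass the exclusions in rather than hard-coding them.

```python
def extract_str_packages(dependencies, exclude=('python', 'pip')):
    """Like extract_packages, but only for string entries, skipping `exclude`."""
    packages = {}
    for dep in dependencies:
        if not isinstance(dep, str):
            continue  # pip sub-dicts would be handled elsewhere
        _, _, spec = dep.rpartition('::')  # drop an optional "channel::" prefix
        # split the name from the version constraint at the first non-name char
        cut = next((i for i, c in enumerate(spec)
                    if not (c.isalnum() or c in '-_.')), len(spec))
        package, version = spec[:cut], spec[cut:]
        if package in exclude:
            continue
        packages[package] = version
    return packages

print(extract_str_packages(['python>=3.10', 'pip', 'conda-forge::numpy', 'pandas>=2']))
# {'numpy': '', 'pandas': '>=2'}
```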
NOTE: It is unclear to me whether modifying the env.yml file to have a section after dependencies called ignore would mess up one's usage with conda. e.g.
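Such a section might look like the following (hypothetical; I have not verified whether conda tolerates an unknown top-level key):

```yaml
name: current_project
dependencies:
  - python>=3.10
  - pip
# hypothetical extra section, consumed only by env_to_ini.py
ignore:
  - python
  - pip
```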
TODO 2: mapping
Search for the following in the provided script:
# TODO: IMPROVEMENT 2: utilize a map of packages to rename.
# by default this would include pytorch --> torch.
# Ideally, this would figure it out automatically.
Some packages need to be renamed because their name on conda is different from their name on PyPI. The example given is pytorch, which is listed as torch on PyPI and pytorch on conda.
Under the function requirements_to_ini this is currently achieved using an if / elif / else statement. One could change the script to accept an additional argument containing packages to rename, or read in an additional file containing these.
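A sketch of how the if / elif could become a lookup table (the table name and contents are assumptions; extend it per project):

```python
# conda name -> PyPI name; by default this would include pytorch -> torch
CONDA_TO_PYPI = {'pytorch': 'torch'}

def requirements_to_ini(requirements: dict) -> str:
    """Render {package: version} as the single-line settings.ini value."""
    parts = []
    for package, version in requirements.items():
        package = CONDA_TO_PYPI.get(package, package)  # rename if mapped
        parts.append(f"{package}{version}" if version else package)
    return ' '.join(parts)

print(requirements_to_ini({'pytorch': '>=2', 'numpy': ''}))  # torch>=2 numpy
```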
NOTE: It is unclear to me whether modifying the env.yml file to have a section after dependencies called rename would mess up one's usage with conda. e.g.
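Such a section might look like the following (hypothetical; I have not verified whether conda tolerates an unknown top-level key):

```yaml
name: current_project
dependencies:
  - pytorch>=2
# hypothetical extra section, consumed only by env_to_ini.py
rename:
  pytorch: torch
```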
NOTE: It is unclear to me how one could determine this renaming automatically, per the feature request's desired outcome.
Files
Example conda file
# EXAMPLE YAML FILE
name: current_project
channels:
- pytorch
- conda-forge
- fastai
dependencies:
- python>=3.10
# Utilities
# -------------------------------------------------------------------------
- tqdm
- rich
- typer
# Jupyter Notebook
# -------------------------------------------------------------------------
- conda-forge::notebook
- conda-forge::ipykernel
- conda-forge::ipywidgets
- conda-forge::jupyter_contrib_nbextensions
# nbdev
# -------------------------------------------------------------------------
- fastai::nbdev>=2.3.12
# PyTorch & Deep Learning
# -------------------------------------------------------------------------
- pytorch>=2
# NOTE: add pytorch-cuda if using a CUDA enabled GPU. You will need to
# remove this if you are on Apple Silicon
# - pytorch::pytorch-cuda
- conda-forge::pytorch-lightning
# Plotting
# -------------------------------------------------------------------------
- conda-forge::matplotlib
- conda-forge::seaborn
# Data Wrangling
# -------------------------------------------------------------------------
- conda-forge::scikit-learn
- pandas>=2
- numpy
- scipy
# Pip / non-conda packages
# -------------------------------------------------------------------------
- pip
- pip:
# PyTorch & Deep Learning
# -----------------------------------------------------------------------
- dgl
env_to_ini.py
# env_to_ini.py
import yaml
import configparser
from rich.console import Console
from rich.table import Table
import typer
from typing import Optional, Tuple
app = typer.Typer()
console = Console()
# NOTE: utility function to print colored text
def cprint(style:str, text:str) -> None:
console.print(f"[{style}]{text}[/{style}]")
def has_channel(requirements_str:str) -> bool:
return '::' in requirements_str
def extract_channel(requirements_str:str) -> Tuple[Optional[str], str]:
channel = None
if has_channel(requirements_str):
channel, requirements_str = requirements_str.split('::', 1)
return channel, requirements_str
def is_not_valid_package_char(s:str) -> bool:
return not (s.isalnum() or s in ['-', '_', '.'])
def split_str_at_first_non_alpha(s:str) -> Tuple[str, str]:
idx = next((
i for i, char in enumerate(s)
if is_not_valid_package_char(char)
), len(s))
return s[:idx], s[idx:]
def split_package_version(s:str) -> Tuple[str, str]:
# NOTE: alias for split_str_at_first_non_alpha
return split_str_at_first_non_alpha(s)
# NOTE: this parses requirements from the settings.ini file. Thus there is one line and each package is separated by a space.
def parse_requirements(requirements_str):
requirements = {}
for req in requirements_str.split():
package, version = split_package_version(req)
requirements[package] = version
return requirements
# NOTE: this parses dependencies from the env.yml file.
def extract_packages(dependencies):
packages = {}
for dep in dependencies:
if isinstance(dep, str):
channel, package_version = extract_channel(dep)
package, version = split_package_version(package_version)
# TODO: IMPROVEMENT 1: utilize a list of packages to exclude.
# by default this would include python and pip.
# NOTE: we do not need to add python to the requirements
if package == 'python':
continue
# NOTE: likewise we do not need pip
elif package == 'pip':
continue
packages[package] = version
elif isinstance(dep, dict):
for key, values in dep.items():
if key == 'pip':
for pip_dep in values:
package, version = split_package_version(pip_dep)
packages[package] = version
return packages
# NOTE: check if the dependencies in the env.yml file vary from the ones in the settings.ini file.
def compare_requirements(old, new):
added = {k: v for k, v in new.items() if k not in old}
removed = {k: v for k, v in old.items() if k not in new}
changed = {k: (old[k], new[k]) for k in old if k in new and old[k] != new[k]}
remained = {k: (old[k], new[k]) for k in old if k in new and old[k] == new[k]}
return added, removed, changed, remained
# NOTE: I like pretty terminals
def print_changes(added, removed, changed, remained):
table = Table(title="Changes")
table.add_column("Package", style="cyan")
table.add_column("Old Version", style="magenta")
table.add_column("New Version", style="green")
table.add_column("Status", style="yellow")
for package, version in added.items():
table.add_row(f':package: {package}', "", version, "Added")
for package, version in removed.items():
table.add_row(f':package: {package}', version, "", "Removed")
for package, versions in changed.items():
table.add_row(f':package: {package}', versions[0], versions[1], "Changed")
for package, versions in remained.items():
table.add_row(f':package: {package}', versions[0], versions[1], "Unchanged")
console.print(table)
def requirements_to_ini(requirements:dict) -> str:
    ini = ''
    for package, version in requirements.items():
# TODO: IMPROVEMENT 2: utilize a map of packages to rename.
# by default this would include pytorch --> torch.
# Ideally, this would figure it out automatically.
# NOTE: this is a hack to make the env.yml file compatible with the settings.ini file
# since the env.yml file uses pytorch and the settings.ini file uses torch.
# Add more elif statements if you need to change other package names.
if package == 'pytorch':
package = 'torch'
if version:
ini += f"{package}{version} "
else:
ini += f"{package} "
return ini
@app.command()
def update_requirements(
file: Optional[str] = typer.Option(
'env.mac.yml',
help="YAML file to extract the new requirements from.",
),
unchanged: Optional[bool] = typer.Option(
False,
help="Whether to print all packages, including the ones whose versions haven't changed.",
),
):
# NOTE: notice that file is `env.mac.yml` and not `env.yml`. Now with Apple Silicon I have
# one env file for more common CUDA versions and one for Apple Silicon.
cprint("bold cyan", f"Loading environment yaml file {file}...")
with open(file, 'r') as f:
env = yaml.safe_load(f)
# NOTE: read in the current dependencies from the conda env.yml file
cprint("bold cyan", "Extracting packages and their versions...")
new_requirements = extract_packages(env['dependencies'])
# NOTE: read in the previous requirements from the settings.ini file
cprint("bold cyan", "Loading settings.ini file...")
config = configparser.ConfigParser()
config.read('settings.ini')
cprint("bold cyan", "Comparing the old and new requirements...")
old_requirements = parse_requirements(config['DEFAULT']['requirements'])
# NOTE: check for changes
added, removed, changed, remained = compare_requirements(old_requirements, new_requirements)
# If --unchanged option is given, print unchanged packages as well
if unchanged:
print_changes(added, removed, changed, remained)
else:
print_changes(added, removed, changed, {})
# NOTE: update the requirements in the settings.ini file
cprint("bold cyan", "Updating the requirements...")
config['DEFAULT']['requirements'] = requirements_to_ini(new_requirements)
cprint("bold cyan", "Saving the updated settings.ini file...")
with open('settings.ini', 'w') as f:
config.write(f)
cprint("bold green", "Successfully updated the requirements in settings.ini!")
if __name__ == "__main__":
app()