A framework for training generative diffusion models for discrete data.

To unify training and evaluation and to ease the implementation of new diffusion models, the framework defines two basic concepts: the Model and the Model Trainer.

**Model.** This concept is implemented as the abstract class `AModel` in [discrete-diffusion.models]. A UML diagram of the class can be found below.
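As a rough sketch of what the Model interface looks like (a minimal sketch: the class name `AModel` comes from the text above, but the method names `loss` and `sample` are illustrative assumptions, not the actual API):

```python
from abc import ABC, abstractmethod

import torch


class AModel(ABC):
    """Abstract base class for discrete-data diffusion models.

    Subclasses implement a training objective and a sampler for the
    learned backward (denoising) process.
    """

    @abstractmethod
    def loss(self, batch: torch.Tensor) -> torch.Tensor:
        """Compute the training loss for one batch of discrete data."""
        ...

    @abstractmethod
    def sample(self, num_samples: int) -> torch.Tensor:
        """Draw samples by running the backward process."""
        ...
```

A Model Trainer then consumes any `AModel` subclass, which is what keeps training and evaluation uniform across different diffusion models.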
In order to set up the necessary environment:

1. Install [virtualenv] and [virtualenvwrapper].

2. Create a conda environment for the project:

   ```bash
   conda create -n graph_bridges python=3.10.9
   ```

3. Activate the environment:

   ```bash
   conda activate graph_bridges
   ```

4. Install torch with CUDA enabled:

   ```bash
   pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
   ```
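   You can verify that the CUDA build is picked up with:

   ```bash
   python -c "import torch; print(torch.cuda.is_available())"
   ```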
5. Install the project in editable mode (either command installs the package for development):

   ```bash
   python setup.py develop
   pip install -e .
   ```

6. Create the data folders in the main directory (a one-line command for this follows; see the project organization below):

   - `data`
   - `data/raw`
   - `data/preprocessed`
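   For example, from the repository root:

   ```bash
   mkdir -p data/raw data/preprocessed
   ```

   (`mkdir -p` also creates the parent `data` folder.)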
Optional, and needed only once after `git clone`:

1. Install several [pre-commit] git hooks with:

   ```bash
   pre-commit install
   # You might also want to run `pre-commit autoupdate`
   ```

   and check out the configuration under `.pre-commit-config.yaml` (an illustrative sketch of such a configuration follows this list). The `-n, --no-verify` flag of `git commit` can be used to deactivate pre-commit hooks temporarily.
2. Install the [nbstripout] git hooks to remove the output cells of committed notebooks with:

   ```bash
   nbstripout --install --attributes notebooks/.gitattributes
   ```

   This is useful to avoid large diffs due to plots in your notebooks. A simple `nbstripout --uninstall` will revert these changes.
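A `.pre-commit-config.yaml` typically looks roughly like the sketch below. This is illustrative only (the hook repositories and `rev` pins here are assumptions); the authoritative configuration is the `.pre-commit-config.yaml` shipped with this repository:

```yaml
# Illustrative example only -- see the repository's own .pre-commit-config.yaml
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
  - repo: https://github.com/PyCQA/isort
    rev: 5.12.0
    hooks:
      - id: isort  # matches the .isort.cfg listed in the project organization
```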
Then take a look into the `scripts` and `notebooks` folders.
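For instance, a training run would typically be launched through a script in `scripts/`; the script name below is taken from the project organization tree and is only a plausible example:

```bash
python scripts/train_model.py
```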
```
├── AUTHORS.md              <- List of developers and maintainers.
├── CHANGELOG.md            <- Changelog to keep track of new features and fixes.
├── CONTRIBUTING.md         <- Guidelines for contributing to this project.
├── Dockerfile              <- Build a docker container with `docker build .`.
├── LICENSE.txt             <- License as chosen on the command-line.
├── README.md               <- The top-level README for developers.
├── configs                 <- Directory for configurations of model & application.
├── data
│   ├── processed           <- The final, canonical data sets for modeling.
│   └── raw                 <- The original, immutable data dump.
├── docs                    <- Directory for Sphinx documentation in rst or md.
├── requirements.txt        <- The Python environment file for reproducibility.
├── models                  <- Trained and serialized models, model predictions,
│                              or model summaries.
├── notebooks               <- Jupyter notebooks. Naming convention is a number (for
│                              ordering), the creator's initials and a description,
│                              e.g. `1.0-fw-initial-data-exploration`.
├── pyproject.toml          <- Build configuration. Don't change! Use `pip install -e .`
│                              to install for development or `tox -e build` to build.
├── references              <- Data dictionaries, manuals, and all other materials.
├── reports                 <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures             <- Generated plots and figures for reports.
├── scripts                 <- Analysis and production scripts which import the
│                              actual Python package, e.g. `train_model`.
├── setup.py                <- Use `python setup.py develop` to install for
│                              development or `python setup.py bdist_wheel` to build.
├── src
│   └── graph_bridges       <- Actual Python package where the main functionality goes.
├── tests                   <- Unit tests which can be run with `pytest`.
├── .coveragerc             <- Configuration for coverage reports of unit tests.
├── .isort.cfg              <- Configuration for git hook that sorts imports.
└── .pre-commit-config.yaml <- Configuration of pre-commit git hooks.
```