stepan-anokhin opened this issue 3 years ago
Poetry support for monorepositories is not complete yet, but it is being actively discussed (see the corresponding feature request https://github.com/python-poetry/poetry/issues/936). Poetry does seem to support some monorepo features already: namely, it allows mixing versioned and editable local path dependencies (see the corresponding comment https://github.com/pypa/packaging.python.org/issues/506#issuecomment-391140122). I've tested this approach and it seems to work well: all projects/libs use editable installs from the current codebase, while build artifacts have versioned dependencies. So this is good news.
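To make the mixed setup concrete, here is a minimal sketch assuming it follows the pattern from the linked comment: a library is declared with a version constraint in the regular dependencies (used when building) and as an editable path dependency in the dev dependencies (used by `poetry install`). The package name `db`, its version, and the helper `missing_versioned_deps` are illustrative assumptions, not part of Poetry. The helper flags path dependencies that have no versioned counterpart and would therefore be missing from the built artifact's requirements.

```python
"""Sketch of the "versioned for builds, editable for development" pattern.

Assumed pyproject.toml of an application (package name and version are
hypothetical; layout follows Poetry 1.x `dev-dependencies`):

    [tool.poetry.dependencies]
    python = "^3.8"
    db = "^0.1.0"                             # versioned: used by `poetry build`

    [tool.poetry.dev-dependencies]
    db = { path = "../db", develop = true }   # editable: used by `poetry install`
"""
from typing import Any, Dict, List

# The dict tomllib.load() would produce for the TOML above, inlined so the
# sketch runs without a pyproject.toml on disk.
PYPROJECT: Dict[str, Any] = {
    "tool": {
        "poetry": {
            "dependencies": {"python": "^3.8", "db": "^0.1.0"},
            "dev-dependencies": {"db": {"path": "../db", "develop": True}},
        }
    }
}


def _is_versioned(spec: Any) -> bool:
    """A runtime spec is versioned if it is a plain constraint or has a 'version' key."""
    return isinstance(spec, str) or (isinstance(spec, dict) and "version" in spec)


def missing_versioned_deps(pyproject: Dict[str, Any]) -> List[str]:
    """Return editable path dev-dependencies with no versioned runtime counterpart."""
    poetry = pyproject["tool"]["poetry"]
    runtime = poetry.get("dependencies", {})
    dev = poetry.get("dev-dependencies", {})
    return [
        name
        for name, spec in dev.items()
        # Path deps only; a missing runtime entry (None) counts as unversioned.
        if isinstance(spec, dict) and "path" in spec and not _is_versioned(runtime.get(name))
    ]


if __name__ == "__main__":
    print(missing_versioned_deps(PYPROJECT))
```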
Investigate how to manage conda dependencies in ML-related packages. Some of the projects (e.g. `server`) share some logic with the `dedup-app` but don't need the ML dependencies or conda at all, so they could rely solely on poetry and Python packaging standards. At the same time, for ML-related projects (e.g. `dedup-app`) it is convenient to have conda packages, as they come pre-compiled and all the necessary `.so` libraries come with the conda installation out of the box. We need to figure out how to resolve this contradiction: either some of the poetry projects need to depend on conda projects, or some of the conda projects need to depend on poetry projects, or one of the dependency management systems should be dropped in favor of the other.
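For the option where conda projects depend on poetry projects, a conda-based application could keep the pre-compiled ML packages in its `environment.yaml` and pull the poetry-managed local libraries in through conda's `pip:` section as editable installs. The sketch below renders such a file. The function `render_environment`, the channel, package names and paths are hypothetical, and it assumes the local projects use a build backend that supports editable installs via pip (PEP 660).

```python
"""Sketch: render a conda environment.yaml for an ML project that takes
pre-compiled packages from conda and installs local poetry-managed libraries
in editable mode through conda's `pip:` section.

All names, channels and paths are hypothetical examples.
"""
from typing import Dict, List


def render_environment(
    name: str,
    channels: List[str],
    conda_deps: Dict[str, str],
    local_paths: List[str],
) -> str:
    """Return environment.yaml text.

    conda_deps maps a conda package name to a conda version spec ("" = any version).
    local_paths are project directories handed to pip as `-e <path>`.
    """
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    # Binary-heavy ML packages come from conda.
    lines += [f"  - {pkg}{spec}" for pkg, spec in conda_deps.items()]
    if local_paths:
        # pip itself is installed from conda, then local projects go through pip.
        lines.append("  - pip")
        lines.append("  - pip:")
        lines += [f"      - -e {path}" for path in local_paths]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_environment(
        name="dedup-app",
        channels=["conda-forge"],
        conda_deps={"python": "=3.8", "pytorch": ""},
        local_paths=["../db", "../task_queue"],
    ))
```

Running `conda env create -f environment.yaml` on such a file would install the conda packages first and then hand the `pip:` entries to pip.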
pyproject.toml
project.toml
pyproject.toml
to conda environment.yaml
Possible solution:
- use `poetry` for all projects except for the `dedup` application itself (pipeline);
- use the `conda-develop` command to install non-conda packages when working with the `dedup` app.

Rationale: we already do a similar thing when we place the `db` package at the repository root.
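Roughly speaking, `conda-develop` does not build or install a package: it records a source directory in a `.pth` file inside `site-packages`, and Python adds every existing directory listed there to `sys.path`. A minimal sketch of that mechanism follows; the file name `monorepo.pth` and the helper `add_source_dirs` are hypothetical, and this is not conda's own code.

```python
"""Sketch of the .pth mechanism that development-mode installs such as
`conda-develop` rely on. File name and helper are hypothetical."""
from pathlib import Path
from typing import Iterable, List


def add_source_dirs(
    site_packages: Path, dirs: Iterable[Path], pth_name: str = "monorepo.pth"
) -> List[str]:
    """Append absolute paths of `dirs` to `site_packages/pth_name`, skipping duplicates.

    Each directory must *contain* the importable packages (e.g. the repository
    root for `db`), because Python adds the listed directories to sys.path.
    Returns the full list of entries now recorded in the .pth file.
    """
    pth_file = site_packages / pth_name
    entries = pth_file.read_text().splitlines() if pth_file.exists() else []
    for directory in dirs:
        entry = str(Path(directory).resolve())
        if entry not in entries:
            entries.append(entry)
    pth_file.write_text("\n".join(entries) + "\n")
    return entries
```

`.pth` files are processed at interpreter startup (or explicitly via `site.addsitedir()`), so something like `add_source_dirs(Path(site.getsitepackages()[0]), [repo_root])` would make `db` importable in new interpreter sessions without copying it into the environment.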
Links:
Problem
Currently the repository contains multiple applications with some shared logic but, in general, different dependencies:
- `repo-admin` CLI tool
- `just` CLI tool
- API server

The API server and `repo-admin` require some of the dependencies from `winnow`, but not all of them. Some of the reusable parts are extracted into packages placed at the repository root (e.g. `task_queue`, `db`). There are also a lot of files at the root that relate to the deduplication app but not to the rest of the applications.

Problems:
- `repo-admin` needs to be tiny, PyPI-distributable and independent from `winnow`, but it needs some logic from `just`, which depends on `winnow`. As a result, some of the logic from `just` is duplicated in `repo-admin`.

Overall, our monorepo gets disorganized, and as we add more complexity the above problems will get worse.
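The `repo-admin` constraint can be stated as a rule on the package dependency graph: `winnow` must not be reachable from `repo-admin`. A minimal sketch, using illustrative graphs built from the relationships described above plus a hypothetical `just_core` package that would hold the logic currently duplicated from `just`:

```python
"""Sketch: check that repo-admin does not depend on winnow, directly or
transitively. Graphs are illustrative; `just_core` is hypothetical."""
from typing import Dict, List, Set


def transitive_deps(graph: Dict[str, List[str]], start: str) -> Set[str]:
    """Return every package reachable from `start` via dependency edges."""
    seen: Set[str] = set()
    stack = list(graph.get(start, []))
    while stack:
        pkg = stack.pop()
        if pkg not in seen:
            seen.add(pkg)
            stack.extend(graph.get(pkg, []))
    return seen


# Reusing `just` directly would drag winnow into repo-admin.
WITH_REUSE: Dict[str, List[str]] = {
    "repo-admin": ["just"],
    "just": ["winnow"],
    "winnow": [],
}

# Shared logic extracted into a small package that both tools depend on.
PROPOSED: Dict[str, List[str]] = {
    "repo-admin": ["just_core"],
    "just_core": [],
    "just": ["just_core", "winnow"],
    "winnow": [],
}

if __name__ == "__main__":
    for label, graph in [("reuse just", WITH_REUSE), ("extract just_core", PROPOSED)]:
        print(label, "winnow" in transitive_deps(graph, "repo-admin"))
```

Extracting the shared logic into its own package would let both `just` and `repo-admin` reuse it without `repo-admin` inheriting `winnow`, and a check like this could run in CI to keep it that way.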
Goals
Improve monorepo organization so that:
Possible solution:
We could consider the approach described in https://medium.com/opendoor-labs/our-python-monorepo-d34028f2b6fa. A working example can be found at https://github.com/ya-mori/python-monorepo.
The difficult part is that the ML components use the `conda` dependency manager.
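Whichever layout we choose, tooling that installs or builds monorepo packages with local path dependencies has to process them in dependency order. Below is a minimal sketch using Kahn's topological sort; it is not taken from the linked article or example repository, and Python 3.9+ also ships `graphlib.TopologicalSorter` for the same job. The helper name `build_order` is hypothetical.

```python
"""Sketch: order monorepo packages so that local dependencies come first."""
from collections import deque
from typing import Dict, List


def build_order(graph: Dict[str, List[str]]) -> List[str]:
    """Return packages so that each one follows all of its local dependencies.

    `graph` maps a package to the local packages it depends on.
    Raises ValueError if the dependencies contain a cycle.
    """
    packages = set(graph) | {dep for deps in graph.values() for dep in deps}
    # Count of not-yet-ordered dependencies per package.
    remaining = {pkg: len(set(graph.get(pkg, []))) for pkg in packages}
    dependents: Dict[str, List[str]] = {pkg: [] for pkg in packages}
    for pkg, deps in graph.items():
        for dep in set(deps):
            dependents[dep].append(pkg)
    # Sorting keeps the output deterministic.
    ready = deque(sorted(pkg for pkg, n in remaining.items() if n == 0))
    order: List[str] = []
    while ready:
        pkg = ready.popleft()
        order.append(pkg)
        for dependent in sorted(dependents[pkg]):
            remaining[dependent] -= 1
            if remaining[dependent] == 0:
                ready.append(dependent)
    if len(order) != len(packages):
        stuck = sorted(packages - set(order))
        raise ValueError("dependency cycle among: " + ", ".join(stuck))
    return order
```

For instance, with the hypothetical edges `{"just": ["winnow"], "winnow": ["db"], "server": ["db", "task_queue"]}`, `build_order` returns `['db', 'task_queue', 'winnow', 'server', 'just']`.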