SonOfLope opened 3 months ago
I'm trying to think of the best way to handle this and I think the folder structure would need to be revamped.
I was thinking of having something like the following:
```
ailab-datastore
├── datastore
│   ├── pyproject.toml
│   ├── requirements.txt
│   ├── __init__.py
│   ├── blob
│   ├── bin
│   ├── tests        # Tests for shared components
│   └── ...
├── nachet
│   ├── pyproject.toml
│   ├── requirements.txt
│   ├── __init__.py
│   ├── tests
│   ├── README.md
│   └── ...
├── fertiscan
│   ├── pyproject.toml
│   ├── requirements.txt
│   ├── __init__.py
│   ├── tests
│   ├── README.md
│   └── ...
└── README.md
```
It would be much easier to build separate packages for fertiscan and nachet, which we can point to from their respective repositories. Each folder has its own dependency declarations and tests.
The respective pyproject.toml files would look like this:
```toml
[build-system]
requires = ["setuptools>=61.0", "setuptools_scm[toml]>=6.0"]
build-backend = "setuptools.build_meta"

[project]
name = "nachet-datastore"
version = "0.1.0"
authors = [
    { name = "Francois Werbrouck", email = "francois.werbrouck@inspection.gc.ca" },
    { name = "Sylvanie You", email = "Sylvanie.You@inspection.gc.ca" }
]
description = "Data management python layer for Nachet"
readme = "README.md"
requires-python = ">=3.11"
classifiers = [
    "Programming Language :: Python :: 3",
    "License :: OSI Approved :: MIT License",
    "Operating System :: OS Independent",
]
# Note: PEP 621 `dependencies` entries must be PEP 508 strings, so a local path
# table like the one below is Poetry-style and would not work as-is with the
# setuptools backend; the shared package would need to be referenced by name
# (or installed from the repo) instead.
dependencies = [
    { path = "../shared" }
]
license = { file = "../LICENSE" }
keywords = ["Nachet", "ailab"]

[tool.setuptools]
packages = ["nachet"]

[project.urls]
"Homepage" = "https://github.com/ai-cfia/ailab-datastore/nachet"
"Bug Tracker" = "https://github.com/ai-cfia/ailab-datastore/issues"
"Repository" = "https://github.com/ai-cfia/ailab-datastore"
```
Then versioning could be done automatically through package releases (via a pipeline) with tags like `v0.1.0-nachet-datastore` and `v0.1.0-fertiscan`. We could then use it in nachet (for example) as:
```txt
nachet-datastore @ git+https://github.com/ai-cfia/ailab-datastore.git@v0.1.0-nachet-datastore
```

This would also enable our Renovate bot to automatically open PRs to update the datastore version in the respective repositories.
@Francois-Werbrouck @k-allagbe is something like this achievable? Would it actually help you guys?
I've not worked with pyproject.toml before. I would have to test this. Separating the packages by folder sounds great to me. I would even go as far as separating the packages into their own repositories.
@k-allagbe the decision to have everything in the same repository as a monorepo was made so the shared code (the datastore folder) can be reused by both nachet and fertiscan without having to duplicate it in two separate repositories.
Precisely. If the datastore folder is its own package (and repo), it can be used as a dependency in the others. It adds a bit of complexity, but in return, each package gets its own version and CI/CD pipeline.
So we would have 3 repositories:
- Datastore --> shared components
- nachet-datastore --> points to datastore and has its own versioning that we can point to (e.g. `nachet-datastore @ git+https://github.com/ai-cfia/nachet-datastore.git@v0.1.0`)
- fertiscan-datastore --> same as nachet-datastore
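Under that split, nachet-datastore's own requirements would in turn pin the shared repository; a hypothetical entry (repo name, package name, and tag are illustrative, not something that exists today):

```txt
# hypothetical pin from nachet-datastore to the shared datastore repo
datastore @ git+https://github.com/ai-cfia/datastore.git@v1.0.1
```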
The only problem I see with this is managing dependencies across all the repositories. Having everything in the same repo makes it easier since you don't have to point to a specific version.
Let's say we make changes to datastore that push a new version, v1.0.1 --> v1.0.2. We would then need to update nachet-datastore's requirements.txt to point to the new version, which produces a new version of nachet-datastore. That new version then needs to be picked up in Nachet, which must point to the new nachet-datastore release.
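As a sketch of that cascade (repo URLs and the v0.1.1 tag are illustrative):

```txt
# First PR: nachet-datastore bumps its pin on the shared datastore repo
datastore @ git+https://github.com/ai-cfia/datastore.git@v1.0.2

# Second PR: nachet-backend bumps its pin on the resulting nachet-datastore release
nachet-datastore @ git+https://github.com/ai-cfia/nachet-datastore.git@v0.1.1
```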
If we have a monorepo, we can perform atomic updates of the entire monorepo in a single PR instead of cascading PRs.
Yes. I'm fine with cascading version update PRs, especially if managed with Renovate. But as discussed, a well-implemented monorepo might be best if we can avoid all that.
We'd have to test it out, but I think this will help a lot.
I think to achieve this we will need to rework the datastore to look like the following:
```
ailab-datastore
├── datastore
│   ├── pyproject.toml
│   ├── requirements.txt
│   ├── __init__.py
│   ├── blob
│   ├── db
│   │   └── queries
│   │       └── picture
│   ├── tests        # Tests for shared components
│   └── ...
├── nachet
│   ├── pyproject.toml
│   ├── requirements.txt
│   ├── __init__.py
│   ├── tests
│   ├── db
│   │   └── queries
│   │       └── inference
│   ├── README.md
│   └── ...
├── fertiscan
│   ├── pyproject.toml
│   ├── requirements.txt
│   ├── __init__.py
│   ├── tests
│   ├── db
│   │   └── queries
│   │       └── inspection
│   ├── README.md
│   └── ...
└── README.md
```
We will need to:
Context
Currently, the requirements.txt points to the main branch of the nachet-datastore package:
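That entry is along these lines (illustrative; the exact line in the repo may differ, e.g. it may use a subdirectory reference):

```txt
nachet-datastore @ git+https://github.com/ai-cfia/ailab-datastore.git@main
```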
There should be versioning on the datastore package so that our automated bot, Renovate, can track it and propose pull requests to update the version. This would enable a new deployment package of Nachet with up-to-date dependencies without having to do a manual build.
Proposed solution
Renovate will then be able to open pull requests automatically when new versions are released. That way we can simply merge the PR to build and push a new nachet-backend image with the latest changes of nachet-datastore.
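For example, the pinned entry Renovate would keep bumping could look like this, following the tag scheme proposed above (the v0.1.1 release is hypothetical):

```txt
# before
nachet-datastore @ git+https://github.com/ai-cfia/ailab-datastore.git@v0.1.0-nachet-datastore
# after a Renovate PR for a hypothetical v0.1.1 release
nachet-datastore @ git+https://github.com/ai-cfia/ailab-datastore.git@v0.1.1-nachet-datastore
```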