ai-cfia / nachet-backend

A Flask-based backend for Nachet to handle Azure endpoint and Azure storage API requests from the frontend.
MIT License

Use versioning of datastore package to enable automated builds of nachet for new versions #109

Open SonOfLope opened 3 months ago

SonOfLope commented 3 months ago

Context

Currently, requirements.txt points to the `main` branch of the nachet-datastore package:

nachet-datastore @git+https://github.com/ai-cfia/nachet-datastore.git@main

The datastore package should be versioned so that our automated bot, Renovate, can track new releases and propose pull requests to update the version. Each merged update would then produce a new nachet deployment package with up-to-date dependencies, without anyone having to run a build manually.

Proposed solution

Renovate will then be able to open pull requests automatically whenever a new version is released. We can then simply merge the PR to trigger a build and push of a new nachet-backend image containing the latest changes from nachet-datastore.
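To illustrate why the `@main` pin blocks this automation, here is a minimal Python sketch of the comparison an update bot has to make. It is not how Renovate is implemented; the helper `propose_update` and the plain `vX.Y.Z` tag format are assumptions for illustration. A branch ref yields no version, so there is nothing to propose; a tag ref can be compared against newer releases.

```python
from __future__ import annotations

import re

# Hypothetical pattern for a git direct reference: "<name> @ git+<url>@<ref>".
# The spaces around the first "@" are optional, as in the current requirements.txt.
_GIT_REQ = re.compile(
    r"^(?P<prefix>\s*[A-Za-z0-9][A-Za-z0-9._-]*\s*@\s*git\+[^@\s]+@)(?P<ref>\S+)\s*$"
)
# Assumed tag format: plain semantic versions such as "v0.1.0".
_SEMVER = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def _version(ref: str) -> tuple[int, ...] | None:
    """Parse 'vX.Y.Z' into a comparable tuple, or return None for anything else."""
    match = _SEMVER.match(ref)
    return tuple(int(part) for part in match.groups()) if match else None


def propose_update(requirement: str, released_tags: list[str]) -> str | None:
    """Return the requirement re-pinned to the newest newer tag, or None if nothing applies."""
    match = _GIT_REQ.match(requirement)
    if match is None:
        raise ValueError(f"not a git direct reference: {requirement!r}")
    current = _version(match["ref"])
    if current is None:
        # A branch such as "main" carries no version, so there is nothing to compare against.
        return None
    best = None
    for tag in released_tags:
        version = _version(tag)
        if version is not None and version > current and (best is None or version > best[0]):
            best = (version, tag)
    return match["prefix"] + best[1] if best else None
```

Once the requirement is pinned to a tag, the pull request the bot opens is just this one-line change to requirements.txt.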

SonOfLope commented 2 months ago

Relates to https://github.com/orgs/ai-cfia/projects/7/views/1?pane=issue&itemId=57886391

SonOfLope commented 2 months ago

I've been thinking about the best way to handle this, and I think the folder structure needs to be revamped.

I was thinking of something like the following:

ailab-datastore
├── datastore
│   ├── pyproject.toml
│   ├── requirements.txt
│   ├── __init__.py
│   ├── blob
│   ├── bin
│   ├── tests  # Tests for shared components
│   └── ...
├── nachet
│   ├── pyproject.toml
│   ├── requirements.txt
│   ├── __init__.py
│   ├── tests
│   ├── README.md
│   └── ...
├── fertiscan
│   ├── pyproject.toml
│   ├── requirements.txt
│   ├── __init__.py
│   ├── tests  
│   ├── README.md 
│   └── ...
└── README.md

It would be much easier to build separate packages for fertiscan and nachet that the respective repositories can point to. Each folder has its own dependency declarations and tests.
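With one folder per package, a release pipeline needs to decide which packages a given change affects. Below is a minimal sketch, assuming (this is not stated in the proposal) that only `nachet` and `fertiscan` are released as packages and that any change under the shared `datastore` folder requires releasing both. The function name and the two configuration constants are hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical configuration for the proposed monorepo layout.
SHARED = "datastore"
RELEASABLE = {"nachet", "fertiscan"}


def packages_to_release(changed_paths: list[str]) -> list[str]:
    """Map the files changed in a commit to the packages that need a new release."""
    touched = set()
    for path in changed_paths:
        parts = PurePosixPath(path).parts
        if len(parts) > 1:  # skip root-level files such as README.md
            touched.add(parts[0])
    if SHARED in touched:
        # Both packages depend on the shared folder, so both get a new version.
        return sorted(RELEASABLE)
    return sorted(touched & RELEASABLE)


print(packages_to_release(["nachet/tests/test_db.py", "README.md"]))  # ['nachet']
```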

The respective pyproject.toml files would look like this (nachet shown):

[project]
name = "nachet-datastore"
version = "0.1.0"
authors = [
    { name = "Francois Werbrouck", email = "francois.werbrouck@inspection.gc.ca" },
    { name = "Sylvanie You", email = "Sylvanie.You@inspection.gc.ca" },
]
description = "Data management python layer for Nachet"
readme = "README.md"
requires-python = ">=3.11"
classifiers = [
    "Programming Language :: Python :: 3",
    "License :: OSI Approved :: MIT License",
    "Operating System :: OS Independent",
]
# Local path dependency on the shared code. Note: [project].dependencies only accepts
# PEP 508 strings, so a path table like this needs tool-specific support.
dependencies = [
    { path = "../shared" }
]
license = { file = "../LICENSE" }
keywords = ["Nachet", "ailab"]

[tool.setuptools]
packages = ["nachet"]

[project.urls]
"Homepage" = "https://github.com/ai-cfia/ailab-datastore/nachet"
"Bug Tracker" = "https://github.com/ai-cfia/ailab-datastore/issues"
"Repository" = "https://github.com/ai-cfia/ailab-datastore"


Versioning could then be automated by a release pipeline that publishes tags such as `v0.1.0-nachet-datastore` and `v0.1.0-fertiscan`. Nachet, for example, could then depend on it as:
```txt
nachet-datastore @ git+https://github.com/ai-cfia/ailab-datastore.git@v0.1.0-nachet-datastore
```

This would also let our Renovate bot automatically open PRs to update the datastore version in each repository.
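Because every package's tags would live in the same repository, the bot would have to ignore tags that belong to other packages. Here is a minimal sketch of that filtering, assuming the `v<MAJOR>.<MINOR>.<PATCH>-<package>` format from the examples above. Renovate has its own configuration for matching tags; `latest_tag` is a hypothetical helper that only illustrates the logic.

```python
from __future__ import annotations

import re

# Assumed tag format from the proposal, e.g. "v0.1.0-nachet-datastore".
_TAG = re.compile(r"^v(\d+)\.(\d+)\.(\d+)-(.+)$")


def latest_tag(tags: list[str], package: str) -> str | None:
    """Return the highest-versioned tag for `package`, ignoring other packages' tags."""
    candidates = []
    for tag in tags:
        match = _TAG.match(tag)
        if match and match.group(4) == package:
            version = tuple(int(part) for part in match.group(1, 2, 3))
            candidates.append((version, tag))
    return max(candidates)[1] if candidates else None


tags = ["v0.1.0-nachet-datastore", "v0.2.0-fertiscan", "v0.10.0-nachet-datastore", "v0.9.1-nachet-datastore"]
print(latest_tag(tags, "nachet-datastore"))  # v0.10.0-nachet-datastore
```

Versions are compared as integers, so `v0.10.0` correctly sorts above `v0.9.1`. A pre-release suffix placed before the package name (e.g. `-rc1-`) would be misread as part of the package name, so the tag format would need to stay strict.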

@Francois-Werbrouck @k-allagbe is something like this achievable? Would it actually help you?

k-allagbe commented 2 months ago

I haven't worked with pyproject.toml before, so I would have to test this. Separating the packages by folder sounds great to me. I would even go as far as moving each package into its own repository.

SonOfLope commented 2 months ago

@k-allagbe the decision to have everything in the same repository as a monorepo was taken to allow reuse of code (datastore folder) that can be used for both nachet and fertiscan without having to duplicate the code in two separate repositories.

k-allagbe commented 2 months ago

> @k-allagbe the decision to have everything in the same repository as a monorepo was taken to allow reuse of code (datastore folder) that can be used for both nachet and fertiscan without having to duplicate the code in two separate repositories.

Precisely. If the datastore folder is its own package (and repo), it can be used as a dependency by the others. This adds some complexity, but in return each package gets its own version and CI/CD pipeline.

SonOfLope commented 2 months ago

So we would have 3 repositories:

- datastore --> shared components
- nachet-datastore --> depends on datastore and has its own versioning that we can point to (e.g. `nachet-datastore @ git+https://github.com/ai-cfia/nachet-datastore.git@v0.1.0`)
- fertiscan-datastore --> same as nachet-datastore

The only problem I see with this is managing dependencies across all the repositories. Having everything in the same repo makes this easier, since you don't have to point to a specific version.

Let's say we change datastore and release a new version, v1.0.1 --> v1.0.2. We would then need to update nachet-datastore's requirements.txt to point to the new version, which produces a new version of nachet-datastore. Nachet would in turn need to be updated to point to that new version of nachet-datastore.

If we have a monorepo, we can perform atomic updates of the entire monorepo in a single PR instead of cascading PRs.
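To make the cascade concrete, here is a small Python sketch that groups the repositories needing a version-bump PR into waves, where each wave can only be merged after the previous one has released. The dependency map mirrors the three-repository layout described above plus nachet-backend; the function name is hypothetical and the graph is assumed to be acyclic.

```python
def cascade_waves(depends_on: dict[str, list[str]], changed: str) -> list[list[str]]:
    """Group the repos that must bump a dependency after `changed` releases into ordered waves."""
    # Invert the graph: package -> packages that depend on it directly.
    dependents: dict[str, set[str]] = {}
    for pkg, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(pkg)

    # Collect every package that transitively depends on `changed`.
    affected: set[str] = set()
    stack = [changed]
    while stack:
        for pkg in dependents.get(stack.pop(), ()):
            if pkg not in affected:
                affected.add(pkg)
                stack.append(pkg)

    # A package can only bump once all of its affected dependencies have released.
    wave = {changed: 0}

    def wave_of(pkg: str) -> int:
        if pkg not in wave:
            wave[pkg] = 1 + max(
                wave_of(dep) for dep in depends_on[pkg] if dep == changed or dep in affected
            )
        return wave[pkg]

    groups: dict[int, list[str]] = {}
    for pkg in affected:
        groups.setdefault(wave_of(pkg), []).append(pkg)
    return [sorted(groups[n]) for n in sorted(groups)]


DEPENDS_ON = {
    "nachet-datastore": ["datastore"],
    "fertiscan-datastore": ["datastore"],
    "nachet-backend": ["nachet-datastore"],
}
print(cascade_waves(DEPENDS_ON, "datastore"))
# [['fertiscan-datastore', 'nachet-datastore'], ['nachet-backend']]
```

In the monorepo layout, the first wave collapses into the same PR as the datastore change, leaving only the nachet-backend bump.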

k-allagbe commented 2 months ago

Yes. I'm fine with cascading version-update PRs, especially if they are managed with Renovate. But as discussed, a well-implemented monorepo might be best if it lets us avoid all that.

Francois-Werbrouck commented 2 months ago

We'd have to test it out, but I think this will help a lot.

Francois-Werbrouck commented 2 months ago

I think to achieve this we will need to rework the datastore to look like the following:

ailab-datastore
├── datastore
│   ├── pyproject.toml
│   ├── requirements.txt
│   ├── __init__.py
│   ├── blob
│   ├── db
│   │   └── queries
│   │       └── picture
│   ├── tests  # Tests for shared components
│   └── ...
├── nachet
│   ├── pyproject.toml
│   ├── requirements.txt
│   ├── __init__.py
│   ├── tests
│   ├── db
│   │   └── queries
│   │       └── inference
│   ├── README.md
│   └── ...
├── fertiscan
│   ├── pyproject.toml
│   ├── requirements.txt
│   ├── __init__.py
│   ├── tests  
│   ├── db
│   │   └── queries
│   │       └── inspection
│   ├── README.md 
│   └── ...
└── README.md

We will need to: