ai-cfia / fertiscan-backend

Fertiscan backend
MIT License

Migrate Analysis Pipeline to A New Repository #94

Closed by snakedye 1 month ago

snakedye commented 1 month ago

Description: The entire analysis pipeline from the fertiscan-backend repository will be migrated into a new repository, fertiscan-pipeline. This migration aims to modularize the project, making the pipeline code reusable and maintainable as a standalone package.

This will also reduce costs, because the GPT-related code will no longer be executed in the workflow when it has not been modified.

Tasks:

Acceptance Criteria:

Endlessflow commented 1 month ago

Correct me if I am wrong, but the reason we want to create a new repo is to ensure that when we make changes to our image processing pipeline, we can run adequate performance testing to determine whether the modifications are better or worse than the current pipeline. Since these tests are resource-intensive and we want to make it clear when a modification is made to the processing pipeline, we decided it would be beneficial to turn that part of the code into its own independent module.

With this said, I wonder if the scope of the new repo is just GPT or the whole processing pipeline, especially as we flesh it out in the future with more components to get better results.

k-allagbe commented 1 month ago

That's correct.

That said, it is common practice to separate services into their own packages for reusability. The end goal, which is to avoid testing the heavy processes at every small change in the backend, will still be achieved: the OCR and GPT packages will each be tested in their own repository, with input and output data carefully selected to be representative of the checkpoints at which each is invoked in the backend. It is basically the same thing, but split in two for the benefit of modularity.

Edit: It is still possible to keep both in the same repository and implement a single pipeline-style test that covers both.
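To make the two options concrete, here is a minimal sketch contrasting per-component checks at each checkpoint with a single end-to-end check. Everything in it is an assumption made for illustration: the stage signatures, the names `run_pipeline` and `field_accuracy`, the stand-in `fake_ocr` and `fake_llm` stages, and the made-up sample label. It is not the actual fertiscan code or data.

```python
from typing import Callable

# Hypothetical stage signatures; the real OCR and GPT interfaces may differ.
OcrStage = Callable[[bytes], str]   # raw image bytes -> extracted text
LlmStage = Callable[[str], dict]    # extracted text -> structured fields


def run_pipeline(image: bytes, ocr: OcrStage, llm: LlmStage) -> dict:
    """Chain the two stages in the order the backend would invoke them."""
    return llm(ocr(image))


def field_accuracy(predicted: dict, expected: dict) -> float:
    """Fraction of expected fields that were reproduced exactly."""
    if not expected:
        return 1.0
    hits = sum(1 for key, value in expected.items() if predicted.get(key) == value)
    return hits / len(expected)


# Stand-in stages so the sketch runs without any external service.
def fake_ocr(image: bytes) -> str:
    return image.decode("utf-8")


def fake_llm(text: str) -> dict:
    # Parse "key: value" lines into a dictionary.
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


# Made-up sample data, for illustration only.
sample_image = b"name: GreenGrow\nnpk: 10-5-5"
expected = {"name": "GreenGrow", "npk": "10-5-5", "weight": "20 kg"}

# Option 1: component-level checks, one per checkpoint.
assert "GreenGrow" in fake_ocr(sample_image)
assert fake_llm("name: GreenGrow") == {"name": "GreenGrow"}

# Option 2: one pipeline-level check that scores the end result.
result = run_pipeline(sample_image, fake_ocr, fake_llm)
score = field_accuracy(result, expected)
print(f"pipeline field accuracy: {score:.2f}")
```

With a score like this computed on the final output, a change to either stage shows up in the same number, which is the main argument for testing both in one place.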

k-allagbe commented 1 month ago

After discussion with @Endlessflow, testing the whole processing pipeline makes more sense than testing each component individually. @snakedye, please consider exporting both OCR and GPT to the same repository.

snakedye commented 1 month ago

@Endlessflow @k-allagbe We might as well move everything in ./backend into the new repo, because how you build the document from the raw images is also part of the pipeline and affects the end result. All that will be left is the Flask router.

If we go with that approach, is there a substantial gain over what we have now?

Right now, most of the code is related to the pipeline, and the Flask router is only changing because we are still working on the API. Once that's settled it won't change much, and most of the changes will affect the pipeline anyway.
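For reference, the split described here could look roughly like the sketch below: the pipeline package owns document building, OCR and the LLM call, while the router only validates, delegates and serializes. All names (`build_document`, `make_pipeline`, `handle_analyze`) are hypothetical, the router is shown as a plain function rather than a real Flask view so the example stays self-contained, and the stages are trivial stand-ins.

```python
import json
from typing import Callable, Sequence, Tuple

Pipeline = Callable[[Sequence[bytes]], dict]


# --- Would live in the new pipeline repository (hypothetical names) ---

def build_document(images: Sequence[bytes]) -> bytes:
    """Placeholder for document building: here, just concatenate the pages."""
    return b"\n".join(images)


def make_pipeline(ocr: Callable[[bytes], str], llm: Callable[[str], dict]) -> Pipeline:
    """Bundle document building, OCR and the LLM call into one entry point."""
    def analyze(images: Sequence[bytes]) -> dict:
        return llm(ocr(build_document(images)))
    return analyze


# --- Would stay in the backend, next to the router ---

def handle_analyze(images: Sequence[bytes], pipeline: Pipeline) -> Tuple[str, int]:
    """Router-side code: validate, delegate, serialize. No analysis logic here."""
    if not images:
        return json.dumps({"error": "no images provided"}), 400
    return json.dumps(pipeline(images)), 200


# Stand-in stages so the sketch runs without any external service.
pipeline = make_pipeline(
    ocr=lambda doc: doc.decode("utf-8"),
    llm=lambda text: {"line_count": len(text.splitlines())},
)
body, status = handle_analyze([b"page one", b"page two"], pipeline)
print(status, body)
```

Under this layout, a change to the router never touches the code that the expensive pipeline tests exercise, and vice versa.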

k-allagbe commented 1 month ago

I'm comfortable with having the whole analysis pipeline separated into the new repository (which I suggest naming fertiscan-ai-pipeline or fertiscan-analysis-pipeline). The backend also has the responsibility of communicating with the database and any frontend client, which may or may not be subject to rapid changes. In any case, this is good practice.

Endlessflow commented 1 month ago

For transparency, we reached a consensus on the idea of separating the entire analysis pipeline into a new repository.