ai-cfia / ailab-datastore

This repo represents the data layer of multiple ailab projects.

As a dev, I want `register_analysis` and `delete_inspection` to handle container management internally #174

Open · k-allagbe opened this issue 1 month ago

k-allagbe commented 1 month ago

Description

Context
In the datastore's new_user function, the logic for associating a blob container with a user is managed internally and hidden from the caller. However, in the register_analysis and delete_inspection functions, users are still expected to create their own blob containers and pass them as arguments. This is inconsistent: the logic behind container creation and association is already managed internally in new_user.

Problem Statement
The responsibility of managing blob containers should be placed on the register_analysis and delete_inspection functions themselves, as the logic behind container creation is not exposed to the user. There is no reason to ask the user to pass a container when they don't know the underlying logic. These functions should be refactored to handle the container logic internally and only require a user_id and a connection string to ensure consistent behavior.
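
To make the asymmetry concrete, here is a minimal sketch. The signatures below are illustrative only, not the library's actual ones; they only show the shift from caller-supplied containers to internally managed ones.

```python
# Hypothetical signatures, for illustration only; the real ones differ in detail.

# Today: new_user hides container creation behind a user id and connection string...
def new_user(cursor, email: str, connection_string: str): ...

# ...while these two expect the caller to build and pass its own container client:
def register_analysis(cursor, container_client, analysis_dict): ...
def delete_inspection(cursor, container_client, inspection_id): ...

# Proposed shape: mirror new_user and resolve the container internally.
def register_analysis(cursor, user_id, connection_string, analysis_dict): ...
def delete_inspection(cursor, user_id, connection_string, inspection_id): ...
```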

Acceptance Criteria

Francois-Werbrouck commented 1 month ago

Before addressing the whole thing, let me define two terms:

  1. What we call a User in the Datastore is a client of our application/services.
  2. For the rest of this thread, let's refer to any user of our library as the BackEnd (BE), which is usually the entity using our library for the specific project of our application/services.

First of all, I will add some context behind the decisions in the current implementation.

Let me address the creation of a container in new_user().

The way we define a User in the Datastore is a client of our application/services. Therefore, when a new user is registered, the user needs a space to upload the pictures they need. We therefore configured the function to automatically create a container unique to the new user. It would be pointless to ask the BE to call new_user() and then a second function, requiring only the user_id, to create the user's container. It would also introduce the risk of users not having their personal container.
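
A rough sketch of that flow, assuming the Azure SDK for Python and a DB cursor; the helper query and the naming scheme are hypothetical, only the shape matters:

```python
import uuid
from azure.storage.blob import BlobServiceClient

def new_user(cursor, email: str, connection_string: str):
    """Register a user and provision their personal container in one call."""
    user_id = uuid.uuid4()
    # Hypothetical query; the real insert lives inside the datastore.
    cursor.execute("INSERT INTO users (id, email) VALUES (%s, %s)", (user_id, email))
    # The container name is derived from the user id, so the BE never picks it
    # and no user can end up without a personal container.
    service = BlobServiceClient.from_connection_string(connection_string)
    service.create_container(f"user-{user_id}")
    return user_id
```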

Now, we are currently at the beginning of prototyping a tool that should (hopefully) be used across the agency, which means a lot of management and hierarchical decisions. Therefore, we previously inferred it would be beneficial for the Datastore to eventually implement a group_user. Those groups could have their own container accessible to all the users of said group. With this implementation in place, we cannot assume an upload always goes to a user's specific container. We therefore placed the decision of which container to upload the content to upon the BE. The BE is also the layer that deals with the business logic, which in all our projects is managing which user does what on which resources.


Now let me address the "problems" of this issue.

> The responsibility of managing blob containers should be placed on the register_analysis and delete_inspection functions themselves,

If we, as the Datastore, assume which container the upload should go to, it will lead to less flexibility for the tool. If we someday implement a group structure, or any other structure where an upload is not directly tied to a user's container, having assumed the container in the upload will require a lot of rework or workarounds.

> as the logic behind container creation is not exposed to the user [Backend].

The BE doesn't need to know the logic behind the creation of a container. It only needs to know whether a container exists and how to get its container_client. The BE only needs to provide a container_client that exists within our Azure Blob Storage. All containers created by the BE should also be created through our library; we have implemented a structure to manage the container entities in the database. We also provide all the functions necessary for the BE to execute tasks on containers without exposing the BE to any of their complexity.
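
Put differently, the BE's side of the contract stays small. A minimal sketch with the Azure SDK, where the container name and connection string are placeholders, not the datastore's actual values:

```python
from azure.storage.blob import BlobServiceClient

# Placeholder values; the datastore registers the real container name.
connection_string = "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...;EndpointSuffix=core.windows.net"
container_name = "user-1234"

# The BE only needs the connection string and the registered container name.
service = BlobServiceClient.from_connection_string(connection_string)
container_client = service.get_container_client(container_name)

# Existence is the only thing the BE has to check before passing it on.
assert container_client.exists()
```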

> These functions should be refactored to handle the container logic internally and only require a user_id and a connection string to ensure consistent behavior.

The BE should be the one to manage all the connections within the tool. If we create the connection to the Azure entity ourselves, the BE no longer has full control of the tool.

In the same line of thought as everything raised before, assuming the upload will ALWAYS be in the container of the user performing the request on the FE of one of our projects is really far-fetched and will limit the future usage of our applications.


Here is a document describing a use case where not specifying the container proves important: Nachet-manage-folder.md

Here are the issues about creating the group structure and the container structure in the DB to extend the flexibility of our tool:

Here is a quick schema explaining how a user could be affiliated with multiple containers, therefore requiring the BE to decide which container to upload into:

```mermaid
erDiagram
  user {
    uuid id PK
    text email
  }
  group {
    uuid id PK
    text name
    uuid owner_id
  }
  container {
    uuid id
    text name
    uuid created_by
  }
  user }o--o{ group : "part of"
  container ||--|| user : uses
  container ||--|| group : uses
```
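
Under that schema, the datastore could list every container a user can reach, but picking one among them is business logic. A hypothetical sketch of why the choice has to come from the BE (the user_group join table is assumed; it is implied but not named in the diagram):

```python
def candidate_containers(cursor, user_id):
    """Hypothetical query: all containers reachable by a user, directly or via groups."""
    cursor.execute(
        """
        SELECT c.id, c.name
          FROM container c
         WHERE c.created_by = %s
        UNION
        SELECT c.id, c.name
          FROM container c
          JOIN "group" g ON c.created_by = g.id
          JOIN user_group ug ON ug.group_id = g.id
         WHERE ug.user_id = %s
        """,
        (user_id, user_id),
    )
    # Several rows can come back; choosing among them is the BE's call.
    return cursor.fetchall()
```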

A direct use in Fertiscan: each team of inspectors could be part of a group. Some people in said group could take the pictures and upload them, while others use the pictures to validate the inspection.

There is also the case where, once an inspection is digitized, we could save the pictures into a dev or public container for future dev testing, and/or make them available to everyone in the search engine of the tool.

k-allagbe commented 1 month ago

If a user can save stuff in a container that is not his, then indeed the responsibility of creating containers should not be the datastore's, because how does the datastore then assess where to save stuff? That is container logic we do not want to handle in the datastore. So we pass on that responsibility to the BE, or even maybe some other package / module.

But that also means that nowhere in the datastore should we create containers on behalf of the BE. For instance, you mention creating a personal container for each user because we don't want to risk them not having one. Well, that's already taking on some of that responsibility. We even handle the container naming.

Imagine now that the BE decides not to go with that and handles containers differently (different naming, grouping, maybe even putting everything in the same container, ...) because it has the right to. That renders the datastore container creation logic useless. Even if the BE decides to use the datastore's container logic, then a part of the logic is on the datastore and the rest on the BE. Hard to maintain.

Thoughts?

Francois-Werbrouck commented 1 month ago

After my recent work on #137, I have a refreshed, global understanding of how the Datastore handles and manages uploading content into our Azure Storage services.


Allowing the BE of a project to specify the container client is a core principle of flexibility and is not an issue in my opinion.

The Datastore has not yet put in place the resources needed for the BE to have access to high-level functions to manage the grouping or naming for all business requirements (containers, folders, images). However, at the current state of our projects, some requirements are not deemed critical, so we focus on improving our services elsewhere.

In Nachet there is a dev_Container which is used to save data that a user of the application has verified. This data can be used for training purposes and/or could be fetched by other users if the need arises to implement such functionality.

Allowing the BE to specify the Container_client, and not limiting ourselves to ALWAYS uploading into the user's container, is important to keep flexibility across our projects for future needs and future feature implementations.


The Datastore currently allows the BE to maintain the logic within the container:

Not allowing the Fertiscan BE the liberty to specify the folder into which the upload should be performed is the core problem of this issue, in my opinion (at least for register_analysis).

I suggest allowing the Fertiscan BE to specify a folder into which they want the upload to be performed, or at least an option to name the folder.
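
Concretely, this could be as small as an optional parameter. A sketch with hypothetical names and blob layout, not the current signature:

```python
import json
import uuid

def register_analysis(cursor, container_client, analysis_dict, folder_name=None):
    """Hypothetical: let the BE name the folder, keeping today's behaviour as the default."""
    folder = folder_name or str(uuid.uuid4())  # generated name when the BE doesn't care
    blob_name = f"{folder}/{analysis_dict['id']}.json"
    container_client.upload_blob(blob_name, json.dumps(analysis_dict))
```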


Now let me answer the previous comment.

> If a user can save stuff in a container that is not his, then indeed the responsibility of creating containers should not be the datastore's, [...] That is container logic we do not want to handle in the datastore.

This is not true. The datastore currently has functions that take on the responsibility of creating containers. The architecture is just not yet present to offer high-level functions for the BE of our applications to manage containers (#34 & #33). Also, at the current state of our applications, it does not seem like the need was really there to do so.

> So we pass on that responsibility to the BE, or even maybe some other package / module.

I do not know where you get that impression; the BE never needs to directly create containers, nor do we shift that responsibility elsewhere.

> But that also means that nowhere in the datastore should we create containers on behalf of the BE. For instance, you mention creating a personal container for each user because we don't want to risk them not having one.

We could be super rigid instead of practical about it and ask the BE to create a user in the DB and then create a user in the blob storage. This would also expose the BE to the complexity of our data architecture by asking it to perform all the intricate operations behind creating a user. The BE only wants to create a user, a group, or a transaction. The Datastore handles the complexity of those objects for the BE. The BE then points (with ids) towards the right entity.

> We even handle the container naming.

Currently we do for containers; we haven't implemented those in the DB architecture yet, so we are imposing the naming patterns for the moment. However, folder names (paths) are customizable to a certain extent.

> Imagine now that the BE decides not to go with that and handles containers differently (different naming, grouping, maybe even putting everything in the same container, ...)

I mean, why would they do that and go against what we offer?

Let's say they do: grouping is planned to be registered in the Datastore, naming is partially offered by the Datastore, and in Nachet we allow putting everything in the same container. This was also previously mentioned when planning the validation of an inspection for Fertiscan. I don't see why the BE would do things their own way.

> That renders the datastore container creation logic useless.

The Datastore offers ways for the BEs of different projects to perform tasks through recurrent functionality (creating a folder in an Azure Blob Storage, uploading a blob, creating a ContainerClient, etc.) in the global Datastore, and through custom features for the specific tasks of a project in that project's Datastore. I don't see why imposing a logic and feeding the object to the BE once requested is useless.

> Even if the BE decides to use the datastore's container logic, then a part of the logic is on the datastore and the rest on the BE. Hard to maintain.

I do not see where part of the logic is on the BE. The BE currently is "restricted" to uploading only into the container client of the user, because otherwise the BE would need to magically know which container the upload was performed upon. It is an agreed restriction across layers at the moment, since we do not yet handle a more diverse container logic.
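
For what it's worth, one hypothetical way to relax that restriction later, without a full container registry, would be to have the upload report its own location back to the BE:

```python
import json

def register_analysis(cursor, container_client, analysis_dict):
    """Hypothetical variant: tell the BE exactly where the upload landed."""
    blob_name = f"{analysis_dict['id']}.json"
    container_client.upload_blob(blob_name, json.dumps(analysis_dict))
    # Returning the location removes the need to "magically know" the container.
    return container_client.container_name, blob_name
```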


Focusing back on this issue, I don't think it is a good idea to remove the container client as a parameter, since it would drastically reduce our flexibility, for all the reasons stated previously. I think there could be a way to stop requesting it in the delete path once we implement container tracking in our DB architecture. We should focus on extending the scope of the functionality we offer, in a well-thought-out approach, instead of backing ourselves into a corner by reducing our flexibility.