Objects in the Fertiscan transaction insertion workflow:

I will address each object. However, please note there used to be a lot of NOT NULL constraints in the DB, and all of them (except a special few) had to be removed; otherwise there were edge cases that created exceptions.

All timestamps are set and updated via triggers to ensure automatic and reliable values.
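For reference, the pattern is roughly the following; this is a minimal sketch, and the function and trigger names here are illustrative, not necessarily the ones defined in the schema:

```sql
-- Illustrative sketch of the timestamp pattern described above;
-- the function and trigger names are assumptions, not the actual schema objects.
CREATE OR REPLACE FUNCTION "fertiscan_0.0.15".update_inspection_timestamp()
RETURNS trigger
LANGUAGE plpgsql
AS $function$
BEGIN
    -- Keep updated_at current on every UPDATE, without relying on the caller.
    NEW.updated_at := now();
    RETURN NEW;
END;
$function$;

CREATE TRIGGER inspection_update_timestamp
BEFORE UPDATE ON "fertiscan_0.0.15".inspection
FOR EACH ROW
EXECUTE FUNCTION "fertiscan_0.0.15".update_inspection_timestamp();
```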
```mermaid
erDiagram
    inspection {
        uuid id PK
        boolean verified "default=False"
        timestamp upload_date "default = now()"
        timestamp updated_at "default = now()"
        uuid inspector_id FK "NOT NULL"
        uuid label_info_id FK
        uuid fertilizer_id FK
        uuid sample_id FK
        uuid picture_set_id FK
    }
```
We could make picture_set_id and label_info_id NOT NULL. This would help with referencing, but it would constrain us in some edge cases.
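If we did decide to accept those restrictions, the change itself would be small, e.g.:

```sql
-- Only if we accept the edge-case restrictions discussed above.
ALTER TABLE "fertiscan_0.0.15".inspection
    ALTER COLUMN picture_set_id SET NOT NULL,
    ALTER COLUMN label_info_id SET NOT NULL;
```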
```mermaid
erDiagram
    label_information {
        uuid id PK
        string lot_number
        string npk
        string registration_number
        float n
        float p
        float k
        string guaranteed_title_en
        string guaranteed_title_fr
        boolean title_is_minimal
        uuid company_info_id FK
        uuid manufacturer_info_id FK
    }
```
Every column can be NULL, and all of them can be NULL at once. The pipeline could pick up all the data for the guaranteed_analysis and the sub_labels while every field in the label_information table is absent. There still needs to be a label_information entity in the DB, since it is the parent table of all the other information that needed specialization. This is a real edge case: making even something like the name NOT NULL would not account for all the possibilities of our pipeline.
```mermaid
erDiagram
    organization_information {
        uuid id PK
        string name
        string website
        string phone_number
        uuid location_id FK
        boolean edited
    }
    location {
        uuid id PK
        string name
        string address "NOT NULL"
        uuid region_id FK
        uuid owner_id FK
    }
```
Everything can be null, but we make sure not everything is null at once. There is no need to store an empty organization in the DB.
If the given location address is null, we do not create an instance of it in the DB.
Type of checking: if statements in the workflow and in the insert function, with the location address checked individually later on. No triggers involved.
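A minimal sketch of that guard on the SQL side; the function name and parameters are illustrative, not the actual datastore API, and it assumes location.id has a default uuid generator:

```sql
-- Illustrative only: skip creating a location when the address is missing,
-- mirroring the workflow check described above.
CREATE OR REPLACE FUNCTION "fertiscan_0.0.15".maybe_insert_location(
    p_name text,
    p_address text
)
RETURNS uuid
LANGUAGE plpgsql
AS $function$
DECLARE
    location_id uuid;
BEGIN
    -- No address means no location row at all.
    IF p_address IS NULL OR p_address = '' THEN
        RETURN NULL;
    END IF;
    INSERT INTO "fertiscan_0.0.15".location (name, address)
    VALUES (p_name, p_address)
    RETURNING id INTO location_id;  -- assumes id has a default generator
    RETURN location_id;
END;
$function$;
```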
```mermaid
erDiagram
    metric {
        uuid id PK
        float value
        boolean edited "default = false"
        ENUM metric_type
        uuid unit_id FK
        uuid label_id FK
    }
```
Either the value or the unit can be null, but not both at once. If both are null, it means the metric contains no actual data from the pipeline.
Type of checking: an if statement in the insertion/update workflow skips the metric if everything is null, and the individual metric function also checks whether all values are null and raises an exception if so.
Checking in both the workflows and the individual function ensures no empty metrics will be created.
We could be rigorous and make the metric_type NOT NULL, but it is assigned manually by the Datastore on insertion, so the need is very low.
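As an illustration, the check in the individual function amounts to something like this; the function name and parameters are assumptions, and the real function also handles metric_type and the unit lookup:

```sql
-- Sketch of the "all null" guard described above; the name and parameters
-- are illustrative, not the real signature.
CREATE OR REPLACE FUNCTION "fertiscan_0.0.15".new_metric_checked(
    p_value float,
    p_unit_id uuid,
    p_label_id uuid
)
RETURNS uuid
LANGUAGE plpgsql
AS $function$
DECLARE
    metric_id uuid;
BEGIN
    -- An empty metric (no value and no unit) is refused outright.
    IF p_value IS NULL AND p_unit_id IS NULL THEN
        RAISE EXCEPTION 'Metric has neither a value nor a unit';
    END IF;
    -- metric_type (set by the Datastore) is omitted to keep the sketch short.
    INSERT INTO "fertiscan_0.0.15".metric (value, unit_id, label_id)
    VALUES (p_value, p_unit_id, p_label_id)
    RETURNING id INTO metric_id;  -- assumes id has a default generator
    RETURN metric_id;
END;
$function$;
```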
```mermaid
erDiagram
    guaranteed {
        uuid id PK
        string read_name
        float value
        string unit
        boolean edited
        int element_id FK
        uuid label_id FK
    }
```
The GA is received as a list. We iterate over the list; if no GA has been reported by the pipeline, none are inserted. However, if there is an entry in the list, the name, value and unit can each be empty, but not all at once. At least one of the parameters needs a value for the entry to be inserted, or else we would have an empty record in the DB.
Type of checking: an if statement in the insertion/update workflow skips the entry if everything is null, and there is also a check in the individual insertion function that raises an exception if all values are null.
Checking in both the workflows and the individual function ensures no empty GA rows will be created.
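A hedged sketch of that iteration/skip logic, assuming the list arrives as a jsonb array; the function name and input shape are illustrative:

```sql
-- Illustrative sketch of the iteration and skip logic described above;
-- the function name and the jsonb input shape are assumptions.
CREATE OR REPLACE FUNCTION "fertiscan_0.0.15".insert_guaranteed_list(
    p_label_id uuid,
    p_ga_list jsonb
)
RETURNS void
LANGUAGE plpgsql
AS $function$
DECLARE
    ga jsonb;
BEGIN
    FOR ga IN SELECT * FROM jsonb_array_elements(p_ga_list)
    LOOP
        -- Skip entries where the name, value and unit are all missing.
        CONTINUE WHEN (ga->>'read_name') IS NULL
                  AND (ga->>'value') IS NULL
                  AND (ga->>'unit') IS NULL;
        INSERT INTO "fertiscan_0.0.15".guaranteed (read_name, value, unit, label_id)
        VALUES (ga->>'read_name', (ga->>'value')::float, ga->>'unit', p_label_id);
    END LOOP;
END;
$function$;
```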
Sub label is the only tricky part of the objects, since it represents multiple arrays of text (English & French) all classified under a sub_label_type.
```mermaid
erDiagram
    sub_label {
        uuid id PK
        text text_content_fr "NOT NULL DEFAULT ='' "
        text text_content_en "NOT NULL DEFAULT='' "
        boolean edited "default = false"
        uuid label_id FK "NOT NULL"
        uuid sub_type_id FK "NOT NULL"
    }
```
For each type in the form, we iterate over both arrays and insert the French and English text values when they are present. If the array lengths differ between the two languages, some records would end up with only one text_content populated, since the other side would be inserting a NULL value.
A NULL value in text_content was creating (and would keep creating) problems for the FE when displaying the arrays, and for the Datastore when maintaining the logic tying the English and French content together.
An issue @k-allagbe worked on implemented the following:
```sql
-- DROP FUNCTION "fertiscan_0.0.15".check_null_or_empty_sub_label();
CREATE OR REPLACE FUNCTION "fertiscan_0.0.15".check_null_or_empty_sub_label()
RETURNS trigger
LANGUAGE plpgsql
AS $function$
BEGIN
    -- Check if both text_content_fr and text_content_en are NULL or empty
    IF (NEW.text_content_fr IS NULL OR NEW.text_content_fr = '') AND
       (NEW.text_content_en IS NULL OR NEW.text_content_en = '') THEN
        -- Raise an exception and skip the insertion
        RAISE EXCEPTION 'Skipping insertion because both text_content_fr and text_content_en are NULL or empty';
        RETURN NULL;
    END IF;
    -- Replace NULL with empty strings
    IF NEW.text_content_fr IS NULL THEN
        NEW.text_content_fr := '';
    END IF;
    IF NEW.text_content_en IS NULL THEN
        NEW.text_content_en := '';
    END IF;
    -- Allow the insert if at least one is not NULL or empty
    RETURN NEW;
END;
$function$;
```
This is the only instance where we check the data with a trigger. Since it seemed like a special case (there should always be a French and an English version of an entry), I did not see a problem with it. It also allows us to maintain the logic of not inserting NULLs in our insertion/update loops.
If we want to migrate the logic of this trigger directly into insert_sub_label(), it would simplify the insertion. On the other hand, we'd have to tweak the update a little bit (and maybe the insert too, for uniqueness).
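A minimal sketch of what folding the trigger logic into insert_sub_label() could look like; the parameter list shown here is an assumption, not the current signature:

```sql
-- Illustrative sketch: the trigger's guard and NULL-to-'' coercion moved
-- inline. The parameter list is an assumption, not the actual signature.
CREATE OR REPLACE FUNCTION "fertiscan_0.0.15".insert_sub_label(
    p_text_fr text,
    p_text_en text,
    p_label_id uuid,
    p_sub_type_id uuid
)
RETURNS uuid
LANGUAGE plpgsql
AS $function$
DECLARE
    sub_label_id uuid;
BEGIN
    -- Same rule as the trigger: refuse rows where both languages are empty.
    IF (p_text_fr IS NULL OR p_text_fr = '') AND
       (p_text_en IS NULL OR p_text_en = '') THEN
        RAISE EXCEPTION 'Both text_content_fr and text_content_en are NULL or empty';
    END IF;
    INSERT INTO "fertiscan_0.0.15".sub_label
        (text_content_fr, text_content_en, label_id, sub_type_id)
    VALUES
        (COALESCE(p_text_fr, ''), COALESCE(p_text_en, ''), p_label_id, p_sub_type_id)
    RETURNING id INTO sub_label_id;  -- assumes id has a default generator
    RETURN sub_label_id;
END;
$function$;
```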
Now, let me state that there are no other triggers checking the data before insertion. As stated, I've always been against using such triggers; they add complexity to the data flow and could make maintenance more difficult.
I do not know where the impression of checking with triggers comes from, since we only have this one special case of such usage.
> Implement a database constraint ensuring at least one non-null field (besides ID) for relevant entities.
It is possible to put a constraint on a table to prevent a combination of columns from all being null. However, we'd still check in most cases whether all the values are null anyway, since the pipeline can send all-NULL values in some cases; adding the constraint would only add a layer of protection. I strongly advise not to remove the if statements, since that would create exceptions left and right.
I see the issue of the if statements not covering the case where a user performs a manual query to insert into the tables, but such a case does not happen anywhere in our work. We use functions to avoid writing redundant code and to make sure the inserted data is correctly formatted. However, adding a table constraint would prevent such a case, so I'm not against adding the constraint and seeing if our tests still pass.
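For reference, such a constraint could look like the following, using organization_information as an example; the constraint name is arbitrary, and the edited flag is intentionally left out of the check:

```sql
-- Example of the "at least one non-null field besides the id" constraint
-- discussed above, using organization_information as an illustration.
ALTER TABLE "fertiscan_0.0.15".organization_information
    ADD CONSTRAINT organization_information_not_all_null
    CHECK (num_nonnulls(name, website, phone_number, location_id) > 0);
```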
> - Remove existing checks and triggers that handle this behavior.
> - Ensure uniform enforcement across all entities that require this validation.
Both of these are already the case, if we ignore the sub_label trigger. If needed, it will be addressed.
Description
Context
Currently, in our database, most of our entities require at least one field (other than the ID) to be non-null during insertion. We have implemented this behavior through a mix of checks in functions for some entities and database triggers for others. This scattered approach leads to inconsistency and unnecessary complexity across the codebase.

Problem Statement
Relying on functions and triggers to enforce non-null checks increases maintenance overhead and creates potential gaps in validation, which could allow unwanted records with all null fields. Implementing a database constraint for this behavior ensures uniform validation directly at the schema level, reducing the chance of errors and simplifying code management. A constraint also provides better performance as it avoids the overhead of trigger execution.
Acceptance Criteria