Plant Hierarchy Implementation Proposal

temmey commented 1 year ago

Here's a proposal to discuss how we can build our plant hierarchy:

Tables: Family, Genus, Species, Variety, Cultivar

Please refer to the guide outlined in Hierarchy. Every table should have most of the properties we already use for plants. This means we still have info even if a child table doesn't have a value. We might want to add certain columns to tables based on what makes biological sense.

Inheritance isn't going to work here. For more details, check out Decision Hierarchy. Also, Rust Diesel isn't made for this.
Changing the "plants" Table

We can change "plants" table to "PlantDetails" or something similar. We'll add nullable columns for Variety FK and Species FK.
Creating a "Plants" View

We'll make a "Plants" view that brings all tables together. This will help us use them in the backend without big changes.
Handling Null Values

This view will deal with null values in the "plants" table. For example, if plants.heat_zone = null, it'll try to get the value from its cultivar. If that's also null, it'll look in the variety, and so on.
Functions: Update, Insert, Delete
- Update: Only updates a null value in the plants table if it's different from its parent table value. If the parent table value is null, it'll check the one above. If the hierarchy changes, it'll keep inserting the hierarchy tree until it finds an already existing hierarchy.
- Insert: Only adds values to columns if they're different from their parent table, otherwise adds null. If parent doesn't exist, also add an entry to parent table like a new Species and insert all values there.
- Delete: Removes the entry from the plants table.
Creating Triggers

We'll make triggers with the INSTEAD OF keyword to replace the UPDATE, INSERT, and DELETE events from the new "Plants" view.

Problems Solved

[ ] Backend changes are too big: The "plants" view should help keep changes small.
[ ] When we change values in a parent table like genus, all children should change, unless a child has its own value. We should be able to solve this by saving only values that are different from their parent.
[ ] Hybrid plants: We treat them as their own species as mentioned in hierarchy.md. For example, species.name: Iris × germanica, plants.unique_name: Iris × germanica. This should be possible with this solution.
[ ] Plants without a genus/varieties, etc.: FK Ids to parents can be nullable, so not all plants need a full hierarchy.

Problems We Still Need to Solve

[ ] Dealing with a new plant with a different species but the same genus that we haven't yet added to our database. The old species A doesn't have a link to species B.
[ ] Special cases of hybrid plants, like a mix between different genus. It's possible, like with Shipova. Should we think about such cases? They seem to be rare.

markus2330 commented 1 year ago

Thank you for collecting your thoughts here! The next step is to update our decisions/database guidelines etc. that your new approach is explained and the old discarded approaches get removed (or are only a considered alternative but it is clear that we won't implement it). Especially the old SQL files should be cleaned up.

Inheritance isn't going to work here. For more details, check out Decision Hierarchy. Also, Rust Diesel isn't made for this.

Yes, please add this further insights to the decision. It is crystal clear now that we won't use inheritance.

We can change "plants" table to "PlantDetails" or something similar. We'll add nullable columns for Variety FK and Species FK.

What benefit would the change to PlantDetails bring? Adding nullable columns should work without any problems. Basically we only need FK for the hierarchy, won't we?

We'll make a "Plants" view that brings all tables together. This will help us use them in the backend without big changes.

I don't understand what you mean by "all tables". The last thoughts were that we only have one table. This is already reflected in doc/decisions/database_plant_hierarchy.md

For example, if plants.heat_zone = null, it'll try to get the value from its cultivar. If that's also null, it'll look in the variety, and so on.

Exactly.

Dealing with a new plant with a different species but the same genus that we haven't yet added to our database.

I am not sure if I understand this problem, please explain in more detail. But what is clear: order of inserting matters, the higher levels need to be inserted first. Otherwise the deduplication cannot work.

Special cases of hybrid plants, like a mix between different genus. It's possible, like with Shipova. Should we think about such cases? They seem to be rare.

They are not so rare but having special implementation for these cases is nevertheless not worth the effort. You cannot really derive something from the knowledge that a plant is both apple and pear. So Shipova would be simply e.g. an Apple (we choose one), and we specify the differences.

temmey commented 1 year ago

Thank you for the feedback.

Thank you for collecting your thoughts here! The next step is to update our decisions/database guidelines etc. that your new approach is explained and the old discarded approaches get removed (or are only a considered alternative but it is clear that we won't implement it). Especially the old SQL files should be cleaned up.

will do that in #654.

We can change "plants" table to "PlantDetails" or something similar. We'll add nullable columns for Variety FK and Species FK.

What benefit would the change to PlantDetails bring? Adding nullable columns should work without any problems. Basically we only need FK for the hierarchy, won't we?

In my proposal, the goal would be to use a view instead of the table directly, since I want to join all relevant information (like genus, species, etc., and null values in the plants replaced with corresponding values from parent tables) together to keep the changes in the backend and frontend to a minimum.

We'll make a "Plants" view that brings all tables together. This will help us use them in the backend without big changes.

I don't understand what you mean by "all tables". The last thoughts were that we only have one table. This is already reflected in doc/decisions/database_plant_hierarchy.md

Like described above, joining the tables plants, Family, Genus, Species, Variety, Cultivar into a single view will allow us to keep the changes in the backend and frontend to a minimum, while extending the available information.

markus2330 commented 1 year ago

will do that in https://github.com/ElektraInitiative/PermaplanT/issues/654.

Better to start a new PR on top of that one (or based on master if it is independent) so that we don't confuse status quo and what should be done in order to implement the hierarchy.

I would like to merge #654 asap but I am afraid we won't be done quickly with the discussion about how to do the hierarchy.

tables [...] Family, Genus, Species, Variety, Cultivar

When we had a discussion about this last time, we found that these tables are not a good idea, so we decided to have everything in plants.

But your proposal does sound interesting, can you update the decision with the advantages compared to simply have a plants table with FK to their Family, Genus, Species, Variety, Cultivar in the same table?

The big disadvantage of everything in one table is that we need to duplicate all columns in all tables (Family, Genus, Species, Variety, Cultivar.). (Yes all columns can be relevant in all hierarchy levels.)

temmey commented 1 year ago

tables [...] Family, Genus, Species, Variety, Cultivar

When we had a discussion about this last time, we found that these tables are not a good idea, so we decided to have everything in plants.

But your proposal does sound interesting, can you update the decision with the advantages compared to simply have a plants table with FK to their Family, Genus, Species, Variety, Cultivar in the same table?

The big disadvantage of everything in one table is that we need to duplicate all columns in all tables (Family, Genus, Species, Variety, Cultivar.). (Yes all columns can be relevant in all hierarchy levels.)

Not only do we need to duplicate all columns in all tables, we also can't differentiate between properties of a Family, Genus, Species, Variety, Cultivar, and a property of a specific plant. This makes it impossible to update the values of a Family, Genus, Species, Variety, Cultivar without affecting a plant we don't want to affect.

What exactly do you mean with: (Yes all columns can be relevant in all hierarchy levels.) ?

I can update the decision doc, should I also start the implementation, or do we want to discuss some more?

markus2330 commented 1 year ago

Not only do we need to duplicate all columns in all tables, we also can't differentiate between properties of a Family, Genus, Species, Variety, Cultivar, and a property of a specific plant. This makes it impossible to update the values of a Family, Genus, Species, Variety, Cultivar without affecting a plant we don't want to affect.

Can you bring an example of what you mean? Updates are only safe for NULL values anyway?

What exactly do you mean with: (Yes all columns can be relevant in all hierarchy levels.) ?

That all the columns we currently have in the plants table (soil, pH, shade, size, ...) might be changed in every hierarchy level (Family, Genus, Species, Variety, Cultivar).

I can update the decision doc, should I also start the implementation, or do we want to discuss some more

Please update the decision doc first and create issues describing what you want to do. I am not sure if we are on the same page yet.

temmey commented 1 year ago

I have created #793 as a draft for now so that we can discuss my idea. I hope it helps to convey my idea.

ElektraInitiative / PermaplanT

Plant Hierarchy Implementation Proposal #764

Problems Solved

Problems We Still Need to Solve