Clinical-Genomics / trailblazer

Keep track of and manage analyses
MIT License
5 stars 2 forks source link

refactor(foreign key user_id) #419

Open ChrOertlin opened 3 months ago

ChrOertlin commented 3 months ago

Description

refactors a column name in the Delivery database model from delivered_by to user_id.

This is mainly to keep consistency on how we name foreign keys elsewhere and have clearer relationships.

Furthermore, delivered_by is already a property on the Analysis table. Here it returns a user name and not an id. This would be another point of inconcistency.

sonarcloud[bot] commented 3 months ago

Quality Gate Passed Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

ChrOertlin commented 3 months ago

@henrikstranneheim , @Vince-janv , @seallard , @islean , @diitaz93 can we discuss this topic?

Although there is no set-in-stone way one should name foreign keys across tables and in is dubbed as "holy war" by comments in some threads I read up on this, I think we should strive for consistency here.

Meaning having a foreign key contain the be constructed as: table_column as we have across other databases as well.

Currently the database model looks like this:

Screenshot 2024-03-21 at 16 42 21

Note the link between delivery and user is not immediately clear and breaks pattern.

Vince-janv commented 3 months ago

Interesting, my initial reaction was sceptical but I'm warming up to it. Could you link some of the posts discussing it so we can get a more complete picture and arguments from both sides of the jihad?

islean commented 3 months ago

How should the design be if we want to track both who started and who delivered an analysis? We could not have two columns with the name user_id so I guess the suggestion is more tables? Having some trouble seeing the solution though.

islean commented 3 months ago

How should the design be if we want to track both who started and who delivered an analysis? We could not have two columns with the name user_id so I guess the suggestion is more tables? Having some trouble seeing the solution though.

Would it be having one Delivery table and one Analysis_start table which both link to User and having a user_id column?

ChrOertlin commented 3 months ago

How should the design be if we want to track both who started and who delivered an analysis? We could not have two columns with the name user_id so I guess the suggestion is more tables? Having some trouble seeing the solution though.

it will be a query that uses the three tables. The delivery is linked to a different user than the user in the analysis. So the tracking part has to make sure to extract the right information from the database and give it context.

anyway I think this goes beyond what this intends to discuss: naming consistency/convention of foreign keys in database tables.

ChrOertlin commented 3 months ago

Interesting, my initial reaction was sceptical but I'm warming up to it. Could you link some of the posts discussing it so we can get a more complete picture and arguments from both sides of the jihad?

The following stackexchange I think boils the argument down, see the first accepted answer. There is no "right" or "wrong" here. Key is consistency in what one applies.

  1. Consistent naming of foreign keys with a set pattern
  2. Be consistently inconsistent

I think mixing these is were problems arise.

https://dba.stackexchange.com/questions/105251/is-there-a-standard-for-naming-foreign-keys-when-two-columns-reference-the-same

seallard commented 3 months ago

EDIT: I think it is great that we discuss this and find a good standard 👍

I think it is better that we are explicit in our data models. We want to be clear about what the FK represents. It is common to have multiple foreign keys that point to the same table.

For example, consider that you have a Customer and an Address table. The customer can have a separate shipping and billing address. That is, the customer will have two different FKs pointing to the same table.

from sqlalchemy import Integer, ForeignKey, String, Column
from sqlalchemy.orm import DeclarativeBase
from sqlalchemy.orm import relationship

class Customer(Base):
    __tablename__ = "customer"
    id = mapped_column(Integer, primary_key=True)
    name = mapped_column(String)

    billing_address_id = mapped_column(Integer, ForeignKey("address.id"))
    shipping_address_id = mapped_column(Integer, ForeignKey("address.id"))

    billing_address = relationship("Address", foreign_keys=[billing_address_id])
    shipping_address = relationship("Address", foreign_keys=[shipping_address_id])

class Address(Base):
    __tablename__ = "address"
    id = mapped_column(Integer, primary_key=True)
    street = mapped_column(String)
    city = mapped_column(String)
    state = mapped_column(String)
    zip = mapped_column(String)

If we only name FKs based on the table it points to, it limits what kind of data we can model and I think it ends up being less descriptive. I think it is a good thing to be explicit instead of general when modelling data!

See https://docs.sqlalchemy.org/en/20/orm/join_conditions.html#handling-multiple-join-paths.

islean commented 3 months ago

I like the intent in general but I do not think it should be so strictly enforced as to block SA's example... need to think it through a bit more though. First comment here suggests a combination of the two, such as naming it delivered_by_user_id:

https://stackoverflow.com/questions/5647650/database-column-naming-for-foreign-key

henrikstranneheim commented 3 months ago

I also find it more intuitive to name the variable with the intended usage in mind, also motivating the link. It is very easy to see how it gets initialized in the model.

ChrOertlin commented 3 months ago

What sebastian and isak are suggesting intent_table_column is really nice in my opinion.