AmpersandTarski / Ampersand

Build database applications faster than anyone else, and keep your data pollution free as a bonus.
http://ampersandtarski.github.io/
GNU General Public License v3.0
40 stars 8 forks source link

Rethink mapping of relation names to database column names #1446

Closed hanjoosten closed 6 months ago

hanjoosten commented 9 months ago

Explicitly look at the situation of relation name overloading.

For column names, replace namespace dots by underscores

hanjoosten commented 9 months ago

My 2cts: To make sure that the order of stuff doesn't effect the generated column name, the generated name may only rely on the name of the relation itself, and knowledge of its source and target concepts. As a consequence, it isn't known if the name is overloaded or not. So in order to be safe, we have to assume it is overloaded. This means we have to use a hash based on the name AND the signature of the relation. In most cases, it is nice that a human can see, based on the column name, what relation the column implements.

Based on the above, I propose to generate the column name to be the relation name concatenated with the hash described above. This might become too long, depending on the length of the name. In that case, we have to shorten the name, but keep the hash intact. @Michiel-s , what do you think?

Michiel-s commented 6 months ago

We discussed the proposal from @hanjoosten today.

Requirements regarding table and column names:

  1. Short and easy readable for database admins (so e.g. no hash-only names)
  2. Unambiguity of table and column names in case of name overloading (different namespace or different relation signature). This also needs to hold when the order of declarations in the script changes.

Terminology

Solution design

Like with git commits, let's use SHA-1 hashes and by default use only the first 7 chars. Read this nice article and the table in this issue to read that the probability of hash collisions is <1% up until we overload the name 2323x. That's safe enough. If we would have a collisions, the database structure will be invalid and the user is notified anyway. So let's just SHA-1 hash and take the first 7 chars.

Example

Given the script

CONCEPT My.Namespace.A ""
CONCEPT My.Namespace.B ""
RELATION My.Namespace.r[My.Namespace.A*My.Namespace.B]
RELATION My.Namespace.s[My.Namespace.A] [UNI]

CONCEPT My.Other.A ""
RELATION My.Other.r[My.Other.A*My.Namespace.A] [UNI]
Full name Local name Overloaded? Short hash (first 7 chars) Hash
My.Namespace.A A yes d0e5adc d0e5adcf59ddbdf9155b0c7f8f1b4fdc064e084c
My.Namespace.B B no 578434f 578434f79bb702857ecbef1176e2fae476b36589
My.Other.A A yes f4ae41a f4ae41ac9c51aa05523617903c43ac0b37bc17fa
My.Namespace.r[My.Namespace.A*My.Namespace.B] r yes 98539d6 98539d61fbdf0bd7d67a415ffabf5b9ec61e839c
My.Namespace.s[My.Namespace.A*My.Namespace.A] s no a608f3f a608f3f4968c3f45d59c52673e69e8a4c0540298
My.Other.r[My.Other.A*My.Namespace.A] r yes 7443ce8 7443ce81cb9c7401e8ce6c637dc02bbccf0d7f08

This results in tables like:

Table A_d0e5adc A_d0e5adc s
Table B B
Table A_f4ae41a A_f4ae41a r_7443ce8
Table r_98539d6 A_d0e5adc B