Closed hanjoosten closed 6 months ago
My 2cts: To make sure that the order of stuff doesn't effect the generated column name, the generated name may only rely on the name of the relation itself, and knowledge of its source and target concepts. As a consequence, it isn't known if the name is overloaded or not. So in order to be safe, we have to assume it is overloaded. This means we have to use a hash based on the name AND the signature of the relation. In most cases, it is nice that a human can see, based on the column name, what relation the column implements.
Based on the above, I propose to generate the column name to be the relation name concatenated with the hash described above. This might become too long, depending on the length of the name. In that case, we have to shorten the name, but keep the hash intact. @Michiel-s , what do you think?
We discussed the proposal from @hanjoosten today.
Like with git commits, let's use SHA-1 hashes and by default use only the first 7 chars. Read this nice article and the table in this issue to read that the probability of hash collisions is <1% up until we overload the name 2323x. That's safe enough. If we would have a collisions, the database structure will be invalid and the user is notified anyway. So let's just SHA-1 hash and take the first 7 chars.
Given the script
CONCEPT My.Namespace.A ""
CONCEPT My.Namespace.B ""
RELATION My.Namespace.r[My.Namespace.A*My.Namespace.B]
RELATION My.Namespace.s[My.Namespace.A] [UNI]
CONCEPT My.Other.A ""
RELATION My.Other.r[My.Other.A*My.Namespace.A] [UNI]
Full name | Local name | Overloaded? | Short hash (first 7 chars) | Hash |
---|---|---|---|---|
My.Namespace.A | A | yes | d0e5adc | d0e5adcf59ddbdf9155b0c7f8f1b4fdc064e084c |
My.Namespace.B | B | no | 578434f | 578434f79bb702857ecbef1176e2fae476b36589 |
My.Other.A | A | yes | f4ae41a | f4ae41ac9c51aa05523617903c43ac0b37bc17fa |
My.Namespace.r[My.Namespace.A*My.Namespace.B] | r | yes | 98539d6 | 98539d61fbdf0bd7d67a415ffabf5b9ec61e839c |
My.Namespace.s[My.Namespace.A*My.Namespace.A] | s | no | a608f3f | a608f3f4968c3f45d59c52673e69e8a4c0540298 |
My.Other.r[My.Other.A*My.Namespace.A] | r | yes | 7443ce8 | 7443ce81cb9c7401e8ce6c637dc02bbccf0d7f08 |
This results in tables like:
Table A_d0e5adc |
A_d0e5adc | s |
---|
Table B |
B |
---|
Table A_f4ae41a |
A_f4ae41a | r_7443ce8 |
---|
Table r_98539d6 |
A_d0e5adc | B |
---|
Explicitly look at the situation of relation name overloading.
For column names, replace namespace dots by underscores