Closed christophermichaelthomasmillar closed 1 year ago
Hi, thank you for reaching out to us.
I understood the situation that the inference of attribute types like string-set and number-set can be problematic. It seems to be a similar issue raised in #66. Based on users' feedback, this behavior causes confusion and corner case problems. I believe it would be beneficial to have the option to disable inference for string-set and number-set. Therefore, I would like to consider this as a feature improvement request.
Thanks for getting back to me. We were able to solve our immediate problem by commenting out the code responsible for set type inferencing. Ofcourse I would much rather be using an „official„ build of this tool. I think it would be a very nice improvement. This is a nice tool and id like to keep using it.
Sent from my iPhone
On 6. Jul 2023, at 18:09, Hiroaki Segawa @.***> wrote:
Hi, thank you for reaching out to us.
I understood the situation that the inference of attribute types like string-set and number-set can be problematic. It seems to be a similar issue raised in #66https://github.com/awslabs/dynein/issues/66. Based on users' feedback, this behavior causes confusion and corner case problems. I believe it would be beneficial to have the option to disable inference for string-set and number-set. Therefore, I would like to consider this as a feature improvement request.
— Reply to this email directly, view it on GitHubhttps://github.com/awslabs/dynein/issues/139#issuecomment-1623939872, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AKLN2TU3U4RDQ5MM27HWMF3XO3PJZANCNFSM6AAAAAAZYJB7TM. You are receiving this because you authored the thread.Message ID: @.***>
When trying to import a backup containing a record with a schema like this:
dynein assumes a schema like this:
This causes a problem when we try to import a record where there are duplicates (allowed by our schema) such as this:
"lines":["Theostraße 42-90","Hamburg","Hamburg"],
This is caused by an unreasonable assumption in dynein/src/data.rs line 831
any list of strings is a String Set "SS" and any list of numbers is a Number Set "NS"
To be correct there should be a check that there are no duplicates and if there are duplicates it should be a list "L"
However this would still not be enough for us. In our schema we have no uses of String Set "SS" or Number Set "NS" anywhere and we do not like the idea of the schema changing from one record to another.
A solution would be to add a cli option to the import function which disables string sets and one that disables number sets.
Thanks