OHDSI / WhiteRabbit

WhiteRabbit is a small application that can be used to analyse the structure and contents of a database as preparation for designing an ETL. It comes with RabbitInAHat, an application for interactive design of an ETL to the OMOP Common Data Model with the help of the the scan report generated by White Rabbit.
http://ohdsi.github.io/WhiteRabbit
Apache License 2.0
174 stars 85 forks source link

Adding index where table name length above 31 #305

Open kesadae11 opened 3 years ago

kesadae11 commented 3 years ago

Hello!

I faced an uncomfortable thing in WhiteRabbit. If I have a table which name length is above 31 then it's append an index to the beginning .

I found it in the code, and I see how it is work, but I don't understand why is it necessary for unique naming. Where is the relation between table length and uniqueness?

In the code: Scan report generation part: https://github.com/OHDSI/WhiteRabbit/blob/172f8c30c9b252cdf063ef38289156e709aaa3d9/whiterabbit/src/main/java/org/ohdsi/whiteRabbit/scan/SourceDataScan.java#L174-L180 indexTableNameForSheet: https://github.com/OHDSI/WhiteRabbit/blob/172f8c30c9b252cdf063ef38289156e709aaa3d9/rabbit-core/src/main/java/org/ohdsi/rabbitInAHat/dataModel/Table.java#L134-L141

The comment in the code is the following: // Prepend index for long table names (to make sheet name unique later)

I have a few table which have unique names, but it looks a little bit weird with these indexes in the ETL specifications.

Can you do something with this (in my opinion not necessary) function? - Maybe just increase this length to 60 or make it parametrizable on the GUI.

Thanks in advance! Adam

MaximMoinat commented 3 years ago

The limitation is (as often in medical science) with Excel: tab names can't be longer than 31 characters. And in turn this causes issues in Rabbit in a Hat, which sees multiple tables with the same name and doesn't know which belongs to which tab.

I don't like this solution either, so if you see another way I am very open to it!

--edit: I can see one small improvement. If field values are not scanned, then no tabs will be created for each table. So then this uniqueness restriction does not have to be applied. Do you need to scan field values?

kesadae11 commented 3 years ago

Thanks for your answer! My brain getting exploded when I was reading that this is because of the excel sheet length limitation. I could have thought of it, but now I see why is it necessary. I will tun off the field value scanning. At this time I can work without these information. Thanks for Your tip!

I have a suggestion (maybe not better what the present one). What if you add one more column to the first sheet (Field Overview), as 'Related Sheet'? This column can contains the sheets name and the Table column the original name, and the RIAH tool could knows what was the original name. If you can do this, than you can name the sheets to anything you will always know the original names.

image

I hope it is a useful idea, but if not then we can close this thread, technically your suggestion solved my base problem.

MaximMoinat commented 3 years ago

Thanks for your idea, that is actually a nice solution. That related sheet column can be hidden as well. It does require changes in both the WR and RiaH tool.

I see three complementary fixes: