OHDSI / WhiteRabbit

WhiteRabbit is a small application that can be used to analyse the structure and contents of a database in preparation for designing an ETL. It comes with RabbitInAHat, an application for interactively designing an ETL to the OMOP Common Data Model with the help of the scan report generated by WhiteRabbit.
http://ohdsi.github.io/WhiteRabbit
Apache License 2.0

Aliases for tables, attributes and add a short description #168

Open marcel1334 opened 5 years ago

marcel1334 commented 5 years ago

Sometimes the names of tables and attributes are self-explanatory, but we also see very short names and/or names in a local language. It would be handy if the user could add aliases for the table and attribute names so that these can be included in the scan report, either after scanning or by supplying this information in a metadata file. A very short description at table/attribute level would also help in interpreting the scan report.

PRijnbeek commented 5 years ago

Yes, I would support this enhancement. We have encountered this many times when mapping non-English databases. At the time, we solved it by asking the data owners to provide a data dictionary along with their White Rabbit output. It would be nice if White Rabbit generated a template they can fill in per table/field that RiaH can read (JSON?). For example:

For each column:

etc.

MaximMoinat commented 5 years ago

Yes, we should be able to ingest the data dictionary. We would provide the format for this, which can be a simple Excel file that the data owner can extract from their own dictionary.
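To make the template idea concrete, here is a minimal sketch of what a fill-in data dictionary could look like, assuming a JSON layout with one entry per table and per field. Nothing here is an existing WhiteRabbit or RiaH format: the key names (`tables`, `fields`, `alias`, `description`), the helper functions `make_template` and `load_descriptions`, and the example table/field names are all hypothetical, and reading table/field names out of an actual scan report is not shown.

```python
import json


def make_template(tables):
    """Build an empty data dictionary template (hypothetical layout).

    `tables` maps each table name to a list of its field names, e.g. as they
    might be taken from a scan report (that step is assumed, not shown).
    """
    return {
        "tables": [
            {
                "name": table,
                "alias": "",
                "description": "",
                "fields": [
                    {"name": field, "alias": "", "description": ""}
                    for field in fields
                ],
            }
            for table, fields in tables.items()
        ]
    }


def load_descriptions(text):
    """Parse a filled-in template into {(table, field): description}.

    The key (table, None) holds the table-level description.
    """
    data = json.loads(text)
    result = {}
    for table in data["tables"]:
        result[(table["name"], None)] = table["description"]
        for field in table["fields"]:
            result[(table["name"], field["name"])] = field["description"]
    return result


# Usage with invented, abbreviated non-English names:
template = make_template({"pat": ["gbdat", "geschl"]})
template["tables"][0]["description"] = "Patient master data"
template["tables"][0]["fields"][0]["description"] = "Date of birth"
descriptions = load_descriptions(json.dumps(template, indent=2))
print(descriptions[("pat", "gbdat")])
```

A flat key-value layout like this maps directly onto a spreadsheet with one row per table/field, so the same structure would also work for the Excel variant suggested above.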

The alias enhancement I see as less important. Why would we want to provide another name for a column if we can also add a description?

MaximMoinat commented 5 years ago

After some internal discussion, we have come up with the following approach. Based on the following:

@marcel1334 Any thoughts? Including @spayralbe and @anne0507, as they will be contributing as well.

marcel1334 commented 5 years ago

Good to hear that aliases and descriptions are on the table!!!

If I understand correctly, the three ways you suggest are done in a separate file/database, which must first be imported into the scan report, and that updated scan report then has to be loaded into RiaH before the information is shown there. Is that correct?

Technically this will work, but I was thinking about the workflow. I don't think the aliases and descriptions are static things you specify once before starting RiaH and never touch again. The descriptions in particular will be modified during the ETL design: discussions with the source database experts will yield extra information about the source tables and columns that you want to add to the table/column descriptions, and may even change a given alias when a better one turns out to be available. So I think it would be very handy to be able to edit these source aliases/descriptions from within RiaH as well, since that is probably the tool on screen at that moment.

That way, you can change the alias/description and directly continue with the ETL design and the discussion with the source database expert, instead of switching tools and re-importing the modifications. In the background, I can imagine it being stored in the same scan report, as you suggested.

Does this make sense, and is it feasible?

MaximMoinat commented 5 years ago

RiaH already provides that functionality: you can add a 'comment' to the source tables and fields. The suggested approach is just a way to bulk-import the available source descriptions. In my view, these descriptions are static (provided once by the source).

About the aliases: basically this would amount to modifiable table/field names, e.g. when you right-click on a box, you get the option to rename it. I am a bit hesitant about this, as it can cause problems with the output later on (especially the test framework).