ImagingDataCommons / ETL

(CORE REPO)
Apache License 2.0

[clinical] Clinical data "inventory" table #23

Closed fedorov closed 2 years ago

fedorov commented 2 years ago

Current organization of tables has 2 components:

However, in the general case, 1) we will have more than one clinical table per collection (with different schemas); 2) we will at least sometimes need to communicate a description of a specific table (table-level metadata).

Examples are the NLST collection and ACRIN clinical tables.

I suggest we introduce another level of organization that has the following columns (we could call it clinical_data_inventory or something like that?):

This follows the approach implemented for ACRIN in https://github.com/fedorov/idc-clinical-cleanup, with the result in https://console.cloud.google.com/bigquery?p=idc-tcia&d=af_clinical_sandbox&page=dataset. There, those table-level metadata attributes are organized in tables per-collection (<collection_id>_dict), but we might as well put it all into a single table.

G-White-ISB commented 2 years ago

So this would be a structured record in the clinical_meta table with these 3 columns (collection_id, clinical_table_id, description).
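A minimal sketch of what such an inventory could look like, assuming only the three columns named above. It uses an in-memory SQLite table as a stand-in for the real BigQuery table. The composite key, the placeholder rows, and the `tables_for` helper are hypothetical illustrations, not part of the actual ETL.

```python
import sqlite3

# Stand-in for the inventory table; the real one would live in BigQuery.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE clinical_meta (
        collection_id     TEXT NOT NULL,
        clinical_table_id TEXT NOT NULL,
        description       TEXT,
        -- Assumption: one row per (collection, table) pair.
        PRIMARY KEY (collection_id, clinical_table_id)
    )
    """
)

# Placeholder rows, made up for illustration only:
# one collection with two clinical tables, another with one.
rows = [
    ("collection_a", "collection_a_table_1", "Placeholder description 1"),
    ("collection_a", "collection_a_table_2", "Placeholder description 2"),
    ("collection_b", "collection_b_table_1", "Placeholder description 3"),
]
conn.executemany("INSERT INTO clinical_meta VALUES (?, ?, ?)", rows)


def tables_for(collection_id):
    """Return (clinical_table_id, description) pairs for one collection."""
    cur = conn.execute(
        "SELECT clinical_table_id, description FROM clinical_meta "
        "WHERE collection_id = ? ORDER BY clinical_table_id",
        (collection_id,),
    )
    return cur.fetchall()


inventory = tables_for("collection_a")
```

Keeping everything in one table means a collection with several clinical tables just contributes several rows, and the table-level description travels with each row.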

fedorov commented 2 years ago

> this would be a structured record in the clinical_meta table

@G-White-ISB you probably meant to say clinical_meta_column table, right?

Can we come up with names that better reflect the content of those tables? Maybe "clinical_meta" can be "table_metadata", and "clinical_meta_column" can be "column_metadata"? I am not saying those are great names, but they may be a bit less confusing.

G-White-ISB commented 2 years ago

My comment above was made when there was just the clinical_meta table. clinical_meta_table and clinical_meta_column were invented later. But I'm fine with your recommended name changes.

G-White-ISB commented 2 years ago

We now have table_metadata and column_metadata tables. I suggest we close this issue.