ImagingDataCommons / ETL

(CORE REPO)
Apache License 2.0

variable_label should not be blank #33

Closed fedorov closed 2 years ago

fedorov commented 2 years ago

Currently, it appears to be blank for IDC-assigned columns. I suggest variable_label should indicate that those columns are not part of the source data. Maybe we can use a prefix like "[IDC provenance attribute]" or something like that.


fedorov commented 2 years ago

And after spending quite a bit of time trying to figure this out, I also realized that this null label makes the pandas regex match fail, which will most definitely be super confusing to users.

selection_df[selection_df["variable_label"].str.contains("therapy")]

To work around the null, one has to realize that the na=False argument must be passed to the function above. There is no reason to keep the label as null.
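For reference, a minimal sketch of the workaround, assuming a small made-up table that stands in for the real one (the variable_name column and the row values here are hypothetical, only variable_label and selection_df come from the snippet above). On pandas versions where the label column is an object column, the unpatched call produces a mask containing NaN for the null row, and indexing with that mask is what fails; passing na=False treats the null label as a non-match.

```python
import pandas as pd

# Hypothetical stand-in for the real table: the second row mimics an
# IDC-assigned column whose variable_label is null.
selection_df = pd.DataFrame(
    {
        "variable_name": ["therapy_type", "source_batch", "age"],
        "variable_label": ["Type of therapy received", None, "Age at diagnosis"],
    }
)

# na=False makes null labels count as "no match", so the mask is purely boolean
# and can be used directly for row selection.
mask = selection_df["variable_label"].str.contains("therapy", na=False)
therapy_rows = selection_df[mask]

print(therapy_rows)
```

Users should not have to know about this argument, which is why populating the label at the source seems preferable.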

G-White-ISB commented 2 years ago

I suggest the labels idc_provenance_source_batch, idc_provenance_dicom_patient_id

G-White-ISB commented 2 years ago

variable_label is now column_label, and column_label has no blanks. Using the labels 'idc_provenance_source_batch' and 'idc_provenance_dicom_patient_id' for the source_batch and dicom_patient_id columns.