fedorov closed 2 years ago
And after spending quite a bit of time trying to figure this out, I also realized that this null label causes the pandas regex match to fail, which will most definitely be super-confusing to users.
selection_df[selection_df["variable_label"].str.contains("therapy")]
One needs to realize that, to work around the null, you have to pass the na=False argument to str.contains in the call above. There is no reason to keep that label as null.
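To make the workaround concrete, here is a minimal sketch. The contents of selection_df are made up for illustration (only the variable_label column name and the "therapy" pattern come from the filter above), and whether the unguarded filter raises may depend on the pandas version and the column dtype, so the sketch only reports what happens.

```python
import pandas as pd

# Hypothetical stand-in for selection_df: two rows have a null label,
# mimicking the IDC-assigned columns discussed in this issue.
selection_df = pd.DataFrame(
    {
        "column": ["therapy_type", "source_batch", "dicom_patient_id"],
        "variable_label": ["Type of therapy received", None, None],
    }
)

# Without na=False, the mask can contain NaN for the null labels, and
# boolean indexing with such a mask may raise a ValueError.
try:
    selection_df[selection_df["variable_label"].str.contains("therapy")]
    print("filter without na=False succeeded")
except ValueError as err:
    print("filter without na=False failed:", err)

# With na=False, null labels are treated as non-matches.
matches = selection_df[
    selection_df["variable_label"].str.contains("therapy", na=False)
]
print(matches)
```

Passing na=False keeps the filter usable, but it relies on every user knowing about the flag, which is why non-null labels are the better fix.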
I suggest the labels idc_provenance_source_batch, idc_provenance_dicom_patient_id
variable_label is now column_label, and column_label has no blanks. The source_batch and dicom_patient_id columns use the labels 'idc_provenance_source_batch' and 'idc_provenance_dicom_patient_id'.
Currently, it appears to be blank for IDC-assigned columns. I suggest variable_label should indicate that those columns are not part of the source data. Maybe we can use a prefix like "[IDC provenance attribute]" or something like that.
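As a rough illustration of the prefix idea, the sketch below fills blank labels with the suggested prefix followed by the column name. The dictionary_df table and its contents are hypothetical, and the exact label wording is only an assumption based on the suggestion above.

```python
import pandas as pd

# Prefix suggested in this issue; the final wording is an assumption.
IDC_PREFIX = "[IDC provenance attribute]"

# Hypothetical data dictionary with blank labels for IDC-assigned columns.
dictionary_df = pd.DataFrame(
    {
        "column": ["therapy_type", "source_batch", "dicom_patient_id"],
        "variable_label": ["Type of therapy received", None, None],
    }
)

# Fill only the missing labels, leaving source-provided labels untouched.
missing = dictionary_df["variable_label"].isna()
dictionary_df.loc[missing, "variable_label"] = (
    IDC_PREFIX + " " + dictionary_df.loc[missing, "column"]
)
print(dictionary_df)
```

With every label populated, a plain str.contains filter on the label column works without extra arguments.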