This collection has clinical data spread across several sheets of the Excel file, each holding a different type of attribute (demographics, patient response, weight, etc.). This was not recorded correctly in the `clinical_notes.json` file. As a result, the code tried to create the same table, `hnscc_3dct_rt_clinical`, once for each of these sheets, so the sheets effectively raced to overwrite one another. The sheet with the weight information "won". Meanwhile, the `column_metadata` table recorded the columns from every version of `hnscc_3dct_rt_clinical` that was created.
Updating the data in `clinical_notes.json` should fix this problem. As noted in #35, new regression testing will hopefully pick up errors like this in the future.
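One cheap guard against this class of bug is to check, before any tables are written, that no two sheets are mapped to the same output table. The sketch below assumes the sheet-to-table mapping can be extracted from `clinical_notes.json` into a plain dict; the actual structure of that file is not shown here, so the function name and the sheet names are hypothetical.

```python
from collections import defaultdict


def find_table_collisions(sheet_to_table):
    """Return {table_name: [sheets]} for every table targeted by more than one sheet.

    `sheet_to_table` is a hypothetical dict mapping an Excel sheet name to the
    output table name the ingestion code would create for it.
    """
    targets = defaultdict(list)
    for sheet, table in sheet_to_table.items():
        targets[table].append(sheet)
    # Keep only tables that more than one sheet would write to.
    return {table: sorted(sheets) for table, sheets in targets.items() if len(sheets) > 1}


# Hypothetical mapping reproducing the bug: every sheet writes the same table.
buggy = {
    "Demographics": "hnscc_3dct_rt_clinical",
    "Response": "hnscc_3dct_rt_clinical",
    "Weight": "hnscc_3dct_rt_clinical",
}
print(find_table_collisions(buggy))
# {'hnscc_3dct_rt_clinical': ['Demographics', 'Response', 'Weight']}
```

A non-empty result means the last sheet written silently replaces the others, which is exactly the "weight sheet wins" symptom described above.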
Visual inspection of the tables confirms that the `hnscc-3dct_rt` table column names are now consistent with the metadata in `column_metadata`.
`column_metadata` contains a long list of variables for the `hnscc_3dct_rt_clinical` table. However, the referenced table itself has very few columns.
Following this thought, an easy regression test would be: for each `table_name` in `column_metadata`, take the list of `variable_name` values and confirm that the columns in the schema of the corresponding table match that list exactly. We should add this as a regression check and run it on every update of the clinical metadata tables. I will submit a separate follow-up ticket for that.
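The check described above can be sketched in a few lines. This is a minimal sketch, assuming the `column_metadata` rows have already been fetched as dicts with `table_name` and `variable_name` keys, and that each table's column names are available as a list; the function name and the sample column names are hypothetical.

```python
from collections import defaultdict


def check_column_metadata(metadata_rows, table_schemas):
    """Compare column_metadata against actual table schemas.

    metadata_rows: iterable of dicts with "table_name" and "variable_name" keys
                   (assumed shape of the column_metadata rows).
    table_schemas: dict mapping table name -> list of column names in that table.
    Returns {table_name: {"missing_in_table": [...], "not_in_metadata": [...]}}
    for every table listed in column_metadata whose columns do not match exactly.
    """
    expected = defaultdict(set)
    for row in metadata_rows:
        expected[row["table_name"]].add(row["variable_name"])

    problems = {}
    for table, variables in expected.items():
        # A table absent from table_schemas is treated as having no columns.
        actual = set(table_schemas.get(table, []))
        missing = variables - actual  # documented but not present in the table
        extra = actual - variables    # present in the table but undocumented
        if missing or extra:
            problems[table] = {
                "missing_in_table": sorted(missing),
                "not_in_metadata": sorted(extra),
            }
    return problems


# Hypothetical data mimicking the reported symptom: metadata lists many
# variables, but the table only kept the columns from the last sheet written.
metadata = [
    {"table_name": "hnscc_3dct_rt_clinical", "variable_name": "age"},
    {"table_name": "hnscc_3dct_rt_clinical", "variable_name": "sex"},
    {"table_name": "hnscc_3dct_rt_clinical", "variable_name": "weight_kg"},
]
schemas = {"hnscc_3dct_rt_clinical": ["weight_kg"]}
print(check_column_metadata(metadata, schemas))
# {'hnscc_3dct_rt_clinical': {'missing_in_table': ['age', 'sex'], 'not_in_metadata': []}}
```

Sets make the comparison order-insensitive, which matches "exactly that list" up to column order. If the tables live in BigQuery (an assumption here), `table_schemas` could be filled from `google.cloud.bigquery.Client.get_table(...).schema`, taking each field's `.name`. An empty result means the check passes.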