Open nlee-sage opened 6 months ago
Code used as a quick fix for the issue
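# import CSV file. In this case it is an assay metadata file
import pandas as pd
# placeholder path (not from the original issue); point this at the real metadata CSV
assay_metadata_df = pd.read_csv("assay_metadata.csv")
## Remove columns that are entirely empty; these should only be "specify" columns, since those are optional
assay_metadata_df = assay_metadata_df.dropna(axis = 1, how = 'all')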
# check specify columns for different values
import re
r = re.compile('specify', flags = re.IGNORECASE)
specify_cols = [c for c in assay_metadata_df.columns if r.search(str(c))]
# remove "specify" (case-insensitively, to match the search above) to get the normal terms
normal_cols = [r.sub('', str(s)) for s in specify_cols]
# create link between columns
pairings = []
for c in normal_cols:
    # re.escape keeps any special characters in a column name from being read as regex syntax
    pairings.append([s for s in assay_metadata_df.columns if re.search(re.escape(c), str(s), flags = re.IGNORECASE)])
print(pairings)
# test changes first before reassigning to main dataframe
assay_metadata_df_temp = assay_metadata_df.copy(deep = True)
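for p in pairings:
    # skip pairings that are not exactly one normal column plus one "specify" column
    if len(p) != 2:
        continue
    # assumes the normal column comes before its "specify" column, so p[0] is the normal one
    print(p[0], p[1], sep = ' : ')
    assay_metadata_df_temp[p[0]] = assay_metadata_df_temp[p[1]]
    assay_metadata_df_temp = assay_metadata_df_temp.drop(columns = [p[1]])
print(assay_metadata_df_temp.head())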
print(assay_metadata_df_temp.shape)
During curation, I found a metadata file that uses "specifyPlatformVersion". The values in "specifyPlatformVersion" need to replace the values in "platformVersion", because "OtherPlatformVersion" is not a helpful annotation and keeping both leaves two columns that contributors would have to search by.
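The quick fix above copies the whole "specify" column over the standard one. Here is a minimal sketch of a per-row variant, assuming that a row should keep its original "platformVersion" value whenever "specifyPlatformVersion" is empty. The function name `fold_specify_column` and the sample values are hypothetical and only for illustration.

```python
import pandas as pd


def fold_specify_column(df, base_col, specify_col):
    """Return a copy of df with specify_col folded into base_col.

    Assumption: a non-empty value in specify_col overrides base_col,
    and rows where specify_col is empty keep their base_col value.
    """
    out = df.copy()
    # combine_first takes the specify value and falls back to the base value where it is missing
    out[base_col] = out[specify_col].combine_first(out[base_col])
    return out.drop(columns=[specify_col])


# Hypothetical sample data, for illustration only
example = pd.DataFrame(
    {
        "platformVersion": ["v1", "OtherPlatformVersion", "v2"],
        "specifyPlatformVersion": [None, "custom-3.2", None],
    }
)
merged = fold_specify_column(example, "platformVersion", "specifyPlatformVersion")
print(merged)
```

If the intent is instead to overwrite every row regardless of missing values, the plain column assignment in the quick fix already does that.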
This issue applies to all "specify" columns across templates.
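Since this affects every template, pairing the columns by exact name may be safer than a substring search, which can also pick up unrelated columns that merely contain the same text. Below is a rough sketch, assuming each "specify" column is named as "specify" followed by the standard column name, compared case-insensitively. The helper `pair_specify_columns` and the column names in the example are hypothetical.

```python
def pair_specify_columns(columns):
    """Map each 'specify...' column to its standard column, if one exists.

    Assumption: the standard column name is the specify column name with
    the leading 'specify' removed, compared case-insensitively.
    """
    prefix = "specify"
    by_lower = {str(c).lower(): c for c in columns}
    pairs = {}
    for col in columns:
        lowered = str(col).lower()
        if lowered.startswith(prefix) and len(lowered) > len(prefix):
            base = by_lower.get(lowered[len(prefix):])
            # Columns without a matching standard column are left out
            if base is not None:
                pairs[col] = base
    return pairs


# Hypothetical column names, for illustration only
cols = ["platformVersion", "specifyPlatformVersion", "assay", "specifyAssay", "specimenID"]
pairs = pair_specify_columns(cols)
print(pairs)
```

The resulting mapping could then drive the replacement loop for any template, without relying on the column order in the file.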