bio-learn / biolearn

Machine learning tools for biomarker analysis

GSE110554 metadata loads incorrectly #87

Open moqri opened 3 months ago

moqri commented 3 months ago

Two of the samples in this dataset, GSM2998097 and GSM2998106, are loaded incorrectly:

[screenshot]

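The cause seems to be that these samples are mis-formatted in GEO: their `cell type` characteristic sits one row lower in the series matrix than it does for the other samples.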

This is my code to correct for them:

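```python
import pandas as pd
mat = '<path>'  # GSE110554 series matrix (assumed to hold both the metadata and the DNAm table)
chars = pd.read_table(mat, nrows=10**2, skiprows=38, index_col=0)
bad = ['GSM2998097', 'GSM2998106']
# Cell type is on row 13 for most samples but on row 14 for these two mis-formatted ones
meta = pd.concat([chars.iloc[13].drop(bad), chars[bad].iloc[14]])
# str.strip() removes a set of characters, not a prefix ('cell type: Bcell' -> 'B'), so drop the prefix explicitly
meta = meta.str.replace(r'^cell type: ', '', regex=True)
dnam = pd.read_table(mat, nrows=10**6, skiprows=38 + 59, index_col=0)
dnam = dnam.drop('!series_matrix_table_end')
```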
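The fix above hard-codes which characteristics row holds the cell type, so it would break again if GEO shifted the rows differently. A minimal sketch of a more robust alternative, assuming every cell-type entry keeps the `cell type: <value>` format seen above: search every row of the characteristics block for each sample instead of using a fixed row. `extract_characteristic` is a hypothetical helper, not part of biolearn or pandas.

```python
import pandas as pd


def extract_characteristic(chars: pd.DataFrame, key: str) -> pd.Series:
    """Return, per sample column, the value of the first cell formatted 'key: value'.

    Every row of the characteristics block is searched rather than a fixed row,
    so samples whose annotation is shifted to another row are still picked up.
    Samples with no matching cell get pd.NA.
    """
    prefix = f"{key}: "
    result = {}
    for sample in chars.columns:
        values = chars[sample].dropna().astype(str)
        hits = values[values.str.startswith(prefix)]
        result[sample] = hits.iloc[0][len(prefix):] if len(hits) else pd.NA
    return pd.Series(result, name=key)


# Toy characteristics block where the two samples list their fields in different orders
toy = pd.DataFrame(
    {"GSM_A": ["sex: F", "cell type: Mono"], "GSM_B": ["cell type: Neu", "sex: M"]},
    index=["!Sample_characteristics_ch1", "!Sample_characteristics_ch1"],
)
print(extract_characteristic(toy, "cell type").to_dict())
# {'GSM_A': 'Mono', 'GSM_B': 'Neu'}
```

Applied to the characteristics block read with `skiprows=38, index_col=0` as in the code above, this would return one cell type per GSM ID without special-casing GSM2998097 and GSM2998106.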
sarudak commented 3 months ago

Interesting. Perhaps we need an option to apply post-load corrections to specific datasets.
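One way such an option could look, as a minimal sketch: a registry that maps a GEO series ID to a function that patches the loaded metadata, applied right after loading. All names here (`post_load_correction`, `apply_post_load_corrections`, `POST_LOAD_CORRECTIONS`, the `GSE_TOY` series and its toy fix) are hypothetical and not part of biolearn's API.

```python
from typing import Callable, Dict

import pandas as pd

# Hypothetical registry (not biolearn's API): GEO series ID -> metadata fix.
MetadataFix = Callable[[pd.DataFrame], pd.DataFrame]
POST_LOAD_CORRECTIONS: Dict[str, MetadataFix] = {}


def post_load_correction(series_id: str) -> Callable[[MetadataFix], MetadataFix]:
    """Decorator that registers a metadata fix for one GEO series."""
    def register(fn: MetadataFix) -> MetadataFix:
        POST_LOAD_CORRECTIONS[series_id] = fn
        return fn
    return register


def apply_post_load_corrections(series_id: str, metadata: pd.DataFrame) -> pd.DataFrame:
    """Apply the registered fix to a copy of the metadata; return the input unchanged if none exists."""
    fix = POST_LOAD_CORRECTIONS.get(series_id)
    return fix(metadata.copy()) if fix is not None else metadata


# Toy fix for a made-up series ID, for illustration only.
@post_load_correction("GSE_TOY")
def drop_bad_samples(metadata: pd.DataFrame) -> pd.DataFrame:
    return metadata.drop(index=["S2"])


meta = pd.DataFrame({"cell_type": ["Mono", "Neu"]}, index=["S1", "S2"])
print(apply_post_load_corrections("GSE_TOY", meta).index.tolist())    # ['S1']
print(apply_post_load_corrections("GSE_OTHER", meta).index.tolist())  # ['S1', 'S2']
```

Passing a copy to the fix keeps the caller's original frame unchanged, and series without a registered fix are returned as-is, so the correction step could be applied unconditionally after every load.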