PNNL-CompBio / coderdata

Automation scripts and benchmark dataset package for cancer drug prediction deep learning models.

NCI60 has more than 60 cell lines in our current data #231

Open jjacobson95 opened 5 hours ago

jjacobson95 commented 5 hours ago

Based on unique improve_sample_ids, our NCI60 experiments data has 92 unique samples / cell lines. As far as I am aware, the NCI60 should have 60.

Reproduce:

Download Data on the Command line:

coderdata download --prefix broad_sanger
coderdata download --prefix genes

View Data in Python:

import coderdata as cd
# Load the downloaded broad_sanger dataset
bs = cd.DatasetLoader("broad_sanger")
# Unique sample IDs recorded for the NCI60 study (expected 60, observed 92)
nci60_ids = bs.experiments[bs.experiments.study == "NCI60"].improve_sample_id.unique()
print(len(nci60_ids))
print(nci60_ids)

sgosline commented 3 hours ago

There should be 71. One is a NaN, and the rest are matching errors. I'd call this a low-priority bug at this point.
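
The NaN part of this can be checked in isolation. In pandas, `Series.unique()` returns NaN as one of the values, so `len(... .unique())` counts it, while `Series.nunique()` drops NaN by default. A minimal sketch, assuming a made-up table in place of `bs.experiments` (the DataFrame and its values below are hypothetical, not the real coderdata contents):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for bs.experiments; values are invented for illustration.
experiments = pd.DataFrame({
    "study": ["NCI60", "NCI60", "NCI60", "NCI60", "CCLE"],
    "improve_sample_id": [101.0, 102.0, 102.0, np.nan, 103.0],
})

# Sample IDs for the NCI60 study only
nci60_ids = experiments.loc[experiments["study"] == "NCI60", "improve_sample_id"]

count_with_nan = len(nci60_ids.unique())    # unique() keeps NaN as its own entry
count_without_nan = nci60_ids.nunique()     # nunique() excludes NaN by default
missing_rows = int(nci60_ids.isna().sum())  # rows with no sample ID assigned

print(count_with_nan, count_without_nan, missing_rows)
```

Running the same three counts on the real `bs.experiments` would show how much of the excess comes from the missing ID and how much from the matching errors.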