Closed by jjacobson95 1 month ago
There should be 71. One is a NaN; the rest are matching errors. I'd call this a low-priority bug at this point.
I agree. I've been looking at NCI-60 and noticed this as well in the October 2024 release. In the general comments here, they do mention other cell lines:
There are 60 cell lines in the current NCI60 cell line screen. There are 11 other cell lines that were part of the NCI60 screen in the past. These 71 cell lines comprise most of the public data. This data release also includes other cell lines which have been assayed at least once using the same protocols as the NCI60 cell line screen.
I refer to this table to get the 60 cell lines and the corresponding cellosaurus IDs.
I counted data for 163 cell lines in the October 2024 dataset; 82 of them have improve_sample_ids. I propose using those 82. It seems silly to throw out data when we have valid identifiers. People can filter later on if they want.
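For anyone who does want to filter later, here is a rough sketch of what that could look like. Everything except the `improve_sample_id` column name is made up for illustration: the helper `filter_by_sample_ids`, the toy rows, and the IDs in `core_ids` are hypothetical and not taken from the actual dataset.

```python
def filter_by_sample_ids(rows, keep_ids, column="improve_sample_id"):
    """Return only the rows whose sample ID is in keep_ids.

    IDs are compared as strings so that 1001 and "1001" match.
    """
    keep = {str(sample_id) for sample_id in keep_ids}
    return [row for row in rows if str(row.get(column)) in keep]


# Toy experiment rows (hypothetical values, not real NCI60 data).
rows = [
    {"improve_sample_id": "1001", "dose_response_value": 0.42},
    {"improve_sample_id": "1002", "dose_response_value": 0.31},
    {"improve_sample_id": "1003", "dose_response_value": 0.12},
    {"improve_sample_id": "1001", "dose_response_value": 0.77},
]

# Hypothetical subset of IDs a user cares about, e.g. the current 60 lines.
core_ids = [1001, 1003]

filtered = filter_by_sample_ids(rows, core_ids)
print(len(filtered), "of", len(rows), "rows kept")
```

Keeping all samples with valid identifiers in the release and leaving this kind of subsetting to the user means nothing has to be re-ingested if someone needs the extra cell lines.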
Based on unique improve_sample_ids, our NCI60 experiments data has 92 unique samples / cell lines. As far as I am aware, the NCI60 should have 60.
Reproduce:
Download Data on the Command line:
View Data in Python
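A minimal sketch of the kind of check described above, run on toy data rather than the real download. Only the `improve_sample_id` column name comes from this thread; the helper `summarize_sample_ids`, the other column names, and all values are assumptions. It counts unique sample IDs while keeping NaN or empty entries separate, since a missing ID can otherwise be counted as an extra "sample".

```python
import csv
import io

# Strings treated as a missing ID (assumption: how NaN may appear in a CSV).
MISSING = {"", "nan", "na", "none"}


def summarize_sample_ids(csv_text, column="improve_sample_id"):
    """Return (set of valid unique IDs, number of rows with a missing ID)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    valid_ids = set()
    missing_rows = 0
    for row in reader:
        value = (row.get(column) or "").strip()
        if value.lower() in MISSING:
            missing_rows += 1
        else:
            valid_ids.add(value)
    return valid_ids, missing_rows


# Toy stand-in for an experiments file (hypothetical columns and values).
toy_csv = """improve_sample_id,improve_drug_id,dose_response_value
1001,SMI_1,0.42
1001,SMI_2,0.77
1002,SMI_1,0.31
nan,SMI_1,0.55
1003,SMI_3,0.12
"""

ids, n_missing = summarize_sample_ids(toy_csv)
print("unique valid sample IDs:", len(ids))
print("rows with missing sample ID:", n_missing)
```

On a real file, the same function could be fed the file contents read as text; comparing `len(ids)` against the expected number of cell lines would show how many of the extra samples are matching errors versus missing IDs.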