Closed by rvimieiro 2 years ago
The models are identical, but the way new datasets are rescaled before prediction differs. Ideally the centroids should have been re-estimated from the dataset of Parker et al. (2009), but this was not done.
Thanks for the reply, I understand now. At first I was under the impression that you had re-estimated the centroids from data rescaled with each of the methods. In fact, the different models are just shortcuts for rescaling the data before predicting labels.
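To make that point concrete, here is a minimal sketch in plain Python (not the package's code) of three "models" that share one set of centroids and differ only in a `std` field. Everything in it is hypothetical: the gene names `G1`..`G4`, the subtypes `A` and `B`, the centroid values and the expression data are invented for illustration. It also assumes that robust scaling maps the 5th and 95th percentiles of each gene to 0 and 1, and that a sample is assigned to the centroid with the highest Spearman correlation; neither detail is confirmed in this thread.

```python
from statistics import mean, stdev, quantiles


def rescale_gene(values, std):
    """Rescale one gene's expression values across samples."""
    if std == "none":
        return list(values)
    if std == "scale":
        # Traditional scaling: zero mean, unit standard deviation.
        m, s = mean(values), stdev(values)
        return [(v - m) / s for v in values]
    if std == "robust":
        # Assumption: the 5th and 95th percentiles are mapped to 0 and 1.
        cuts = quantiles(values, n=20, method="inclusive")
        lo, hi = cuts[0], cuts[-1]
        return [(v - lo) / (hi - lo) for v in values]
    raise ValueError(f"unknown std: {std!r}")


def ranks(xs):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    out = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(a, b):
    """Spearman correlation: Pearson correlation of the ranks."""
    ra, rb = ranks(a), ranks(b)
    ma, mb = mean(ra), mean(rb)
    num = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    den = (sum((x - ma) ** 2 for x in ra) * sum((y - mb) ** 2 for y in rb)) ** 0.5
    return num / den


def predict(model, data):
    """data maps gene -> values per sample; returns one label per sample."""
    genes = sorted(data)
    scaled = {g: rescale_gene(data[g], model["std"]) for g in genes}
    n_samples = len(next(iter(data.values())))
    labels = []
    for s in range(n_samples):
        profile = [scaled[g][s] for g in genes]
        scores = {
            subtype: spearman(profile, [centroid[g] for g in genes])
            for subtype, centroid in model["centroids"].items()
        }
        labels.append(max(scores, key=scores.get))
    return labels


# Hypothetical centroids, shared by all three toy models.
centroids = {
    "A": {"G1": 1.0, "G2": 0.5, "G3": -0.5, "G4": -1.0},
    "B": {"G1": -1.0, "G2": -0.5, "G3": 0.5, "G4": 1.0},
}
toy_models = {
    std: {"centroids": centroids, "std": std}
    for std in ("none", "scale", "robust")
}

# Hypothetical expression data with very different per-gene baselines.
data = {
    "G1": [5.0, 3.0, 4.5],
    "G2": [2.5, 1.5, 2.2],
    "G3": [6.5, 7.5, 6.9],
    "G4": [9.0, 11.0, 9.8],
}

for std, model in toy_models.items():
    print(std, predict(model, data))
```

On this toy data the unscaled model labels every sample `B`, because the per-gene baselines dominate the ranking, while both rescaling variants label the samples `A`, `B`, `A`. The centroids never change; only the preprocessing does.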
Hi,
there might be a mistake related to the pam50 and pam50.robust models. The help page states:
pam50 Use of the official centroids without scaling of the gene expressions.
pam50.scale Use of the official centroids with traditional scaling of the gene expressions (see scale).
pam50.robust Use of the official centroids with robust scaling of the gene expressions (see rescale).
However, the models differ from each other only in the standardization attribute (std). The following code
results in
The question is: these centroids match the ones found at https://genome.unc.edu/pubsup/breastGEO/pam50_centroids.txt, but are they the scaled or the unscaled version?
Having both exactly the same (i.e. assuming they are supposed to be the same), except for the standardization parameter, is misleading because people might attempt to use the non-standardized/scaled model with their standardized/scaled data (or vice-versa) and get completely wrong results.
I hope it will help!
Regards
Renato
Update:
The same happens with pam50.scale, indicating that all three models are identical except for the std variable.