Datasets processed from raw data should be double-checked by cross-referencing their values against an independent source such as cBioPortal. For example, compare the RMA-normalized Taylor et al. samples with the same samples in the MSKCC dataset in cBioPortal, or, within GEO, compare the matrices produced by RMA normalization against the pre-normalized data. The values should correlate; if they do not, there has likely been a systematic error in sample-name annotation or in the methodology.
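The cross-check described above can be sketched as a per-sample correlation between the processed matrix and the reference matrix. This is only an illustrative sketch, not the procedure actually used for these datasets: the function names, the dictionary layout (sample name to gene-to-value mapping), the 0.8 threshold, and the toy expression values are all assumptions made for the example. Pearson correlation is used here for simplicity; a rank-based correlation may suit sources reported on different scales better.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for constant input
    return cov / math.sqrt(var_x * var_y)


def per_sample_correlation(processed, reference):
    """Correlate each shared sample over the genes both sources report.

    Both arguments map sample name -> {gene symbol -> expression value}.
    """
    result = {}
    for sample in sorted(set(processed) & set(reference)):
        genes = sorted(set(processed[sample]) & set(reference[sample]))
        if len(genes) < 2:
            continue  # too few shared genes to compute a correlation
        xs = [processed[sample][g] for g in genes]
        ys = [reference[sample][g] for g in genes]
        result[sample] = pearson(xs, ys)
    return result


def flag_low_correlation(correlations, threshold=0.8):
    """Return samples whose correlation is undefined or below the threshold.

    The default threshold is an arbitrary choice for this sketch.
    """
    return [s for s, r in correlations.items() if math.isnan(r) or r < threshold]


# Toy values, made up for illustration only.
processed = {
    "S1": {"AR": 10.0, "TP53": 7.0, "PTEN": 5.0, "MYC": 9.0},
    "S2": {"AR": 6.0, "TP53": 8.0, "PTEN": 9.0, "MYC": 5.0},
}
reference = {
    "S1": {"AR": 2.1, "TP53": 0.4, "PTEN": -0.9, "MYC": 1.5},
    # Simulates a mislabelled sample: this profile resembles S1, not S2.
    "S2": {"AR": 2.0, "TP53": 0.5, "PTEN": -1.0, "MYC": 1.4},
}

correlations = per_sample_correlation(processed, reference)
suspect_samples = flag_low_correlation(correlations)
print(suspect_samples)
```

A sample that lands in the flagged list is a candidate for a swapped or mis-annotated sample name. If most or all samples are flagged, the cause is more likely a methodological difference between the two sources.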
Going into version 0.7, all current datasets were re-processed and checked to confirm that their gene-level values still correlate well with the original source (for example, a direct download from GEO).