Cancer types are not well defined, and this introduces so much bias into the data analysis that it invalidates the interpretation of the results.
The initial plot and pipeline run are not bad enough to discard, but they are only useful to introduce the subject. Additional results should be generated and saved to dissect the different types of cancer in more depth.
The data are 'bad' for several reasons:
The metasplit calls that govern which TCGA data are sorted into the various bins lump together too many different types of cancer. They should be broken down in more detail.
Some cancer types are intrinsically very heterogeneous (e.g. brain cancer) and are difficult to interpret in a "broad" sense. Less variable cancer types (see below) are more suitable for inspection.
The TCGA / GTEX datasets, while invaluable, are problematic because the samples come from different sources and because sample quality and classification are dubious.
Clinical variables such as tumor stage, age and sex might add more variability to the results. A big driver of transporter down-regulation might be dedifferentiation, and tumor grade classification might be a way to check for this.
For these reasons, one should:
[ ] Redefine the metasplit calls to be more specific, e.g. covering only one type of cancer each (#14);
Critical areas are the female ro cancer and Head n Neck cancer, among others.
[ ] Stratify the original analysis by different clinical variables, like tumor type (#15);
[ ] Find additional data, both to check the validity of the initial data and to focus better on different types of cancer (#17);
[ ] Metastatic samples (i.e. taken from a metastasis) and primary tumors are generally very different. Metastatic samples should be analyzed separately rather than pooled with primary samples (#16).
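As a starting point for separating metastatic from primary samples, the sample-type code embedded in the TCGA barcode can be used. The sketch below is only an illustration and not part of the existing pipeline: the function names `sample_class` and `split_by_class` are hypothetical, and it assumes that samples are identified by standard TCGA barcodes, where the first two characters of the fourth dash-separated field give the sample type (01 primary solid tumor, 06 metastatic, 11 solid tissue normal).

```python
# Hypothetical helper, not part of the current pipeline.
# Assumes standard TCGA barcodes such as "TCGA-02-0001-01C-01D-0182-01",
# where the first two characters of the fourth field are the sample-type code.
SAMPLE_TYPES = {
    "01": "primary",     # Primary Solid Tumor
    "06": "metastatic",  # Metastatic
    "11": "normal",      # Solid Tissue Normal
}


def sample_class(barcode: str) -> str:
    """Return 'primary', 'metastatic', 'normal', 'other' or 'unknown'."""
    parts = barcode.split("-")
    if len(parts) < 4 or len(parts[3]) < 2:
        # Barcode too short to carry a sample-type code.
        return "unknown"
    return SAMPLE_TYPES.get(parts[3][:2], "other")


def split_by_class(barcodes):
    """Group barcodes by sample class, preserving input order."""
    groups = {}
    for barcode in barcodes:
        groups.setdefault(sample_class(barcode), []).append(barcode)
    return groups
```

Keeping an explicit "other" group (recurrent tumors, blood-derived samples, and so on) avoids silently pooling those samples with either primary or metastatic ones.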
Cancer types that are broadly similar, with well-defined "healthy" counterparts and therefore suitable for dissection are: