Open forus opened 9 months ago
There are several ideas to make it possible:
The cBioPortal study loader should be aware of which genes are non-coding. One way to achieve that is by querying Genome Nexus. The gene list should be more complete and include deprecated genes, uncharacterized ncRNA, and pseudogenes, so that we know which ones we can safely skip.
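A minimal sketch of the lookup idea, assuming the extended gene list is available as a tab-separated table. The column names `entrez_id` and `gene_type`, the type labels, and both function names are hypothetical and chosen for illustration only; this is not the actual loader code or the Genome Nexus API.

```python
import csv
import io

# Hypothetical gene-type labels treated as safe to skip; the real
# vocabulary would come from whichever gene source is used.
SKIPPABLE_TYPES = {"ncRNA", "pseudogene", "deprecated"}


def load_gene_types(handle):
    """Read a TSV with 'entrez_id' and 'gene_type' columns into a dict."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {int(row["entrez_id"]): row["gene_type"] for row in reader}


def is_safe_to_skip(entrez_id, gene_types):
    """True only if the ID is known and its type is in SKIPPABLE_TYPES."""
    return gene_types.get(entrez_id) in SKIPPABLE_TYPES


# Made-up example rows, for illustration only.
example = io.StringIO(
    "entrez_id\tgene_type\n"
    "1\tprotein-coding\n"
    "2\tpseudogene\n"
    "3\tncRNA\n"
)
gene_types = load_gene_types(example)
```

Note that an Entrez ID missing from the table is not treated as skippable here, so truly unknown genes can still be reported separately.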
I loaded the chol_tcga_pan_can_atlas_2018 study into cBioPortal (hg19_hg38_v2.13.0 seed data).
I got a lot of warnings like the following:
I investigated these genes further. 2130 Entrez IDs in the chol_tcga_pan_can_atlas_2018 data could not be found in the recent cBioPortal database. See this table for a detailed list of the Entrez IDs and their classification: chol_tcga_pan_can_atlas_2018_missing_entrez_ids_classified.txt
I doubt this information should be reported as warnings, and in such a verbose form (one line per Entrez ID per file), for non-coding genes. The risk of doing so is that it devalues the concept of warnings: people start ignoring them altogether.
Filtering out this data seems like the straightforward thing to do.
As a user, I would like to get short information (not classified as a warning/error) on how many records were skipped because they were associated with non-coding genes.
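The requested behaviour could look roughly like the sketch below. It assumes records arrive as `(entrez_id, value)` pairs and that gene types are known from a lookup dict; `filter_records`, `summarize`, the type labels, and the example data are all hypothetical, not part of the existing loader.

```python
from collections import Counter


def filter_records(records, gene_types,
                   skippable=frozenset({"ncRNA", "pseudogene"})):
    """Split records into kept rows, per-type skip counts, and unknown IDs."""
    kept = []
    skipped = Counter()
    unknown = []
    for entrez_id, value in records:
        gene_type = gene_types.get(entrez_id)
        if gene_type is None:
            # Not in the gene list at all: this may still merit a warning.
            unknown.append(entrez_id)
        elif gene_type in skippable:
            skipped[gene_type] += 1
        else:
            kept.append((entrez_id, value))
    return kept, skipped, unknown


def summarize(skipped):
    """Build one informational line instead of one warning per record."""
    total = sum(skipped.values())
    if total == 0:
        return "No records skipped for non-coding genes"
    details = ", ".join(f"{n} {t}" for t, n in sorted(skipped.items()))
    return f"Skipped {total} records for non-coding genes ({details})"


# Made-up example data, for illustration only.
gene_types = {1: "protein-coding", 2: "pseudogene", 3: "ncRNA"}
records = [(1, 0.5), (2, 0.1), (3, 0.2), (3, 0.4), (999, 0.9)]
kept, skipped, unknown = filter_records(records, gene_types)
print(summarize(skipped))
```

Keeping unknown IDs in a separate list means the one-line summary covers only genes that are known to be non-coding, while anything unrecognized can still surface as a warning.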