cBioPortal / cbioportal

cBioPortal for Cancer Genomics
https://cbioportal.org
GNU Affero General Public License v3.0

Support of negative value entrez_gene_id genes in clickhouse table development #10873

Open sheridancbio opened 1 week ago

sheridancbio commented 1 week ago

Some production databases at MSK already contain gene table entries with negative entrez_gene_id values. The same happens in any database where users enable microRNA support or add phosphorylated genes as gene records: these gene table entries are assigned negative values by the function DaoGene.getNextFakeEntrezId() in the cbioportal-core repo.

See:
https://github.com/cBioPortal/cbioportal-core/blob/efcc1d2179e26e289e78a138e6d047c6906e36f4/src/main/java/org/mskcc/cbio/portal/dao/DaoGene.java#L71
https://docs.cbioportal.org/deployment/deploy-without-docker/import-the-seed-database/#download-the-cbioportal-seed-database
https://github.com/cBioPortal/cbioportal-core/blob/main/src/main/resources/micrornas.tsv
https://github.com/cBioPortal/cbioportal-core/blob/main/src/main/java/org/mskcc/cbio/portal/scripts/ImportMicroRNAIDs.java
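To make the mechanism concrete, here is a minimal Java sketch of how a fake-ID allocator of this kind could hand out negative IDs. It assumes, hypothetically, that each new fake ID is one less than the smallest ID already in use; the actual logic of DaoGene.getNextFakeEntrezId() may differ, and the class and method names below are illustrative only.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

public class FakeEntrezIdSketch {
    // Hypothetical allocator (not the cbioportal-core implementation):
    // returns an ID one below the smallest ID in use, and never above -1,
    // so fake IDs cannot collide with real (positive) Entrez Gene IDs.
    static long nextFakeEntrezId(Collection<Long> existingIds) {
        long min = 0;
        for (long id : existingIds) {
            if (id < min) {
                min = id;
            }
        }
        return min - 1;
    }

    public static void main(String[] args) {
        // 7157 (TP53) and 672 (BRCA1) are real Entrez Gene IDs.
        List<Long> ids = new ArrayList<>(List.of(7157L, 672L));
        long first = nextFakeEntrezId(ids);   // first fake ID
        ids.add(first);
        long second = nextFakeEntrezId(ids);  // next fake ID is lower still
        System.out.println(first + " " + second); // prints "-1 -2"
    }
}
```

Under this assumption, every microRNA or phosphorylated gene record added after the seed data pushes the fake IDs further negative, so any table or query touching entrez_gene_id has to accept values below zero.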

The current ClickHouse development effort has not yet encountered negative entrez_gene_id values, but this case should be covered and tested before ClickHouse-enabled portals are completed and deployed.

This affects the ClickHouse table construction scripts and possibly also downstream logic in the cBioPortal persistence layer.

The concern arises from this line of the ClickHouse schema script: https://github.com/cBioPortal/cbioportal/blob/79d36e73f1aeff6d0ab4697e77aa210752772ad6/src/main/resources/db-scripts/clickhouse/clickhouse.sql#L49
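One way a negative ID could go wrong silently: if any column or cast along the table-construction path treated entrez_gene_id as an unsigned 32-bit integer (a hypothetical assumption; this issue does not state the type used at the line above), a negative value could be reinterpreted as a large positive number instead of being rejected. The Java sketch below uses Integer.toUnsignedLong to show that reinterpretation, plus a small check with hypothetical names that a test could use to flag IDs an unsigned column would not preserve.

```java
import java.util.List;
import java.util.stream.Collectors;

public class NegativeEntrezIdCheck {
    // Hypothetical helper: returns the IDs whose value would change if they
    // were reinterpreted as unsigned 32-bit integers (i.e. every negative ID).
    static List<Integer> idsLostInUnsignedColumn(List<Integer> ids) {
        return ids.stream()
                  .filter(id -> Integer.toUnsignedLong(id) != (long) id)
                  .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // 7157 is a real Entrez Gene ID; -1 and -2 stand in for fake IDs.
        List<Integer> ids = List.of(7157, -1, -2);
        System.out.println(Integer.toUnsignedLong(-1));   // prints 4294967295
        System.out.println(idsLostInUnsignedColumn(ids)); // prints [-1, -2]
    }
}
```

A test fixture that loads at least one negative-ID gene and asserts it comes back unchanged would catch this class of problem whatever the actual column types turn out to be.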

sheridancbio commented 1 week ago

This issue was created after review of https://github.com/cBioPortal/cbioportal/pull/10867 (@haynescd)