Neurosurgery-Brain-Tumor-Center-DiazLab / CONICS

CONICS: COpy-Number analysis In single-Cell RNA-Sequencing

getGenePositions parameter got the wrong location information #36

Open sunshine1126 opened 3 years ago

sunshine1126 commented 3 years ago

Hello, the location information seems to be wrong when I use the getGenePositions function to obtain the chromosomal positions of genes in the expression matrix. See below.

library(beanplot)
library(mixtools)
library(pheatmap)
library(zoo)
library(squash)
library(biomaRt)
library(CONICSmat)
tt = getGenePositions(gene_names=c("ABCF1","ABHD16A","AGER","AGPAT1","AIF1","APOM"))
tt

image image

soerenmueller commented 3 years ago

Hey,

thanks for reaching out. These are genes for which alternative positions have been reported (see attached). For those (~3% of Ensembl genes), CONICSmat will set the chromosome to 0 in order to avoid using potentially incorrect genomic loci for CNV inference.

Screen Shot 2021-08-24 at 9 54 17 AM

sunshine1126 commented 3 years ago

@soerenmueller Thanks for your reply. Human cells normally contain 23 pairs of chromosomes, 46 in total: 22 pairs of autosomes, which look the same in males and females, and one pair of sex chromosomes, which differs between them. So what do the chromosome values "23" and "24" in the output mean? image

sciencepeak commented 1 year ago

@sunshine1126 Hi, you can work out the meaning of chr 0, chr 23, and chr 24 from the source code in GetPositions.R:

gene_positions[which(gene_positions[,3]=="X"),3]=23
gene_positions[which(gene_positions[,3]=="Y"),3]=24
gene_positions[which(gene_positions[,3]=="MT"),3]=0
gene_positions[which(nchar(gene_positions[,3])>2),3]=0

The third column of the gene_positions data frame is the chromosome name. The code replaces chromosome X with 23, chromosome Y with 24, and MT with 0. Any chromosome name longer than two characters is also replaced with 0, as soerenmueller mentioned above:

These are genes for which alternative positions have been reported (see attached). For those (~3% of Ensembl genes), CONICSmat will set the chromosome to 0 in order to avoid using potentially incorrect genomic loci for CNV inference.
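
To see the recoding in isolation, here is a minimal sketch that applies the same four replacement rules to a toy table. It is not the CONICSmat implementation: the table contents, the column names, the gene labels, and the long chromosome name "SOME_ALT_SCAFFOLD" are made up for illustration. Only the four replacement lines mirror the snippet quoted from GetPositions.R.

```r
# Toy stand-in for the gene_positions table; every value here is hypothetical.
gene_positions <- data.frame(
  gene_id = c("id1", "id2", "id3", "id4", "id5"),
  symbol = c("GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"),
  chromosome_name = c("6", "X", "Y", "MT", "SOME_ALT_SCAFFOLD"),
  stringsAsFactors = FALSE
)

# Same rules as the lines quoted from GetPositions.R (column 3 = chromosome).
gene_positions[which(gene_positions[, 3] == "X"), 3] <- 23   # X  -> 23
gene_positions[which(gene_positions[, 3] == "Y"), 3] <- 24   # Y  -> 24
gene_positions[which(gene_positions[, 3] == "MT"), 3] <- 0   # MT -> 0
# Any name longer than two characters (here the made-up scaffold) -> 0
gene_positions[which(nchar(gene_positions[, 3]) > 2), 3] <- 0

print(gene_positions)
```

In this toy table the autosomal gene keeps chromosome 6, the X and Y genes become 23 and 24, and both the MT gene and the gene with the long chromosome name end up on chromosome 0. Note that the column stays a character vector, so the values are the strings "23", "24", and "0".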