CostaLab / scMEGA

scMEGA: Single-cell Multiomic Enhancer-based Gene regulAtory network inference
https://costalab.github.io/scMEGA
Other
37 stars 2 forks source link

SelectGenes use other genome #17

Open liyang24 opened 1 year ago

liyang24 commented 1 year ago

Hi! i noticed that SelectGenes Available genome are: hg19, hg38, mm9, and mm10 can i use the genomes of other species? If so, how exactly should I do it

Dalhte commented 1 year ago

Hello @liyang24 I am not a specialist but here is what I did to implement the rat genome (rn6)

create a gene annotation for RN6

get the library

library(org.Rn.eg.db) library(TxDb.Rnorvegicus.UCSC.rn6.refGene)

geneAnnotationRN6 <- createGeneAnnotation(TxDb = TxDb.Rnorvegicus.UCSC.rn6.refGene, OrgDb = org.Rn.eg.db)

I had a "chr" prefix problem (chromsome are named "chr1" in the annotation but only "1" in my data) to remove the "chr" and obtain a file with adequate formate, I did, quite laboriously :

separation of the gene, exon and tss info

geneAnno <- geneAnnotationRN6$gene exonAnno <- geneAnnotationRN6$exon TSSAnno <- geneAnnotationRN6$TSS

cleaning the format

geneAnno <- keepStandardChromosomes(geneAnno, pruning.mode="coarse") exonAnno <- keepStandardChromosomes(exonAnno, pruning.mode="coarse") TSSAnno <- keepStandardChromosomes(TSSAnno, pruning.mode="coarse")

geneAnno <- renameSeqlevels(geneAnno, c(chr1="1", chr2="2", chr3="3", chr4="4", chr5="5", chr6="6", chr7="7", chr8="8", chr9="9", chr10="10", chr11="11", chr12="12", chr13="13", chr14="14", chr15="15", chr16="16", chr17="17", chr18="18", chr19="19", chr20="20", chrX="X", chrY="Y")) exonAnno <- renameSeqlevels(exonAnno, c(chr1="1", chr2="2", chr3="3", chr4="4", chr5="5", chr6="6", chr7="7", chr8="8", chr9="9", chr10="10", chr11="11", chr12="12", chr13="13", chr14="14", chr15="15", chr16="16", chr17="17", chr18="18", chr19="19", chr20="20", chrX="X", chrY="Y")) TSSAnno <- renameSeqlevels(TSSAnno, c(chr1="1", chr2="2", chr3="3", chr4="4", chr5="5", chr6="6", chr7="7", chr8="8", chr9="9", chr10="10", chr11="11", chr12="12", chr13="13", chr14="14", chr15="15", chr16="16", chr17="17", chr18="18", chr19="19", chr20="20", chrX="X", chrY="Y"))

creation of a new annotation file

geneAnnotationRN6.2 <- createGeneAnnotation( genome = NULL, TxDb = NULL, OrgDb = NULL, genes = geneAnno, exons = exonAnno, TSS = TSSAnno, annoStyle = NULL )

then i copied the peaktogene to run it locally (because I don't known how to modify it directly) with some modifications: -I had "rn6" to the genome list and the annotation : if (genome == "rn6") { gene_anno <- geneAnnotationRN6.2

then I run SelectGenes also locally, with some modifications : I replaced RNA assy with SoupXRNA because I had to run SoupX to clean the data I added genome = "rn6" in the selectgenes function and in the calling of the peaktogene function

And I have to admit that, to my surprise, it did worked :)

PS I had to implement the motif names call solution proposed by @AmandaKedaigle in the issue " Motif matching when using mouse data https://github.com/CostaLab/scMEGA/issues/18 " every time needed (in the GetTFGeneCorrelation function, in the colnames(motif.matching)) if not the different format of the motif names blocked the analysis.

Moreover, I tried to used the jaspar2022 instead of the Jasper2020 and there is a problem with mofit name redundancy

I hope it will help and also that I did not do something stupid, but the results seems consistent with my prior knownledge of the biological system so....

Best David

wangb1119 commented 6 months ago

Hello, according to your method, I also created a gene annotation for macaques, and SelectGenes by using the self-defined gene annotation file. However, after that, I don't know how to calculate and define the rest of this function, especially how to use the gene annotation information. Only the selection logic of genome annotation is written, but the operation of selecting genes is not carried out.

            sel.tfs <- SelectTFs(object = pbmc.t.cells,
             return.heatmap = TRUE,
            cor.cutoff = 0.4)

df.cor <- sel.tfs$tfs ht <- sel.tfs$heatmap pdf("./heatmap_SelectTFs.pdf") t = draw(ht) dev.off()

annotation <- genes(EDB, return.type = "GRanges") exon_annotation <- exons(EDB, return.type = "GRanges") TSS_annotation <- promoters(EDB, upstream=0, downstream=1, return.type = "GRanges")

annotation <- genes(EDB, return.type = "GRanges") exon_annotation <- exons(EDB, return.type = "GRanges") TSS_annotation <- promoters(EDB, upstream=0, downstream=1, return.type = "GRanges")

geneAnno <- annotation exonAnno <- exon_annotation TSSAnno <- TSS_annotation

ucsc.levels <- str_replace(string = paste("chr", seqlevels(annotation), sep = ""), pattern = "chrMT", replacement = "chrM") seqlevels(annotation) <- ucsc.levels seqlevels(geneAnno) <- ucsc.levels seqlevels(exonAnno) <- ucsc.levels seqlevels(TSSAnno) <- ucsc.levels

geneAnnotationMmu <- createGeneAnnotation( genome = NULL, TxDb = NULL, OrgDb = NULL, genes = geneAnno, exons = exonAnno, TSS = TSSAnno, annoStyle = NULL ) SelectGenes <- function(object, atac.assay = "ATAC", rna.assay = "RNA", var.cutoff.gene = 0.9, trajectory.name = "Trajectory", distance.cutoff = 2000, groupEvery = 1, cor.cutoff = 0, fdr.cutoff = 1e-04, return.heatmap = TRUE, labelTop1 = 10, labelTop2 = 10, genome = "mihou") { if (is.character(genome)) { if (genome == "mihou") { gene_anno <- geneAnnotationMmu } else { stop("Unsupported genome annotation") } } else if (is(genome, "GRanges")) { gene_anno <- genome } else { stop("Unsupported genome annotation format") }

Based on the selection logic of genome annotation, I want to calculate something for gene selection. I don't know if it is correct. I want to ask your suggestion. How should I do it? tfs <- data.frame(

gene = c(),
cor = c(0.5, 0.6, 0.7)

) heatmap_data <- matrix(rnorm(100), nrow = 10, ncol = 10) rownames(heatmap_data) <- paste0("Gene", 1:10) colnames(heatmap_data) <- paste0("Cell", 1:10)

heatmap <- Heatmap(heatmap_data, name = "Gene Expression", show_row_names = TRUE, show_column_names = TRUE) return(list(p2g = tfs, heatmap = heatmap)) }

Next, I selected the genes, but I didn't get p2g and heatmap. sel.genes <- SelectGenes(

object = pbmc.t.cells, atac.assay = "ATAC", rna.assay = "RNA", var.cutoff.gene = 0.9, trajectory.name = "Trajectory", distance.cutoff = 2000, groupEvery = 1, cor.cutoff = 0, fdr.cutoff = 1e-04, return.heatmap = TRUE, labelTop1 = 10, labelTop2 = 10, genome = "mihou"
)

df.p2g <- sel.genes$p2g ht <- sel.genes$heatmap In addition, I also want to ask, when I am executing GetTFGeneCorrelation:
tf.gene.cor <- GetTFGeneCorrelation(object = pbmc.t.cells, tf.use = df.cor$tfs, gene.use = unique(df.p2g$gene), tf.assay = "chromvar", gene.assay = "RNA", trajectory.name = "Trajectory") there will be an error:
Find 553 shared features! Creating Trajectory Group Matrix.. Some values are below 0, this could be the Motif activity matrix in which scaleTo should be set = NULL. Continuing without depth normalization! Smoothing... Creating Trajectory Group Matrix.. Smoothing... Error in tf_activity[tf.use, ] : subscript out of bounds Calls: GetTFGeneCorrelation My question may be stupid, and I look forward to your answer.Thank you!