GoekeLab / proActiv

Estimation of Promoter Activity from RNA-Seq data
https://goekelab.github.io/proActiv/
Other
45 stars 14 forks source link

map to txId by promoterId #27

Closed yuukiiwa closed 3 years ago

yuukiiwa commented 3 years ago

Hi @jleechung, I have made the change mapping to txId based on promoterId instead of geneId so that the txId column corresponds to the specific transcripts associated with a promoter. You can test out my fork and the GoekeLab fork with the following code:

## please delete previous installation of proActiv before running the following
## get to the proActiv directory
devtools::load_all()

## List of STAR junction files as input
files <- list.files(system.file('extdata/vignette/junctions', 
                                package = 'proActiv'), full.names = TRUE)

## Vector describing experimental condition
condition <- rep(c('A549','HepG2'), each=3)

## Promoter annotation for human genome GENCODE v34 restricted to a subset of chr1
promoterAnnotation <- promoterAnnotation.gencode.v34.subset
result <- proActiv(files = files, 
                   promoterAnnotation = promoterAnnotation,
                   condition = condition)
rdata <- rowData(result)

## this will use the following promoter id as example
### each promoter is associated with one transcript only
promoterIdMapping <- promoterIdMapping(promoterAnnotation)
promoterIdMapping[promoterIdMapping$geneId == "ENSG00000116663.11",]

## the results table cannot be loaded onto R b/c tx_id is in c("EN", ...) format
results_from_vignette <- data.frame(rdata)
check_tab_vignette <- results_from_vignette[results_from_vignette$geneId == "ENSG00000116663.11",]
check_tab_vignette
check_tab_vignette$txId
## execute the following to write results with the counts to files
# results_from_vignette <- data.frame(rdata, assays(result)$promoterCounts)
# write.table(results_from_vignette, file = "proActiv_results_from_vignette.csv",
#             sep = "\t", quote = FALSE, row.names = FALSE)

## collapsing the entries in the c("EN", ...) into a string separated by ";"
rdata$txId <- sapply(rdata$txId,paste,collapse=";")
results_from_pipeline <- data.frame(rdata)
check_tab_pipeline <- results_from_pipeline[results_from_pipeline$geneId == "ENSG00000116663.11",]
check_tab_pipeline
data.frame(check_tab_pipeline$promoterId,check_tab_pipeline$txId)
## execute the following to write results to files
# results_from_pipeline <- data.frame(rdata, assays(result)$promoterCounts)
# write.table(results_from_pipeline, file = "proActiv_results_from_pipeline.csv",
#             sep = "\t", quote = FALSE, row.names = FALSE)

I ran the proActiv from the two forks with samples ("A549" and "HepG2") from the vignette, and here are the screenshots: GoekeLab/proActiv: Screenshot 2021-06-23 at 6 03 51 PM

yuukiiwa/proActiv: Screenshot 2021-06-23 at 6 06 00 PM

Thanks!

jleechung commented 3 years ago

this is great. thanks!