GoekeLab / proActiv

Estimation of Promoter Activity from RNA-Seq data
https://goekelab.github.io/proActiv/
Other
45 stars 14 forks source link

estimating promoter activity from given bam files #33

Closed ninjaxfy closed 2 years ago

ninjaxfy commented 2 years ago

Hi, I've ran the whole workflow with the given junction files. And I was quiet confused that when I use the given bam file and the given code to run the flow, why I just got the null result(eg. 0 , na and null).

jleechung commented 2 years ago

Hi @ninjaxfy thanks for your interest in proActiv! Could you provide a minimal reproducible example / the code you used, along with session information, so that we can see what the problem is?

ninjaxfy commented 2 years ago

@jleechung thx a lot for reply!!! I am a student and just began to learn bioinformatics systematically, hence I might made mistakes, and I hope it won't take you too much time. I have another question in the function preparePromoterAnnotation, there is a argument "species", does it means that ProActiv can only run those 12 species? and whether it can run other animals' rna-seq data?(eg. chicken, chimp ...) And the end of the letter is my code in R(mostly copy in the homepage of proactiv ).

test workflow with bamfile in package

A complete workflow to identify alternative promoter usage

library(proActiv)

From BAM files - genome parameter must be provided

files <- list.files(system.file('extdata/testdata/bam', package = 'proActiv'), full.names = TRUE)

Preparing promoter annotations from a gtf file or txdb file

From GTF file path

gtf.file <- system.file('extdata/vignette/annotations/gencode.v34.annotation.subset.gtf.gz', package = 'proActiv') promoterAnnotation.gencode.v34.subset <- preparePromoterAnnotation(file = gtf.file, species = 'Homo_sapiens')

From TxDb object

txdb.file <- system.file('extdata/vignette/annotations/gencode.v34.annotation.subset.sqlite', package = 'proActiv') txdb <- loadDb(txdb.file) promoterAnnotation.gencode.v34.subset <- preparePromoterAnnotation(txdb = txdb, species = 'Homo_sapiens')

result <- proActiv(files = files, promoterAnnotation = promoterAnnotation.gencode.v34.subset, genome = 'hg38')

filter result

Removes single-exon transcripts / promoters by eliminating promoter counts that are NA

result <- result[complete.cases(assays(result)$promoterCounts), bamtestcode.txt ]

jleechung commented 2 years ago

Hi @ninjaxfy, no worries.

For preparePromoterAnnotation for other species, please see issue #25.

For the bam result: the bam files provided with the package are really for testing, and so we subset the bam files to very small regions (for file size reasons). Most of the promoters are thus not quantified. You can extract those that are expressed as follows, and find that there are seven promoters found active in at least one of the samples.

> files <- list.files(system.file('extdata/testdata/bam', 
+                                 package = 'proActiv'), 
+                     full.names = TRUE)
> 
> annotation <- promoterAnnotation.gencode.v34.subset
> 
> result <- proActiv(files = files, 
+                    promoterAnnotation = annotation, 
+                    genome = 'hg38')
> result <- result[complete.cases(assays(result)$promoterCounts), ]
> promoterCounts <- assays(result)$promoterCounts
> promoterCounts <- promoterCounts[rowSums(promoterCounts) > 0, ]
> promoterCounts
   SGNEx_A549_Illumina_replicate1.run1_genome_subset
136                                               157
137                                                50
138                                                20
139                                                 3
141                                                 2
177                                               109
709                                                 0
    SGNEx_MCF7_Illumina_replicate2.run1_genome_subset
136                                                35
137                                                 7
138                                                 2
139                                                 0
141                                                 1
177                                               231
709                                                 2

session info:

> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)

Matrix products: default

locale:
[1] LC_COLLATE=English_Singapore.1252  LC_CTYPE=English_Singapore.1252   
[3] LC_MONETARY=English_Singapore.1252 LC_NUMERIC=C                      
[5] LC_TIME=English_Singapore.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods  
[9] base     

other attached packages:
 [1] BSgenome.Hsapiens.UCSC.hg38_1.4.3 BSgenome_1.58.0                  
 [3] rtracklayer_1.49.5                Biostrings_2.58.0                
 [5] XVector_0.30.0                    GenomicRanges_1.42.0             
 [7] GenomeInfoDb_1.26.7               IRanges_2.24.1                   
 [9] S4Vectors_0.28.1                  BiocGenerics_0.36.1              
[11] proActiv_1.3.4                    testthat_3.0.4                   

loaded via a namespace (and not attached):
 [1] bitops_1.0-7                matrixStats_0.60.0         
 [3] fs_1.5.0                    usethis_2.0.1              
 [5] devtools_2.4.2              bit64_4.0.5                
 [7] RColorBrewer_1.1-2          progress_1.2.2             
 [9] httr_1.4.2                  rprojroot_2.0.2            
[11] tools_4.0.3                 utf8_1.2.2                 
[13] R6_2.5.1                    KernSmooth_2.23-17         
[15] DBI_1.1.1                   colorspace_2.0-2           
[17] withr_2.4.2                 tidyselect_1.1.1           
[19] prettyunits_1.1.1           processx_3.5.2             
[21] DESeq2_1.30.1               curl_4.3.2                 
[23] bit_4.0.4                   compiler_4.0.3             
[25] cli_3.0.1                   Biobase_2.50.0             
[27] xml2_1.3.2                  desc_1.3.0                 
[29] DelayedArray_0.16.3         caTools_1.18.2             
[31] scales_1.1.1                genefilter_1.72.1          
[33] callr_3.7.0                 askpass_1.1                
[35] rappdirs_0.3.3              Rsamtools_2.6.0            
[37] stringr_1.4.0               pkgconfig_2.0.3            
[39] sessioninfo_1.1.1           MatrixGenerics_1.2.1       
[41] dbplyr_2.1.1                fastmap_1.1.0              
[43] rlang_0.4.11                rstudioapi_0.13            
[45] RSQLite_2.2.7               generics_0.1.0             
[47] gtools_3.9.2                BiocParallel_1.24.1        
[49] dplyr_1.0.7                 RCurl_1.98-1.3             
[51] magrittr_2.0.1              GenomeInfoDbData_1.2.4     
[53] Matrix_1.3-4                Rcpp_1.0.7                 
[55] munsell_0.5.0               fansi_0.5.0                
[57] lifecycle_1.0.0             stringi_1.7.3              
[59] SummarizedExperiment_1.20.0 zlibbioc_1.36.0            
[61] gplots_3.1.1                pkgbuild_1.2.0             
[63] BiocFileCache_1.14.0        grid_4.0.3                 
[65] blob_1.2.2                  crayon_1.4.1               
[67] lattice_0.20-41             splines_4.0.3              
[69] GenomicFeatures_1.42.3      annotate_1.68.0            
[71] hms_1.1.0                   locfit_1.5-9.4             
[73] ps_1.6.0                    pillar_1.6.2               
[75] geneplotter_1.68.0          biomaRt_2.46.3             
[77] pkgload_1.2.1               XML_3.99-0.6               
[79] glue_1.4.2                  data.table_1.14.0          
[81] remotes_2.4.0               vctrs_0.3.8                
[83] gtable_0.3.0                openssl_1.4.4              
[85] purrr_0.3.4                 assertthat_0.2.1           
[87] cachem_1.0.5                ggplot2_3.3.5              
[89] xtable_1.8-4                survival_3.2-7             
[91] tibble_3.1.3                GenomicAlignments_1.26.0   
[93] AnnotationDbi_1.52.0        memoise_2.0.0              
[95] ellipsis_0.3.2             
> 
ninjaxfy commented 2 years ago

@jleechung sorry to reply late. your advice indeed do me a great favor! thx again! Best wishs!