Aufiero / circRNAprofiler

Matching the two methods on "chrom", "startUpBSE", and "endDownBSE" finds only 3 shared circRNAs #9

Closed JianGuoZhou3 closed 3 years ago

JianGuoZhou3 commented 3 years ago
> dim(EGA_circexplorer2)
[1] 101981    355
> dim(EGA_nclscan)
[1] 36601   355

In my dataset, the circexplorer2 tool produced 101981 circRNAs, and nclscan produced 36601 circRNAs.

tmp <- EGA_circexplorer2[,c(1:6)]
tmp2 <- EGA_nclscan[,c(1:6)]
colnames(tmp2)
# [1] "chrom"      "startUpBSE" "endDownBSE"
tmp1 <- merge (tmp, tmp2, by=c("chrom", "startUpBSE", "endDownBSE"))
dim(tmp1)
[1] 3 9

I matched the two methods on "chrom", "startUpBSE", and "endDownBSE", but only 3 circRNAs are shared between them.

tmp3 <- merge (tmp, tmp2, by=c("chrom", "startUpBSE"))
tmp4 <- merge (tmp, tmp2, by=c("chrom", "endDownBSE"))
tmp2.1<- tmp2
tmp2.1$endDownBSE <- tmp2$endDownBSE-1
tmp1.1 <- merge (tmp, tmp2.1, by=c("chrom", "startUpBSE", "endDownBSE"))
 dim(tmp1.1)
[1] 13338     9

Next, I matched on "chrom" and "startUpBSE" while allowing endDownBSE to differ by exactly 1 between the two methods (the nclscan endDownBSE minus 1). This gives 13338 circRNAs.

tmp2.2<- tmp2
tmp2.2$startUpBSE <- tmp2$startUpBSE-1
tmp1.2 <- merge (tmp, tmp2.2, by=c("chrom", "startUpBSE", "endDownBSE"))
dim(tmp1.2)
[1] 15444     9
tmp1.3 <-rbind(tmp1.1,tmp1.2)
 dim(tmp1.3)
[1] 28782     9

Similarly, I matched on "chrom" and "endDownBSE" while allowing startUpBSE to differ by exactly 1 between the two methods (the nclscan startUpBSE minus 1). This gives 15444 circRNAs, or 28782 when combined with the previous set. Could you please check whether this approach is fine? Best, Jian-Guo

Aufiero commented 3 years ago

Hi Jian-Guo,

You are merging with the merge function, which requires identical values in the matching columns. If the circRNA coordinates reported by the different detection tools differ by just 1 nucleotide, they won't be merged.
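To illustrate the point with made-up coordinates (the two toy data frames below are hypothetical and not taken from the dataset in this issue): an exact `merge` on the three columns drops a junction whose end coordinate differs by one nucleotide, while a join on `chrom` followed by a tolerance filter keeps it. This is only a base-R sketch of the idea, not a description of what `mergeBSJunctions` does internally.

```r
# Toy back-spliced junctions; all coordinates are invented for illustration.
ce <- data.frame(chrom = c("chr1", "chr1", "chr2"),
                 startUpBSE = c(100L, 500L, 900L),
                 endDownBSE = c(200L, 650L, 990L),
                 stringsAsFactors = FALSE)
ns <- data.frame(chrom = c("chr1", "chr1", "chr2"),
                 startUpBSE = c(100L, 500L, 300L),
                 endDownBSE = c(200L, 651L, 400L),   # second row is off by 1
                 stringsAsFactors = FALSE)

# Exact match: only junctions with identical coordinates survive.
exact <- merge(ce, ns, by = c("chrom", "startUpBSE", "endDownBSE"))

# Tolerant match: join on chromosome, then keep pairs whose start and end
# each differ by at most `tol` nucleotides.
tol <- 1L
pairs <- merge(ce, ns, by = "chrom", suffixes = c(".ce", ".ns"))
near <- pairs[abs(pairs$startUpBSE.ce - pairs$startUpBSE.ns) <= tol &
              abs(pairs$endDownBSE.ce - pairs$endDownBSE.ns) <= tol, ]

nrow(exact)   # junctions matched exactly
nrow(near)    # junctions matched within the tolerance
```

Joining on `chrom` alone pairs every junction with every other junction on the same chromosome, so this approach is only practical for small inputs. For full datasets an interval-based lookup, for example `GenomicRanges::findOverlaps` with `type = "equal"` and a `maxgap`, would probably scale better.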

If you want to merge the circRNAs found by NCLscan and circExplorer2, you have to follow the workflow implemented by circRNAprofiler (see the vignettes of the package). In particular, in Module 3 - Merge commonly identified circRNAs, you can merge circRNAs with the function mergeBSJunctions. This function fixes the slightly different coordinates reported by the different detection tools before grouping.

Best, S

JianGuoZhou3 commented 3 years ago

Hi @Aufiero, I used Module 3,

mergedBSJunctions <- mergeBSJunctions(backSplicedJunctions, gtf)

but I did not get any co-confirmed circRNAs, unlike the manual matching in my previous comment, which found 15444 circRNAs when "chrom" and "endDownBSE" matched and startUpBSE differed by 1. [image]

Aufiero commented 3 years ago

Hi Jian-Guo,

You can try to fix the coordinates with the parameter fixBSJsWithGTF, which is set to FALSE by default (type ?mergeBSJunctions in the console to see the help of the function, and read the vignettes for more info).

So try: mergedBSJunctions <- mergeBSJunctions(backSplicedJunctions, gtf, fixBSJsWithGTF = TRUE)

Let me know if it works. Best, S

JianGuoZhou3 commented 3 years ago

Hi @Aufiero, Thanks for your quick reply. I used your code.

mergedBSJunctions <- mergeBSJunctions(backSplicedJunctions, gtf, fixBSJsWithGTF =TRUE)

However, I still get no co-confirmed circRNAs... [image]

gtf <- formatGTF("gencode.v34.annotation.gtf")

I have uploaded part of my datasets; please check them: backSplicedJunctions.Rdata.zip, gtf.Rdata.zip. Best, Jian-Guo

Aufiero commented 3 years ago

I'll check those and I'll let you know.

Aufiero commented 3 years ago

Hi Jian-Guo,

I did a test and it works for me. I ran this:

mergedBSJunctions <- mergeBSJunctions(backSplicedJunctions, gtf, fixBSJsWithGTF = TRUE)

Here are the results that I got:

[image: Rplot]

Which version of circRNAprofiler do you have? Can you show me your sessionInfo()?

JianGuoZhou3 commented 3 years ago

Hi @Aufiero, it still didn't work.

sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] BSgenome.Hsapiens.UCSC.hg38_1.4.3 BSgenome_1.58.0                  
 [3] rtracklayer_1.50.0                Biostrings_2.58.0                
 [5] XVector_0.30.0                    GenomicRanges_1.42.0             
 [7] GenomeInfoDb_1.26.2               IRanges_2.24.1                   
 [9] S4Vectors_0.28.1                  BiocGenerics_0.36.0              
[11] ggpubr_0.4.0                      ggplot2_3.3.3                    
[13] circRNAprofiler_1.4.2            

loaded via a namespace (and not attached):
  [1] readxl_1.3.1                      backports_1.2.1                  
  [3] AnnotationHub_2.22.0              BiocFileCache_1.14.0             
  [5] plyr_1.8.6                        splines_4.0.3                    
  [7] BiocParallel_1.24.1               gwascat_2.22.0                   
  [9] digest_0.6.27                     htmltools_0.5.1.1                
 [11] fansi_0.4.2                       magrittr_2.0.1                   
 [13] memoise_2.0.0                     openxlsx_4.2.3                   
 [15] limma_3.46.0                      readr_1.4.0                      
 [17] annotate_1.68.0                   matrixStats_0.58.0               
 [19] R.utils_2.10.1                    askpass_1.1                      
 [21] prettyunits_1.1.1                 colorspace_2.0-0                 
 [23] blob_1.2.1                        rappdirs_0.3.3                   
 [25] haven_2.3.1                       rbibutils_2.0                    
 [27] xfun_0.21                         dplyr_1.0.5                      
 [29] crayon_1.4.1                      RCurl_1.98-1.2                   
 [31] genefilter_1.72.1                 survival_3.2-7                   
 [33] VariantAnnotation_1.36.0          glue_1.4.2                       
 [35] universalmotif_1.8.3              gtable_0.3.0                     
 [37] zlibbioc_1.36.0                   seqinr_4.2-5                     
 [39] DelayedArray_0.16.2               car_3.0-10                       
 [41] abind_1.4-5                       scales_1.1.1                     
 [43] futile.options_1.0.1              DBI_1.1.1                        
 [45] edgeR_3.32.1                      rstatix_0.7.0                    
 [47] Rcpp_1.0.6                        xtable_1.8-4                     
 [49] progress_1.2.2                    foreign_0.8-81                   
 [51] bit_4.0.4                         httr_1.4.2                       
 [53] RColorBrewer_1.1-2                ellipsis_0.3.1                   
 [55] farver_2.1.0                      pkgconfig_2.0.3                  
 [57] XML_3.99-0.5                      R.methodsS3_1.8.1                
 [59] ggseqlogo_0.1                     dbplyr_2.1.0                     
 [61] locfit_1.5-9.4                    utf8_1.1.4                       
 [63] labeling_0.4.2                    tidyselect_1.1.0                 
 [65] rlang_0.4.10                      reshape2_1.4.4                   
 [67] later_1.1.0.1                     AnnotationDbi_1.52.0             
 [69] cellranger_1.1.0                  munsell_0.5.0                    
 [71] BiocVersion_3.12.0                tools_4.0.3                      
 [73] cachem_1.0.4                      cli_2.3.1                        
 [75] generics_0.1.0                    RSQLite_2.2.3                    
 [77] ade4_1.7-16                       broom_0.7.5                      
 [79] stringr_1.4.0                     fastmap_1.1.0                    
 [81] yaml_2.2.1                        knitr_1.31                       
 [83] bit64_4.0.5                       zip_2.1.1                        
 [85] purrr_0.3.4                       mime_0.10                        
 [87] formatR_1.7                       R.oo_1.24.0                      
 [89] xml2_1.3.2                        biomaRt_2.46.3                   
 [91] compiler_4.0.3                    rstudioapi_0.13                  
 [93] curl_4.3                          interactiveDisplayBase_1.28.0    
 [95] ggsignif_0.6.1                    tibble_3.1.0                     
 [97] geneplotter_1.68.0                stringi_1.5.3                    
 [99] futile.logger_1.4.3               GenomicFeatures_1.42.1           
[101] forcats_0.5.1                     lattice_0.20-41                  
[103] Matrix_1.3-2                      vctrs_0.3.6                      
[105] pillar_1.5.1                      lifecycle_1.0.0                  
[107] BiocManager_1.30.10               Rdpack_2.1.1                     
[109] snpStats_1.40.0                   data.table_1.14.0                
[111] bitops_1.0-6                      httpuv_1.5.5                     
[113] R6_2.5.0                          promises_1.2.0.1                 
[115] gridExtra_2.3                     rio_0.5.26                       
[117] lambda.r_1.2.4                    MASS_7.3-53.1                    
[119] assertthat_0.2.1                  SummarizedExperiment_1.20.0      
[121] openssl_1.4.3                     DESeq2_1.30.1                    
[123] withr_2.4.1                       GenomicAlignments_1.26.0         
[125] Rsamtools_2.6.0                   GenomeInfoDbData_1.2.4           
[127] hms_1.0.0                         VennDiagram_1.6.20               
[129] grid_4.0.3                        tidyr_1.1.3                      
[131] carData_3.0-4                     BSgenome.Hsapiens.UCSC.hg19_1.4.3
[133] MatrixGenerics_1.2.1              Biobase_2.50.0                   
[135] shiny_1.6.0  

Please check my Macbook environment. Best, Jian-Guo

Aufiero commented 3 years ago

Everything seems ok.

It might be your GTF file. How many rows does your GTF have after you run the function formatGTF? Check whether it contains this exon: chr16: 88698-88890
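A check along these lines could look like the sketch below. It uses a small hand-made data frame in place of the real `formatGTF` output, and it assumes the formatted annotation has `chrom`, `start`, `end`, and `type` columns. The helper name `hasExon` is hypothetical.

```r
# Stand-in for the data frame returned by formatGTF(); the rows are a toy
# example, and the column names (chrom, start, end, type) are an assumption.
gtf <- data.frame(chrom = c("chr1", "chr16", "chr16"),
                  start = c(1000L, 88698L, 90000L),
                  end   = c(1200L, 88890L, 90100L),
                  type  = "exon",
                  stringsAsFactors = FALSE)

# Hypothetical helper: TRUE if an exon with exactly these coordinates exists.
hasExon <- function(gtf, chrom, start, end) {
  any(gtf$chrom == chrom & gtf$start == start & gtf$end == end &
      gtf$type == "exon")
}

nrow(gtf)                                # number of annotation rows
hasExon(gtf, "chr16", 88698L, 88890L)    # the exon mentioned above
hasExon(gtf, "chr16", 1L, 2L)            # an exon that is not in the table
```

With the real annotation, a very small `nrow(gtf)` or a `FALSE` for an exon that should exist would point to a truncated or mismatched GTF.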

JianGuoZhou3 commented 3 years ago

Hi Aufiero, it seems the GTF file is the problem... [image] I can't find this exon: chr16: 88698-88890. However, my GTF is based on GENCODE v34 (hg38).

gtf <- formatGTF("gencode.v34.annotation.gtf")

Aufiero commented 3 years ago

Indeed, your GTF file is the problem. Maybe at some point you took only a chunk of it instead of the entire GTF file. Try to rerun the function formatGTF and check that the GTF is complete.
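One rough way to check whether an annotation file is complete is to count records per chromosome in the raw file, since a truncated GTF typically lacks whole chromosomes. The sketch below writes a tiny made-up GTF to a temporary file so that it can run on its own; with real data, `path` would point to the downloaded annotation. `chromCounts` is a hypothetical helper name, and the expected chromosome set is only illustrative.

```r
# Write a tiny, invented GTF so the example is self-contained.
path <- tempfile(fileext = ".gtf")
writeLines(c(
  "##description: toy annotation",
  "chr1\tTOY\texon\t100\t200\t.\t+\t.\tgene_id \"g1\";",
  "chr1\tTOY\texon\t300\t400\t.\t+\t.\tgene_id \"g1\";",
  "chr16\tTOY\texon\t88698\t88890\t.\t+\t.\tgene_id \"g2\";"
), path)

# Hypothetical helper: number of records per chromosome (column 1 of the GTF).
chromCounts <- function(path) {
  x <- read.delim(path, header = FALSE, comment.char = "#",
                  quote = "", stringsAsFactors = FALSE)
  table(x[[1]])
}

counts <- chromCounts(path)
counts

# Chromosomes that were expected but are missing from the file.
missing <- setdiff(c("chr1", "chr16", "chrX"), names(counts))
missing
```

If chromosomes that carry detected circRNAs show up in `missing`, the annotation is incomplete and should be downloaded again before rerunning `formatGTF`.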

Aufiero commented 3 years ago

I could not find v34 (hg38), so I used version 37 (hg38) instead: ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_37/gencode.v37.annotation.gtf.gz

JianGuoZhou3 commented 3 years ago

It seems to have worked. I will re-analyze the data. The GTF is now right. [image]

JianGuoZhou3 commented 3 years ago

Finally, it worked.

table(mergedBSJunctions$tool) 
# ce   ce,ns    ns 
# 72962 28734  7790 

[image] Thanks for your help.