alexchwong / SpliceWiz

SpliceWiz is an R package for exploring differential alternative splicing events in splice-aware alignment BAM files.
Other
13 stars 7 forks source link

Error: could not find function "getCoverageData #46

Closed michael-mazzucco closed 1 year ago

michael-mazzucco commented 1 year ago

Hello! I am unfortunately struggling with the getCoverageData part of the tutorial. Any input would be appreciated:

library(SpliceWiz)
#> Loading required package: NxtIRFdata
#> SpliceWiz package loaded with 1 threads
#> Use setSWthreads() to set the number of SpliceWiz threads
library(tidyverse)
library(reprex)
library(edgeR)
#> Warning: package 'edgeR' was built under R version 4.2.2
#> Loading required package: limma
#> Warning: package 'limma' was built under R version 4.2.2
library(ggplot2)
library(pheatmap)

ref_path <- file.path(tempdir(), "Reference")

buildRef(
  reference_path = ref_path,
  fasta = chrZ_genome(),
  gtf = chrZ_gtf(),
  ontologySpecies = "Homo sapiens"
)
#> Jul 24 18:18:45 Reference generated without non-polyA reference
#> Jul 24 18:18:45 Reference generated without Mappability reference
#> Jul 24 18:18:45 Reference generated without Blacklist exclusion
#> Jul 24 18:18:45 Converting FASTA to local TwoBitFile...done
#> Jul 24 18:18:45 Connecting to genome TwoBitFile...done
#> Jul 24 18:18:45 Making local copy of GTF file...done
#> Jul 24 18:18:45 Reading source GTF file...done
#> Jul 24 18:18:46 Processing gtf file...
#> ...genes
#> ...transcripts
#> ...CDS
#> ...exons
#> done
#> snapshotDate(): 2022-10-31
#> Jul 24 18:18:50 Retrieving gene GO-term pairings
#> Jul 24 18:18:54 Retrieving GO terms from GO.db
#> 
#> Jul 24 18:19:10 Processing introns...
#> ...data
#> ...basic annotations
#> ...splice motifs
#> ...other info
#> ...defining flanking exon clusters
#> done
#> Jul 24 18:19:12 Generating processBAM reference
#> ...prepping data
#> ...determining measurable introns (directional)
#> ...determining measurable introns (non-directional)
#> ...writing ref-cover.bed
#> ...writing ref-ROI.bed
#> ...writing ref-read-continues.ref
#> ...writing ref-sj.ref
#> ...writing ref-tj.ref
#> processBAM reference generated
#> Jul 24 18:19:19 Predicting NMD transcripts from genome sequence
#> ...exonic transcripts
#> ...retained introns
#>   |                                                                              |                                                                      |   0%  |                                                                              |======================================================================| 100%
#> done
#> Jul 24 18:19:20 Annotating Splice Events
#> Annotating Mutually-Exclusive-Exon Splice Events...done
#> Annotating Skipped-Exon Splice Events...done
#> Annotating Alternate 5' / 3' Splice Site Splice Events...done
#> Annotating Alternate First / Last Exon Splice Events...done
#> Annotating known retained introns...done
#> Jul 24 18:19:21 Splice Annotations Filtered
#> Jul 24 18:19:21 Translating Alternate Splice Peptides...done
#> Jul 24 18:19:22 Splice Annotations finished
#> 
#> Reference build finished

df <- viewASE(ref_path)

bams <- SpliceWiz_example_bams()

pb_path <- file.path(tempdir(), "pb_output")
processBAM(
  bamfiles = bams$path,
  sample_names = bams$sample,
  reference_path = ref_path,
  output_path = pb_path
)
#> Jul 24 18:19:25 Running SpliceWiz processBAM
#> Jul 24 18:19:25 Using MulticoreParam 1 threads
#> Allocating memory to 1 threads for SpliceWiz (ompBAM)...initialized (0 milliseconds)
#> /private/var/folders/1y/fm2jvsw575dggg5_2dwr5q4c0000gn/T/RtmpF46oyF/02H003.bam processed (344 milliseconds)
#> Jul 24 18:19:25 SpliceWiz processBAM processed RtmpF46oyF/02H003.bam
#> Allocating memory to 1 threads for SpliceWiz (ompBAM)...initialized (0 milliseconds)
#> /private/var/folders/1y/fm2jvsw575dggg5_2dwr5q4c0000gn/T/RtmpF46oyF/02H025.bam processed (289 milliseconds)
#> Jul 24 18:19:26 SpliceWiz processBAM processed RtmpF46oyF/02H025.bam
#> Allocating memory to 1 threads for SpliceWiz (ompBAM)...initialized (0 milliseconds)
#> /private/var/folders/1y/fm2jvsw575dggg5_2dwr5q4c0000gn/T/RtmpF46oyF/02H026.bam processed (424 milliseconds)
#> Jul 24 18:19:26 SpliceWiz processBAM processed RtmpF46oyF/02H026.bam
#> Allocating memory to 1 threads for SpliceWiz (ompBAM)...initialized (0 milliseconds)
#> /private/var/folders/1y/fm2jvsw575dggg5_2dwr5q4c0000gn/T/RtmpF46oyF/02H033.bam processed (400 milliseconds)
#> Jul 24 18:19:27 SpliceWiz processBAM processed RtmpF46oyF/02H033.bam
#> Allocating memory to 1 threads for SpliceWiz (ompBAM)...initialized (0 milliseconds)
#> /private/var/folders/1y/fm2jvsw575dggg5_2dwr5q4c0000gn/T/RtmpF46oyF/02H043.bam processed (242 milliseconds)
#> Jul 24 18:19:27 SpliceWiz processBAM processed RtmpF46oyF/02H043.bam
#> Allocating memory to 1 threads for SpliceWiz (ompBAM)...initialized (0 milliseconds)
#> /private/var/folders/1y/fm2jvsw575dggg5_2dwr5q4c0000gn/T/RtmpF46oyF/02H046.bam processed (381 milliseconds)
#> Jul 24 18:19:27 SpliceWiz processBAM processed RtmpF46oyF/02H046.bam

expr <- findSpliceWizOutput(pb_path)

nxtse_path <- file.path(tempdir(), "NxtSE_output")
collateData(
  Experiment = expr,
  reference_path = ref_path,
  output_path = nxtse_path
)
#> Jul 24 18:19:27 Using MulticoreParam 1 threads
#> Jul 24 18:19:27 Validating Experiment; checking COV files...
#> Jul 24 18:19:27 Compiling Sample Stats
#> Jul 24 18:19:27 Compiling Junction List...merging...done
#> Jul 24 18:19:28 Compiling Junction Stats...merging...done
#> Jul 24 18:19:32 Compiling Intron Retention List...done
#> Jul 24 18:19:33 Tidying up splice junctions and intron retentions...
#> ...annotating splice junctions
#> ...copying splicing reference
#> ...grouping splice junctions
#> ...grouping introns
#> ...loading splice events
#> ...compiling rowEvents
#> done
#> 
#> Jul 24 18:19:39 Generating NxtSE assays
#> Jul 24 18:19:39 Using MulticoreParam 1 threads
#>   |                                                                              |                                                                      |   0%  |                                                                              |============                                                          |  17%  |                                                                              |=======================                                               |  33%  |                                                                              |===================================                                   |  50%  |                                                                              |===============================================                       |  67%  |                                                                              |==========================================================            |  83%  |                                                                              |======================================================================| 100%
#> Jul 24 18:19:47 Saving assays into H5 file
#> Jul 24 18:19:48 Saving auxiliary data
#> Jul 24 18:19:49 Assembling final NxtSE object
#> Jul 24 18:19:49 Saving final NxtSE object
#> Jul 24 18:19:49 Calculating overlapping IR-events for whole dataset
#> Jul 24 18:19:53 SpliceWiz (NxtSE) Collation Finished

se <- makeSE(nxtse_path)
#> Jul 24 18:19:53 Loading NxtSE object from file...rowData...done
#> Jul 24 18:19:54 ...removing overlapping introns...done
#> 
#> Jul 24 18:19:54 NxtSE loaded

colData(se)$condition <- rep(c("A", "B"), each = 3)
colData(se)$batch <- rep(c("K", "L", "M"), 2)

se.filtered <- se[applyFilters(se),]
#> Running Depth filter
#> Running Participation filter
#> Running Participation filter
#> Running Consistency filter
#> Running Terminus filter
#> Running ExclusiveMXE filter
#> Running StrictAltSS filter

# Requires edgeR to be installed:
require("edgeR")
res_edgeR <- ASE_edgeR(
  se = se.filtered,
  test_factor = "condition",
  test_nom = "B",
  test_denom = "A"
)
#> Jul 24 18:19:57 Performing edgeR contrast for included / excluded counts separately
#> Jul 24 18:19:58 Performing edgeR contrast for included / excluded counts together

ggplot(res_edgeR,
       aes(x = logFC, y = -log10(FDR))) + 
  geom_point() +
  labs(title = "Differential analysis - B vs A",
       x = "Log2-fold change", y = "FDR (-log10)")


ggplot(res_edgeR, aes(x = 100 * AvgPSI_B, y = 100 * AvgPSI_A)) + 
  geom_point() + xlim(0, 100) + ylim(0, 100) +
  labs(title = "PSI values across conditions",
       x = "PSI of condition B", y = "PSI of condition A")
#> Warning: Removed 1 rows containing missing values (`geom_point()`).


getAvailableGO()
#> Cannot connect to AnnotationHub server, using 'localHub=TRUE' instead
#> Using 'localHub=TRUE'
#>   If offline, please also see BiocManager vignette section on offline use
#> snapshotDate(): 2023-06-23
#> [1] "Homo sapiens"

ref_path <- file.path(tempdir(), "Reference")
ontology <- viewGO(ref_path)
head(ontology)
#>        go_id evidence ontology      ensembl_id                          go_term
#> 1 GO:0000002      TAS       BP ENSG00000151729 mitochondrial genome maintenance
#> 2 GO:0000002      IMP       BP ENSG00000025708 mitochondrial genome maintenance
#> 3 GO:0000002      ISS       BP ENSG00000068305 mitochondrial genome maintenance
#> 4 GO:0000002      IMP       BP ENSG00000115204 mitochondrial genome maintenance
#> 5 GO:0000002      IMP       BP ENSG00000198836 mitochondrial genome maintenance
#> 6 GO:0000002      NAS       BP ENSG00000196365 mitochondrial genome maintenance
#>   gene_name
#> 1      <NA>
#> 2      <NA>
#> 3      <NA>
#> 4      <NA>
#> 5      <NA>
#> 6      <NA>

allGenes <- sort(unique(ontology$ensembl_id))
exampleGeneID <- allGenes[1:1000]
exampleBkgdID <- allGenes

go_byGenes <- goGenes(
  enrichedGenes = exampleGeneID, 
  universeGenes = exampleBkgdID, 
  ontologyRef = ontology
)

plotGO(go_byGenes, filter_n_terms = 12)


res_edgeR <- ASE_edgeR(
  se = se.filtered,
  test_factor = "condition",
  test_nom = "B",
  test_denom = "A"
)
#> Jul 24 18:28:27 Performing edgeR contrast for included / excluded counts separately
#> Jul 24 18:28:27 Performing edgeR contrast for included / excluded counts together

go_byASE <- goASE(
  enrichedEventNames = res_edgeR$EventName[1:10],
  universeEventNames = NULL,
  se = se
)

head(go_byASE)
#>         go_id                                                   go_term
#> 1: GO:0000381  regulation of alternative mRNA splicing, via spliceosome
#> 2: GO:0045892        negative regulation of DNA-templated transcription
#> 3: GO:0048026     positive regulation of mRNA splicing, via spliceosome
#> 4: GO:0000122 negative regulation of transcription by RNA polymerase II
#> 5: GO:0000375           RNA splicing, via transesterification reactions
#> 6: GO:0000423                                                 mitophagy
#>         pval      padj overlap size                    overlapGenes expected
#> 1: 0.4761905 0.7696177       2    2 ENSG00000161547,ENSG00000136527        2
#> 2: 0.4761905 0.7696177       2    2 ENSG00000161547,ENSG00000141510        2
#> 3: 0.4761905 0.7696177       2    2 ENSG00000136527,ENSG00000164548        2
#> 4: 0.7142857 0.7696177       1    1                 ENSG00000141510        1
#> 5: 0.7142857 0.7696177       1    1                 ENSG00000136527        1
#> 6: 0.7142857 0.7696177       1    1                 ENSG00000141510        1
#>    foldEnrichment
#> 1:              1
#> 2:              1
#> 3:              1
#> 4:              1
#> 5:              1
#> 6:              1

go_byASE <- goASE(
  enrichedEventNames = res_edgeR$EventName[1:10],
  universeEventNames = res_edgeR$EventName,
  se = se
)

mat <- makeMatrix(
  se.filtered,
  event_list = res_edgeR$EventName[1:10],
  method = "PSI"
)

anno_col_df <- as.data.frame(colData(se.filtered))
anno_col_df <- anno_col_df[, 1, drop=FALSE]

pheatmap(mat, annotation_col = anno_col_df)


res_edgeR <- ASE_edgeR(
  se = se.filtered,
  test_factor = "condition",
  test_nom = "B",
  test_denom = "A"
)
#> Jul 24 18:31:00 Performing edgeR contrast for included / excluded counts separately
#> Jul 24 18:31:01 Performing edgeR contrast for included / excluded counts together

res_edgeR.filtered <- res_edgeR[res_edgeR$abs_deltaPSI > 0.05,]
res_edgeR.filtered$EventName[1]
#> [1] NA

dataObj <- getCoverageData(se, Gene = "NSUN5", tracks = colnames(se))
#> Error in getCoverageData(se, Gene = "NSUN5", tracks = colnames(se)): could not find function "getCoverageData"

Created on 2023-07-24 with reprex v2.0.2

alexchwong commented 1 year ago

Hi Michael,

Please check the version of SpliceWiz. getCoverageData() was introduced from version 1.1.8 onwards (Apr 2023).

If installing via Bioconductor::install(), make sure you are using Bioc 3.17 (R 4.3). If you wish to remain on R 4.2 or below, please install SpliceWiz via devtools::install_github()

michael-mazzucco commented 1 year ago

Hi Alex,

That seems to be it. Keeps installing 1.1.6 citing issues with igraph, MatrixModels, and some other dependencies. Will keep at it to see if I can get it to 1.1.8. Thanks!

alexchwong commented 1 year ago

@michael-mazzucco What dependency issues do you have? I'm not aware of igraph and MatrixModels being direct dependencies; it may be secondary dependencies of which I am not aware.

What happens if you install via github? (I've updated the readme to reflect the latest version)

michael-mazzucco commented 1 year ago

The Github installation worked! Much appreciated, thank you for helping me through this.