Closed: mabinjw closed this issue 12 months ago.
Hi there. What version of SpliceWiz are you using?
I am running version 1.2.3 from a recent BiocManager install.
It seems there's an extra comma in the SpliceWiz source that is turning a warning message into an error. I will fix this when Bioc 3.18 is reopened and stable in the coming days.
In the meantime, could you use an unfiltered, original GTF file from Ensembl?
I have run SpliceWiz with an Ensembl annotation file and it runs great. SpliceWiz requires annotated CDS, start and stop codons, and protein IDs, which the CHESS annotations and StringTie GTF files do not have. I am running the StringTie annotations through ORF prediction and adding start and stop codon annotations, but SpliceWiz still requires protein_id attributes, which I do not have.
I have fixed the error message in the main branch. Could you install this using devtools and give your custom GTFs a go? Let me know if you run into any further problems. Thanks!
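To check up front whether a custom GTF has what this thread says SpliceWiz needs (CDS rows, start_codon and stop_codon rows, and protein_id on CDS rows), a minimal base-R sketch like the one below could be run on the file before calling buildRef(). The function name check_gtf_requirements is hypothetical and not part of SpliceWiz. The exon_id check is an extra assumption, prompted by the exon_id error reported further down in this thread.

```r
# Hypothetical helper (not part of SpliceWiz): report whether a GTF contains
# the features/attributes this thread says SpliceWiz needs. The exon_id check
# is an extra assumption.
check_gtf_requirements <- function(gtf_lines) {
  # Keep only non-empty, non-comment lines
  gtf_lines <- gtf_lines[nzchar(gtf_lines) & !startsWith(gtf_lines, "#")]
  fields <- strsplit(gtf_lines, "\t", fixed = TRUE)
  # Extract column i (3 = feature type, 9 = attributes); NA if malformed
  field <- function(i) {
    vapply(fields, function(f) if (length(f) >= 9) f[i] else NA_character_,
           character(1))
  }
  feature <- field(3)
  attrs <- field(9)
  # TRUE if any row of the given feature type carries the attribute `key`
  has_attr <- function(type, key) {
    rows <- feature %in% type
    any(grepl(paste0("(^|;)\\s*", key, "\\s"), attrs[rows], perl = TRUE))
  }
  c(
    CDS = any(feature %in% "CDS"),
    start_codon = any(feature %in% "start_codon"),
    stop_codon = any(feature %in% "stop_codon"),
    protein_id = has_attr("CDS", "protein_id"),
    exon_id = has_attr("exon", "exon_id")
  )
}

# Example (file name taken from this thread):
# check_gtf_requirements(readLines("CHESS_stringtie_factR_CDS.gtf"))
```

Reading the file as plain lines keeps the sketch dependency-free. A FALSE in any position points to what buildRef() is likely to complain about.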
Thank you for looking into this. I tried to install SpliceWiz (v1.3.5) using:
install_github("alexchwong/SpliceWiz", "main", dependencies=TRUE)
I still get the error: Error in get("protein_id") : object 'protein_id' not found.
I had another question: if the CDS is defined in the annotation file, is it still necessary to include the start and stop codon positions?
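When a GTF has CDS rows but no protein_id attribute, one possible stopgap is to append a placeholder protein_id derived from each row's transcript_id. The base-R sketch below does this. It is a hypothetical workaround and not a SpliceWiz recipe; whether SpliceWiz produces sensible results with such IDs is an untested assumption. The function name, the "_p" suffix, and the in.gtf/out.gtf file names are illustrative only.

```r
# Hypothetical workaround (not an official SpliceWiz recipe): append a
# placeholder protein_id, derived from transcript_id, to every CDS row that
# lacks one. Whether SpliceWiz accepts such IDs is an untested assumption.
add_placeholder_protein_id <- function(gtf_lines, suffix = "_p") {
  fields <- strsplit(gtf_lines, "\t", fixed = TRUE)
  feature <- vapply(fields,
                    function(f) if (length(f) >= 9) f[3] else NA_character_,
                    character(1))
  # Only CDS data rows that have a transcript_id but no protein_id yet
  needs_id <- feature %in% "CDS" &
    !startsWith(gtf_lines, "#") &
    grepl('transcript_id "', gtf_lines, fixed = TRUE) &
    !grepl("protein_id ", gtf_lines, fixed = TRUE)
  if (any(needs_id)) {
    tx <- sub('.*transcript_id "([^"]+)".*', "\\1", gtf_lines[needs_id])
    # Make sure the attribute string ends in ";" before appending
    base <- sub(";?\\s*$", ";", gtf_lines[needs_id], perl = TRUE)
    gtf_lines[needs_id] <- paste0(base, ' protein_id "', tx, suffix, '";')
  }
  gtf_lines
}

# Example (hypothetical file names):
# writeLines(add_placeholder_protein_id(readLines("in.gtf")), "out.gtf")
```

Rows that already carry a protein_id are left untouched, so running the function twice gives the same result.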
Can I have the full error log please?
> buildRef(
+ reference_path = ref_path,
+ fasta = "/gpfs/gsfs12/users/mabinjw/annotations/hg38.fa",
+ gtf = "/gpfs/gsfs12/users/mabinjw/annotations/CHESS_stringtie_factR_CDS.gtf",
+ genome_type = "hg38",
+ ontologySpecies = "Homo sapiens",
+ useExtendedTranscripts = TRUE
+ )
Oct 25 11:25:43 AM Reference generated without Blacklist exclusion
Oct 25 11:25:45 AM Converting FASTA to local TwoBitFile...done
Oct 25 11:26:54 AM Connecting to genome TwoBitFile...done
Oct 25 11:26:55 AM Making local copy of GTF file...done
Oct 25 11:26:59 AM Reading source GTF file...done
Oct 25 11:27:11 AM Processing gtf file...
...genes
...transcripts
...CDS
...exons
done
snapshotDate(): 2023-04-25
Oct 25 11:27:19 AM Retrieving gene GO-term pairings
Oct 25 11:27:22 AM Retrieving GO terms from GO.db
Oct 25 11:27:46 AM Processing introns...
...data
...basic annotations
...splice motifs
...other info
...defining flanking exon clusters
done
Oct 25 11:28:10 AM Generating processBAM reference
...prepping data
...determining measurable introns (directional)
...determining measurable introns (non-directional)
...writing ref-cover.bed
...writing ref-ROI.bed
...writing ref-read-continues.ref
...writing ref-sj.ref
...writing ref-tj.ref
processBAM reference generated
Oct 25 11:29:02 AM Predicting NMD transcripts from genome sequence
...exonic transcripts
...retained introns
|=============================================================================================================================| 100%
done
Oct 25 11:29:45 AM Annotating Splice Events
Annotating Mutually-Exclusive-Exon Splice Events...done
Annotating Skipped-Exon Splice Events...done
Annotating Alternate 5' / 3' Splice Site Splice Events...done
Annotating Alternate First / Last Exon Splice Events...done
Annotating known retained introns...done
Error in get("protein_id") : object 'protein_id' not found
Same issue with both versions 1.4 and 1.5:
Error in get("protein_id") : object 'protein_id' not found
packageVersion("SpliceWiz")
[1] ‘1.4.0’
packageVersion("SpliceWiz")
[1] ‘1.5.0’
@mabinjw please try installing the new commit with: devtools::install_github("alexchwong/SpliceWiz", ref = "ef2b20586d0876d8b411172959bdf12a7489d584")
I get a different error now, but I am able to get through to the 'Collate the experiment' section, where it throws another error. Both run outputs are shown below:
> buildRef(
+ reference_path = "/gpfs/gsfs12/users/mabinjw/SpliceWiz",
+ fasta = "/gpfs/gsfs12/users/mabinjw/annotations/hg38.fasta",
+ gtf = "/gpfs/gsfs12/users/mabinjw/annotations/CHESS_stringtie_factR_CDS.gtf")
Oct 27 12:02:04 PM Reference generated without non-polyA reference
Oct 27 12:02:04 PM Reference generated without Mappability reference
Oct 27 12:02:04 PM Reference generated without Blacklist exclusion
Oct 27 12:02:04 PM Connecting to genome TwoBitFile...done
Oct 27 12:02:04 PM Reading source GTF file...done
Oct 27 12:02:16 PM Processing gtf file...
...genes
...transcripts
...CDS
...exons
done
Oct 27 12:02:21 PM Gene ontology not prepared for this reference
Oct 27 12:02:26 PM Processing introns...
...data
...basic annotations
...splice motifs
...other info
...defining flanking exon clusters
done
Oct 27 12:02:48 PM Generating processBAM reference
...prepping data
...determining measurable introns (directional)
...determining measurable introns (non-directional)
...writing ref-cover.bed
...writing ref-ROI.bed
...writing ref-read-continues.ref
...writing ref-sj.ref
...writing ref-tj.ref
processBAM reference generated
Oct 27 12:03:35 PM Predicting NMD transcripts from genome sequence
...exonic transcripts
...retained introns
|=============================================================================================================================| 100%
done
Oct 27 12:04:16 PM Annotating Splice Events
Annotating Mutually-Exclusive-Exon Splice Events...done
Annotating Skipped-Exon Splice Events...done
Annotating Alternate 5' / 3' Splice Site Splice Events...done
Annotating Alternate First / Last Exon Splice Events...done
Annotating known retained introns...done
Oct 27 12:04:35 PM Splice Annotations Filtered
Oct 27 12:04:37 PM Translating Alternate Splice Peptides...done
Oct 27 12:04:37 PM Splice Annotations finished
Error in read_fst(path, columns, from, to, as.data.table, old_format) :
Column 'exon_id' not found
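One plausible cause of the exon_id error (an assumption; the thread does not confirm it) is that the custom GTF has exon rows without an exon_id attribute, which Ensembl GTFs do carry. The hypothetical base-R sketch below fills in missing exon_id values using a coordinate-based ID of the form seqname:start-end:strand, so that identical exons in different transcripts share one ID. The ID format is an illustrative choice, not SpliceWiz's convention.

```r
# Hypothetical sketch (not a SpliceWiz function): add a coordinate-based
# exon_id to exon rows that lack one, e.g. exon_id "chr1:100-200:+".
add_coordinate_exon_id <- function(gtf_lines) {
  fields <- strsplit(gtf_lines, "\t", fixed = TRUE)
  feature <- vapply(fields,
                    function(f) if (length(f) >= 9) f[3] else NA_character_,
                    character(1))
  # Only exon data rows that do not already have an exon_id
  needs_id <- feature %in% "exon" &
    !startsWith(gtf_lines, "#") &
    !grepl("exon_id ", gtf_lines, fixed = TRUE)
  if (any(needs_id)) {
    # GTF columns: 1 = seqname, 4 = start, 5 = end, 7 = strand
    ids <- vapply(fields[needs_id],
                  function(x) sprintf("%s:%s-%s:%s", x[1], x[4], x[5], x[7]),
                  character(1))
    # Make sure the attribute string ends in ";" before appending
    base <- sub(";?\\s*$", ";", gtf_lines[needs_id], perl = TRUE)
    gtf_lines[needs_id] <- paste0(base, ' exon_id "', ids, '";')
  }
  gtf_lines
}
```

Deriving the ID from coordinates rather than from transcript_id keeps shared exons identifiable as the same exon, which is closer to how Ensembl assigns exon IDs.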
> collateData(
+ Experiment = expr, # Unique sample names
+ reference_path = ref_path, # The directory containing the SpliceWiz reference
+ output_path = nxtse_path, # The directory where the output of collateData() should go
+ novelSplicing = TRUE, # switches on novel splice detection
+ novelSplicing_requireOneAnnotatedSJ = TRUE, # novel junctions must share one annotated splice site
+ novelSplicing_minSamples = 6, # retain junctions observed in 6+ samples (of any non-zero expression)
+ novelSplicing_minSamplesAboveThreshold = 3, # or in 3+ samples if their junction count exceeds a set threshold
+ novelSplicing_countThreshold = 10, # threshold for previous parameter
+ novelSplicing_useTJ = TRUE # whether tandem junction reads should be used to identify novel exons
+ )
Oct 27 01:38:46 PM Using MulticoreParam 1 threads
Oct 27 01:38:46 PM Validating Experiment; checking COV files...
Oct 27 01:38:46 PM Compiling Sample Stats
Oct 27 01:38:46 PM Compiling Junction List...merging...done
Oct 27 01:39:06 PM Compiling Junction Stats...merging...done
Oct 27 01:39:30 PM Compiling Intron Retention List...done
Oct 27 01:40:11 PM Compiling Tandem Junction List...merging...done
Oct 27 01:40:25 PM Tidying up splice junctions and intron retentions...
...annotating splice junctions
...looking for novel exons
Oct 27 01:40:42 PM Assembling novel splicing reference:
...loading reference FASTA/GTF
...injecting novel transcripts to GTF
...processing GTF
...processing introns from GTF
...annotating alternative splicing events
done
Oct 27 01:42:01 PM Tidying up splice junctions and intron retentions (part 2)...
...grouping splice junctions
...grouping introns
...loading splice events
...compiling rowEvents
done
Oct 27 01:42:43 PM Generating NxtSE assays
Oct 27 01:42:45 PM Using MulticoreParam 1 threads
|=============================================================================================================================| 100%
Oct 27 01:46:33 PM Building Final NxtSE Object
Oct 27 01:46:33 PM ...consolidating assays to H5 file
finalising H5 database [============================================================] 100% eta: 0s
Oct 27 01:47:08 PM ...packaging reference
Error in .loadViewRef(use_ref_path) : object 'total.DT' not found
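The novelSplicing_* settings in the collateData() call above interact, and their comments suggest a simple rule. The base-R sketch below is an illustration of that rule as read from the comments, not SpliceWiz's internal code. A novel junction is kept if it has any reads in at least minSamples samples, or at least countThreshold reads in at least minSamplesAboveThreshold samples. With requireOneAnnotatedSJ, it must also share at least one annotated splice site. Whether the threshold comparison is >= or > is an assumption. The function name and input layout are hypothetical.

```r
# Illustrative sketch (assumption, not SpliceWiz code): which novel junctions
# pass the sample-count filters described in the collateData() comments.
filter_novel_junctions <- function(counts, annotated_site,
                                   minSamples = 6,
                                   minSamplesAboveThreshold = 3,
                                   countThreshold = 10,
                                   requireOneAnnotatedSJ = TRUE) {
  # counts: junctions x samples matrix of junction read counts
  # annotated_site: TRUE if the junction shares >= 1 annotated splice site
  n_expressed <- rowSums(counts > 0)
  n_above <- rowSums(counts >= countThreshold)
  keep <- n_expressed >= minSamples | n_above >= minSamplesAboveThreshold
  if (requireOneAnnotatedSJ) keep <- keep & annotated_site
  keep
}

counts <- rbind(
  j1 = c(1, 1, 1, 1, 1, 1, 0, 0),    # low counts, but seen in 6 samples
  j2 = c(20, 15, 12, 0, 0, 0, 0, 0), # seen in 3 samples, all >= 10
  j3 = c(50, 2, 0, 0, 0, 0, 0, 0),   # only 1 sample >= 10
  j4 = c(1, 1, 1, 1, 1, 1, 1, 1)     # passes counts, no annotated site
)
keep <- filter_novel_junctions(counts, c(TRUE, TRUE, TRUE, FALSE))
```

Under these settings, j1 and j2 are kept. j3 falls short of both sample thresholds, and j4 is dropped only because it shares no annotated splice site.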
Hello,
I am trying to generate reference files, but I continually get an error no matter which annotation file I use (Ensembl annotations included):
buildRef( reference_path = ref_path, fasta = "hg38.fa", gtf = "CHESS_TranscriptsCleaned.gtf", genome_type = "hg38", ontologySpecies = "Homo sapiens" )
Oct 23 01:57:16 PM Reference generated without Blacklist exclusion
Oct 23 01:57:18 PM Connecting to genome TwoBitFile...done
Oct 23 01:57:18 PM Reading source GTF file...done
Oct 23 01:57:38 PM Processing gtf file...
...genes
...transcripts
...CDS
Error: Oct 23 01:57:39 PM No start / stop codons detected in reference!
Thank you for your help!