alexchwong / SpliceWiz

SpliceWiz is an R package for exploring differential alternative splicing events in splice-aware alignment BAM files.

Error while using buildRef() for Arabidopsis thaliana #59

Closed Desertodunas closed 8 months ago

Desertodunas commented 9 months ago

Hi,

I was trying to build the reference for Arabidopsis. I downloaded both the fasta and gtf files from Ensembl Plants.

I get the following error:

buildRef(reference_path = "reference/",
    fasta = "reference_at/Arabidopsis_thaliana.TAIR10.dna.toplevel.fa",
    gtf = "reference_at/Arabidopsis_thaliana.TAIR10.57.gtf",
    genome_type = "",
    ontologySpecies = "Arabidopsis thaliana"
)

fev 02 14:50:43 Reference generated without non-polyA reference
fev 02 14:50:43 Reference generated without Mappability reference
fev 02 14:50:43 Reference generated without Blacklist exclusion
fev 02 14:50:43 Converting FASTA to local TwoBitFile...Error in .local(object, con, format, ...) : One or more strings contain unsupported ambiguity characters. Strings can contain only A, C, G, T or N. See Biostrings::replaceAmbiguities().

Any suggestions? I didn't have any issues using the same fasta and gtf files with other packages.

Thank you!

alexchwong commented 8 months ago

Hi @Desertodunas ,

It seems that the genome sequence contains ambiguities:

genome <- rtracklayer::import("reference_at/Arabidopsis_thaliana.TAIR10.dna.toplevel.fa", "fasta")
Biostrings::uniqueLetters(genome)
 [1] "A" "C" "G" "T" "M" "R" "W" "S" "Y" "K" "D" "N"

To resolve this, I suggest you replace these ambiguity letters as follows:

genome_fixed <- Biostrings::replaceAmbiguities(genome)

# save fixed genome to file
rtracklayer::export(genome_fixed, "reference_at/genome_fixed.fa", "fasta")
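Before rebuilding, it may be worth confirming that the exported file really contains only A, C, G, T and N. The following is a minimal base-R sketch of such a check; `unsupported_fasta_letters()` is a hypothetical helper written for this illustration and is not part of SpliceWiz, Biostrings or rtracklayer.

```r
# Hypothetical helper (not from any package): report characters found in the
# sequence lines of a FASTA file that are not in the allowed alphabet.
unsupported_fasta_letters <- function(path,
                                      allowed = c("A", "C", "G", "T", "N")) {
  found <- character(0)
  con <- file(path, open = "r")
  on.exit(close(con))
  repeat {
    # Read in chunks so a whole genome does not have to be split at once.
    lines <- readLines(con, n = 100000L)
    if (length(lines) == 0L) break
    seq_lines <- lines[!startsWith(lines, ">")]  # drop header lines
    chars <- strsplit(paste(seq_lines, collapse = ""), "")[[1]]
    # Lower-case (soft-masked) bases are treated like upper-case ones.
    found <- union(found, unique(toupper(chars)))
  }
  setdiff(found, allowed)
}

fixed_path <- "reference_at/genome_fixed.fa"
if (file.exists(fixed_path)) {
  bad <- unsupported_fasta_letters(fixed_path)
  if (length(bad) > 0L) {
    message("Unsupported letters remain: ", paste(bad, collapse = ", "))
  } else {
    message("Only A, C, G, T and N found.")
  }
}
```

An empty result suggests the ambiguity replacement took effect; any letters reported would be the ones still likely to trip the TwoBit conversion.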

The fixed genome should then be used in place of the original fasta in buildRef():

buildRef(reference_path = "reference/",
    fasta = "reference_at/genome_fixed.fa",
    gtf = "reference_at/Arabidopsis_thaliana.TAIR10.57.gtf",
    genome_type = "",
    ontologySpecies = "Arabidopsis thaliana"
)

Hope this works

Desertodunas commented 8 months ago

Hi!

That worked!

But now I have another issue, this time related to the gtf file.

I've tried using both a gtf and a gff3 file, and both give the same error:

fev 12 10:55:28 Processing gtf file...
...genes
...transcripts
...CDS
Error: fev 12 10:55:29 No start / stop codons detected in reference!

Thank you!

alexchwong commented 8 months ago

Can I confirm whether you are using the latest SpliceWiz? This issue should be fixed in release 1.4.1 and devel 1.5.2

Desertodunas commented 8 months ago

Yes, I'm using version 1.4.1

alexchwong commented 8 months ago

I cannot reproduce using 1.4.1; however, gene ontology reference for "Arabidopsis thaliana" doesn't work because orgDB for this species is missing the ensembl table. I am implementing a fix for the next version of SpliceWiz.

For now, I suggest re-running the above from scratch using SpliceWiz 1.4.1, omitting the ontologySpecies argument.
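A minimal sketch of that re-run, assuming the same paths used earlier in this thread: the arguments are copied from the previous call with ontologySpecies left out, and the installed version is printed first. The `ref_args` list and the guards around the call are only there for illustration.

```r
# Arguments copied from the earlier buildRef() call in this thread,
# with ontologySpecies dropped.
ref_args <- list(
  reference_path = "reference/",
  fasta = "reference_at/genome_fixed.fa",
  gtf = "reference_at/Arabidopsis_thaliana.TAIR10.57.gtf",
  genome_type = ""
)

# Only run when SpliceWiz is installed and both input files are present.
if (requireNamespace("SpliceWiz", quietly = TRUE) &&
    file.exists(ref_args$fasta) && file.exists(ref_args$gtf)) {
  # Confirm which version is installed before rebuilding.
  print(utils::packageVersion("SpliceWiz"))
  do.call(SpliceWiz::buildRef, ref_args)
}
```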

Desertodunas commented 8 months ago

Hey,

I still get the "No start / stop codons detected in reference!", using version 1.4.1 and starting from scratch.

Thanks for the help anyway, I'll try to make this work another time.
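For anyone hitting the same error, one possible way to narrow it down is to count the feature types in the annotation file and see whether start_codon and stop_codon rows are present at all. This is only a diagnostic sketch: that the error is triggered by the absence of those rows is an inference from the message text and was not confirmed in this thread, and `count_gtf_features()` is a hypothetical base-R helper, not a SpliceWiz function.

```r
# Hypothetical helper (not from any package): tabulate the feature type
# (third tab-separated column) of a GTF or GFF3 file.
count_gtf_features <- function(path) {
  lines <- readLines(path)
  # Skip comment/pragma lines and blank lines.
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]
  fields <- strsplit(lines, "\t", fixed = TRUE)
  types <- vapply(
    fields,
    function(f) if (length(f) >= 3L) f[3] else NA_character_,
    character(1)
  )
  table(types)
}

gtf_path <- "reference_at/Arabidopsis_thaliana.TAIR10.57.gtf"
if (file.exists(gtf_path)) {
  counts <- count_gtf_features(gtf_path)
  wanted <- c("CDS", "start_codon", "stop_codon")
  # Show only the feature types relevant to the error, if present.
  print(counts[intersect(wanted, names(counts))])
}
```

If CDS rows are listed but start_codon and stop_codon are not, that would be useful detail to include when reporting the problem.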