Desertodunas closed this issue 8 months ago.
Hi @Desertodunas,
It seems that the genome sequence contains ambiguity codes (letters other than A, C, G, T and N):
genome <- rtracklayer::import("reference_at/Arabidopsis_thaliana.TAIR10.dna.toplevel.fa", "fasta")
Biostrings::uniqueLetters(genome)
[1] "A" "C" "G" "T" "M" "R" "W" "S" "Y" "K" "D" "N"
To resolve this, I suggest replacing these ambiguity letters (by default, replaceAmbiguities() replaces them with N) as follows:
genome_fixed <- Biostrings::replaceAmbiguities(genome)
# save fixed genome to file
rtracklayer::export(genome_fixed, "reference_at/genome_fixed.fa", "fasta")
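As a lightweight cross-check that does not load the genome into Biostrings, the sketch below scans a FASTA file line by line and reports any letters other than A, C, G, T and N, which are the only ones the TwoBit conversion accepts according to the error message. The helper name `find_unsupported_letters()` is hypothetical, not part of SpliceWiz or Biostrings, and treating lowercase (soft-masked) bases like uppercase ones is an assumption.

```r
# Hypothetical helper (not part of SpliceWiz or Biostrings): list the letters
# in a FASTA file that are not A, C, G, T or N.
find_unsupported_letters <- function(fasta_path) {
  lines <- readLines(fasta_path)
  # drop header lines, which start with ">"
  seq_lines <- lines[!startsWith(lines, ">")]
  # assumption: lowercase bases count as their uppercase equivalents;
  # remove every allowed letter so only unsupported characters remain
  leftover <- gsub("[ACGTN]", "", toupper(seq_lines))
  leftover <- leftover[nzchar(leftover)]
  if (length(leftover) == 0) return(character(0))
  # split the (usually tiny) remainder into single characters
  sort(unique(unlist(strsplit(leftover, ""))))
}

# Usage with the file written above; an empty character vector means no
# ambiguity codes remain:
# find_unsupported_letters("reference_at/genome_fixed.fa")
```

Stripping the allowed letters first keeps memory use low, since only the rare unsupported characters are ever split into individual strings.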
Then use the fixed genome in place of the original FASTA in buildRef():
buildRef(reference_path = "reference/",
fasta = "reference_at/genome_fixed.fa",
gtf = "reference_at/Arabidopsis_thaliana.TAIR10.57.gtf",
genome_type = "",
ontologySpecies = "Arabidopsis thaliana"
)
Hope this works!
Hi!
That worked!
But now I have another issue related to the GTF file...
I've tried both a GTF and a GFF3 file, and both give the same error:
fev 12 10:55:28 Processing gtf file... ...genes ...transcripts ...CDS
Error: fev 12 10:55:29 No start / stop codons detected in reference!
Thank you!
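To see whether the annotation file actually contains start / stop codon records, one option is to tabulate the feature types in column 3 of the GTF. The helpers `count_gtf_features()` and `has_codon_features()` are hypothetical, and whether SpliceWiz reads `start_codon` / `stop_codon` rows or infers codons from `CDS` coordinates is an assumption not confirmed in this thread; the sketch only reports what the file contains and is written for GTF rather than GFF3 input.

```r
# Hypothetical diagnostic (not part of SpliceWiz): count how often each
# feature type (GTF column 3) occurs, ignoring blank and "#" comment lines.
count_gtf_features <- function(gtf_path) {
  lines <- readLines(gtf_path)
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]
  fields <- strsplit(lines, "\t", fixed = TRUE)
  # the third tab-separated field is the feature type (NA if a line is truncated)
  feature <- vapply(fields, function(f) f[3], character(1))
  counts <- table(feature)  # NA feature types are dropped by table()
  setNames(as.vector(counts), names(counts))
}

# TRUE only if both start_codon and stop_codon rows are present
has_codon_features <- function(counts) {
  all(c("start_codon", "stop_codon") %in% names(counts))
}

# Usage with the GTF from this thread:
# counts <- count_gtf_features("reference_at/Arabidopsis_thaliana.TAIR10.57.gtf")
# counts
# has_codon_features(counts)
```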
Could you confirm whether you are using the latest SpliceWiz? This issue should be fixed in release 1.4.1 and devel 1.5.2.
Yes, I'm using version 1.4.1
I cannot reproduce this using 1.4.1. However, the gene ontology reference for "Arabidopsis thaliana"
doesn't work because the orgDB for this species is missing the Ensembl
table. I am implementing a fix for the next version of SpliceWiz.
For now, I suggest re-running the above from scratch with SpliceWiz 1.4.1 and omitting the ontologySpecies
argument.
Hey,
I still get the "No start / stop codons detected in reference!" error using version 1.4.1, even when starting from scratch.
Thanks for the help anyway, I'll try to make this work another time.
Hi,
I was trying to build the reference for Arabidopsis using the FASTA and GTF files I downloaded from Ensembl Plants.
I get the following error:
buildRef(reference_path = "reference/",
fasta = "reference_at/Arabidopsis_thaliana.TAIR10.dna.toplevel.fa",
gtf = "reference_at/Arabidopsis_thaliana.TAIR10.57.gtf",
genome_type = "",
ontologySpecies = "Arabidopsis thaliana"
)
fev 02 14:50:43 Reference generated without non-polyA reference
fev 02 14:50:43 Reference generated without Mappability reference
fev 02 14:50:43 Reference generated without Blacklist exclusion
fev 02 14:50:43 Converting FASTA to local TwoBitFile...
Error in .local(object, con, format, ...) : One or more strings contain unsupported ambiguity characters. Strings can contain only A, C, G, T or N. See Biostrings::replaceAmbiguities().
Any suggestions? I haven't had any issues using this FASTA and GTF with other packages...
Thank you!