Open akjrobijns opened 2 years ago
Hi @akjrobijns,
"Unique names" means that no full sequence name may be duplicated; sharing the same prefix "Alyli" is allowed.
For example, you can check whether the alignment contains duplicate sequence names:
n <- names(AA_alignment1_cv)
dup <- n[duplicated(n)]
dup
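To make the prefix point concrete, here is a minimal base R sketch using the two IDs from the question. The vector names `ids` and `ids_dup` are made up for illustration, and the third entry in `ids_dup` is a deliberately added duplicate, not something from the original data.

```r
# Two IDs from the question: same "Alyli" prefix, different full strings.
ids <- c("Alyli.0014s0106", "Alyli.0091s0126")
anyDuplicated(ids)            # 0, i.e. no duplicated full names

# Hypothetical case with a genuinely repeated name appended.
ids_dup <- c(ids, "Alyli.0014s0106")
ids_dup[duplicated(ids_dup)]  # "Alyli.0014s0106"
```

Only exact repeats of the whole string are reported, so a shared prefix alone should not trigger the check.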
Thanks, Lang
Hi,
I'm struggling with the same issue here, trying to plot an MSA generated with msa().
The output of this function is an MsaAAMultipleAlignment object that is apparently not understood by ggmsa(). My solution was the same as explained here: using msa::msaConvert(x, type="ape::AAbin") to input an AAbin object into ggmsa().
So far I have not succeeded and I get the same error message as indicated above:
Error in tidy_msa(msa, start = start, end = end) : Sequences must have unique names
I've checked and there are definitely no duplicated IDs in my dataset, although I must say that names(alignment) returns NULL; the way to get the names from my AAbin alignment should be labels(alignment).
Here's my code if it is of any help:
library(msa)
library(ggmsa)
seqs <- readAAStringSet("data.fasta")
alignment <- msa(seqs, method="ClustalOmega")
alignment <- msaConvert(alignment, type="ape::AAbin")
ggmsa(alignment, char_width = 0.5) + geom_seqlogo() + geom_msaBar()
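On the point about names(alignment) returning NULL: one possible explanation, assuming the converted alignment is stored as a matrix with the sequence IDs as row names, is that names() simply does not look at row names. The sketch below uses a plain character matrix called `aln` as a hypothetical stand-in (it is not a real AAbin object) to show the behaviour and how a duplicate check would then have to be run on the row labels instead.

```r
# Hypothetical stand-in for a matrix-style alignment: rows are sequences,
# columns are alignment positions, and the IDs are stored as row names.
aln <- matrix(c("M", "M", "K", "R"), nrow = 2,
              dimnames = list(c("seq1", "seq2"), NULL))

names(aln)      # NULL: a matrix carries dimnames, not names
rownames(aln)   # "seq1" "seq2"

# Run the duplicate check on the labels that actually exist.
ids <- rownames(aln)
ids[duplicated(ids)]   # character(0), i.e. no duplicates
```

If that assumption holds, a duplicate check based on names() would return nothing regardless of the actual IDs, so checking labels(alignment) is the more informative test.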
Hello, I am having the same issue. Any help would be greatly appreciated!
I was having the same issue. I worked around it by using one of the other acceptable MSA input formats.
library(msa)
library(ggmsa)
# read the unaligned sequences
sequences <- readAAStringSet("data/sample.faa")
# perform MSA using ClustalOmega
alignment <- msa(sequences, "ClustalOmega")
# write the alignment to a FASTA file
writeXStringSet(as(unmasked(alignment), "XStringSet"), filepath = "alignment.fasta")
# read it back as an AAStringSet
aligned <- readAAStringSet("alignment.fasta")
# visualize
ggmsa(aligned) + geom_seqlogo() + geom_msaBar()
Hello, I'm trying to use ggmsa to plot an amino acid sequence alignment. I've aligned it using the msa() function and then converted it to an AAbin type file to then use with ggmsa().
When I try to make the ggmsa plot like this:
I get this error:
My sequences do all have different names (I am using gene IDs in many cases). What counts as a 'unique name'? For example, are these not unique because they start the same: "Alyli.0014s0106" and "Alyli.0091s0126"?
Thanks!