ggmsa plot error due to unique names

YuLab-SMU / ggmsa

:traffic_light: Visualizing publication-quality multiple sequence alignment using ggplot2

http://yulab-smu.top/ggmsa

ggmsa plot error due to unique names #25

Open akjrobijns opened 2 years ago

akjrobijns commented 2 years ago

Hello, I'm trying to use ggmsa to plot an amino acid sequence alignment. I aligned the sequences using the msa() function and then converted the result to an AAbin object to use with ggmsa().

When I try to make the ggmsa plot like this:

ggmsa(AA_alignment1_cv, 120, 220, color = "Clustal", font = "DroidSansMono", char_width = 0.5)

I get this error:

Error in tidy_msa(msa, start = start, end = end) : Sequences must have unique names

As far as I can tell, my sequences all have different names (I am using gene IDs in many cases). What counts as a 'unique name'? For example, are these not unique because they start the same way: "Alyli.0014s0106" and "Alyli.0091s0126"?

Thanks!

nyzhoulang commented 2 years ago

Hi @akjrobijns,

"Unique names" means that no full sequence name may be duplicated. Sharing a prefix such as "Alyli" is allowed.

For example:

[x] "Alyli.0014s0106" and "Alyli.0091s0126" are allowed (√)
[ ] "Alyli.0014s0106" and "Alyli.0014s0106" are not allowed (×)

You can check whether the alignment contains duplicate sequence names:

n <- names(AA_alignment1_cv)
dup <- n[duplicated(n)] 
dup
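
If the check above does report duplicates, one possible way to handle them is sketched below in base R. This is only an illustration: the vector `n` is made-up toy data standing in for the names of a real alignment, and using make.unique() to rename repeated entries is an assumption about what might be acceptable for your data, not something ggmsa requires.

```r
# Toy names standing in for names(AA_alignment1_cv); the values are made up.
n <- c("Alyli.0014s0106", "Alyli.0091s0126", "Alyli.0014s0106")

# Names that occur more than once (each reported a single time)
dup <- unique(n[duplicated(n)])
dup
# "Alyli.0014s0106"

# make.unique() appends a numeric suffix to repeated entries
n_unique <- make.unique(n, sep = "_")
n_unique
# "Alyli.0014s0106" "Alyli.0091s0126" "Alyli.0014s0106_1"
```

The renamed vector would then have to be assigned back to the alignment object before plotting; how to do that depends on the object's class.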

Thanks, Lang

rvazqf commented 2 years ago

Hi,

I'm struggling with the same issue here, trying to plot an MSA generated with msa().

The output of this function is an MsaAAMultipleAlignment object that is apparently not understood by ggmsa(). My solution was the same as explained here: using msa::msaConvert(x, type="ape::AAbin") to input an AAbin object into ggmsa().

So far I have not succeeded and I get the same error message as indicated above:

Error in tidy_msa(msa, start = start, end = end) : Sequences must have unique names

I've checked and there are definitely no duplicated IDs in my dataset. I must say, though, that names(alignment) returns NULL; the way to get the names from my AAbin alignment should be labels(alignment).
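
To illustrate why names() can return NULL even though the sequences are labelled, here is a minimal base-R sketch. It assumes the converted alignment is stored as a matrix with the labels in its row names, which is a guess based on the behaviour described above and not something confirmed for msaConvert(). `seq_labels()` is a hypothetical helper, not a function from ggmsa or ape, and the toy matrix is made-up data.

```r
# Hypothetical helper (not part of ggmsa or ape): fetch sequence labels from
# a list-like object (names) or, failing that, a matrix-like one (rownames).
seq_labels <- function(x) {
  n <- names(x)
  if (is.null(n)) n <- rownames(x)
  n
}

# Toy character matrix standing in for a matrix-style alignment.
aln <- matrix(
  c("M", "K", "V",
    "M", "K", "I"),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Alyli.0014s0106", "Alyli.0091s0126"), NULL)
)

names(aln)                          # NULL: a matrix has no names attribute
lab <- seq_labels(aln)              # falls back to the row names
has_dups <- anyDuplicated(lab) > 0  # FALSE for this toy data
```

With this kind of object, a duplicate check based on names() alone would have nothing to inspect, so the labels need to be taken from the row names (or from labels(), as noted above) first.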

Here's my code if it is of any help:

library(msa)
library(ggmsa)

seqs <- readAAStringSet("data.fasta")
alignment <- msa(seqs, method="ClustalOmega")
alignment <- msaConvert(alignment, type="ape::AAbin")
ggmsa(alignment, char_width = 0.5) + geom_seqlogo() + geom_msaBar()

maxfieldk commented 1 year ago

Hello, I am having the same issue. Any help would be greatly appreciated!

defneyanartas commented 4 months ago

I was having the same issue. I worked around it by using one of the other accepted MSA input formats.

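library(msa)      # also attaches Biostrings (readAAStringSet, writeXStringSet)
library(ggmsa)

# read the unaligned sequences
sequences <- readAAStringSet("data/sample.faa")

# perform the MSA using ClustalOmega
alignment <- msa(sequences, "ClustalOmega")

# write the alignment to a FASTA file
writeXStringSet(as(unmasked(alignment), "XStringSet"), file="alignment.fasta")

# read the aligned sequences back in as an AAStringSet
# (variable renamed so it does not shadow the AAStringSet() constructor)
aligned_seqs <- readAAStringSet("alignment.fasta")

# visualize
ggmsa(aligned_seqs) + geom_seqlogo() + geom_msaBar()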