If you're using the API, could you provide a minimal example that produces the error?
sample_seq1.fasta looks like:
>sample_seq1
GYTFTDQT
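from weblogo import (LogoData, LogoFormat, LogoOptions, eps_formatter,
                     read_seq_data, seq)

# Read the sequences (the file is closed automatically afterwards)
with open('sample_seq1.fasta') as fin:
    seqs = read_seq_data(fin)
logodata = LogoData.from_seqs(seqs)

logooptions = LogoOptions()
# Custom protein alphabet; lowercase letters map to uppercase, '?' to 'X', '.' and '~' to '-'
logooptions.alphabet = seq.Alphabet(
    "ACDEFGHIKLMNOPQRSTUVWYBJZX*-",
    tuple(zip("acdefghiklmnopqrstuvwybjzx?.~", "ACDEFGHIKLMNOPQRSTUVWYBJZXX--")),
)
logoformat = LogoFormat(logodata, logooptions)
eps = eps_formatter(logodata, logoformat)  # render the logo as EPS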
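The second argument to seq.Alphabet pairs each alternative symbol with the canonical letter it stands for. A minimal pure-Python sketch of that mapping, assuming alternatives are simply substituted before counting; the helper normalize is hypothetical and not part of weblogo, and rejecting unknown symbols is a choice made only in this sketch:

```python
# Hypothetical illustration of the (alternative, canonical) pairs passed to
# seq.Alphabet in the example above; this is not weblogo's own code.
LETTERS = "ACDEFGHIKLMNOPQRSTUVWYBJZX*-"
ALTERNATIVES = dict(zip("acdefghiklmnopqrstuvwybjzx?.~",
                        "ACDEFGHIKLMNOPQRSTUVWYBJZXX--"))


def normalize(sequence: str) -> str:
    """Replace each alternative symbol with its canonical letter."""
    out = []
    for ch in sequence:
        ch = ALTERNATIVES.get(ch, ch)  # canonical letters pass through unchanged
        if ch not in LETTERS:
            # Assumption of this sketch: unknown symbols are an error
            raise ValueError(f"symbol {ch!r} is not in the alphabet")
        out.append(ch)
    return "".join(out)


print(normalize("gytftdqt"))  # GYTFTDQT
print(normalize("GY?T.~"))    # GYXT--
```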
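Try passing the alphabet argument to read_seq_data() as well, e.g. read_seq_data(fin, alphabet=logooptions.alphabet), with the alphabet defined before the file is read. I suspect read_seq_data() is guessing the alphabet incorrectly (as DNA), and LogoData.from_seqs() then ignores the non-DNA letters.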
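To see why only some residues survive a wrong alphabet guess, here is a minimal sketch that counts symbols per column and silently skips any letter outside the alphabet, which is the effect described above. The helpers column_counts and kept_symbols are hypothetical, not weblogo's implementation, and the guessed DNA alphabet is assumed to be "ACGT" because that matches the observation that only G and T get plotted:

```python
from collections import Counter

# Hypothetical sketch: per-column counts that drop letters outside the alphabet,
# mimicking the behaviour described above (not weblogo's actual code).
DNA = "ACGT"  # assumed guessed alphabet
PROTEIN = "ACDEFGHIKLMNOPQRSTUVWYBJZX*-"


def column_counts(seqs, alphabet):
    """Return one Counter per column, ignoring symbols outside `alphabet`."""
    counts = [Counter() for _ in range(len(seqs[0]))]
    for s in seqs:
        for i, ch in enumerate(s):
            if ch in alphabet:
                counts[i][ch] += 1
    return counts


def kept_symbols(seqs, alphabet):
    """First counted symbol in each column; '.' marks a column with nothing counted."""
    return "".join(next(iter(c), ".") for c in column_counts(seqs, alphabet))


print(kept_symbols(["GYTFTDQT"], DNA))      # G.T.T..T
print(kept_symbols(["GYTFTDQT"], PROTEIN))  # GYTFTDQT
```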
Thanks, that works. However, when I switch to the 'chemistry' color scheme:
logooptions.color_scheme = std_color_schemes['chemistry']
I get the error:
KeyError: "Colored symbol 'Z' does not exist in alphabet."
I would like to have at least X, *, and - in my alphabet. In the web version these default to black because they are not part of the 'chemistry' color scheme. How can I achieve that in the API?
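Try also setting your custom alphabet on the color scheme, something like:
import copy
from weblogo import std_color_schemes

cs = copy.copy(std_color_schemes['chemistry'])  # copy, so the shared built-in scheme is left unchanged
cs.alphabet = logooptions.alphabet  # same custom alphabet as the logo
logooptions.color_scheme = cs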
Symbols not in the chemistry color scheme will default to black.
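The "default to black" behaviour amounts to a lookup with a fallback colour. A minimal sketch of that idea follows; the groups below are an illustrative assumption loosely modelled on a chemistry-style grouping (check weblogo's colorscheme module for the real rules), and symbol_color is a hypothetical helper, not part of the weblogo API:

```python
# Hypothetical sketch of a colour lookup with a default colour, echoing the
# behaviour described above. Groups are illustrative, not copied from weblogo.
CHEMISTRY_GROUPS = {
    "GSTYC": "green",     # polar
    "NQ": "purple",       # neutral
    "KRH": "blue",        # basic
    "DE": "red",          # acidic
    "PAWFLIMV": "black",  # hydrophobic
}


def symbol_color(symbol: str, default: str = "black") -> str:
    """Return the colour of the first group containing `symbol`, else `default`."""
    for group, color in CHEMISTRY_GROUPS.items():
        if symbol in group:
            return color
    return default  # e.g. X, * and - fall through to the default


for s in "GKDX*-":
    print(s, symbol_color(s))
```

With this design, adding X, * and - to the alphabet needs no extra rules: anything the groups do not mention simply takes the default colour.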
That works, thanks!
I explicitly set a protein alphabet, but when I generate a logo from the amino-acid sequence GYTFTDQT, only the G and T residues are plotted. In the web version I usually select "protein" as the Sequence type, because otherwise it does not plot any characters outside the set of ambiguous DNA characters, but I don't see an equivalent option in the API.
(Pdb) logooptions.alphabet
Alphabet( 'ACDEFGHIKLMNOPQRSTUVWYBJZX*-', zip('acdefghiklmnopqrstuvwybjzx?.~', 'ACDEFGHIKLMNOPQRSTUVWYBJZXX--') )