gecrooks / weblogo

WebLogo 3: Sequence Logos redrawn
weblogo.threeplusone.com

Amino acid residues not plotted #164

Closed laurapspector closed 11 months ago

laurapspector commented 11 months ago

I explicitly provide the protein alphabet, but when I generate a logo from the amino acid sequence GYTFTDQT, only the G and T residues are plotted. In the web version I usually select "protein" for Sequence type, because otherwise it does not plot any characters that fall outside the set of ambiguous DNA characters, but I don't see that option in the API.

(Pdb) logooptions.alphabet
Alphabet( 'ACDEFGHIKLMNOPQRSTUVWYBJZX*-', zip('acdefghiklmnopqrstuvwybjzx?.~', 'ACDEFGHIKLMNOPQRSTUVWYBJZXX--') )

[Screenshot 2023-11-06 at 11 54 27 AM]

gecrooks commented 11 months ago

If you're using the API, could you provide a minimal example that reproduces the error?

laurapspector commented 11 months ago

sample_seq1.fasta looks like:

>sample_seq1
GYTFTDQT

from weblogo import (LogoData, LogoFormat, LogoOptions, eps_formatter,
                     read_seq_data, seq)

# No alphabet is passed here, so read_seq_data() has to guess one.
with open('sample_seq1.fasta') as fin:
    seqs = read_seq_data(fin)
logodata = LogoData.from_seqs(seqs)
logooptions = LogoOptions()
logooptions.alphabet = seq.Alphabet(
    "ACDEFGHIKLMNOPQRSTUVWYBJZX*-",
    tuple(zip("acdefghiklmnopqrstuvwybjzx?.~", "ACDEFGHIKLMNOPQRSTUVWYBJZXX--")),
)
logoformat = LogoFormat(logodata, logooptions)
eps = eps_formatter(logodata, logoformat)

gecrooks commented 11 months ago

Try adding the alphabet argument to read_seq_data(). I suspect that read_seq_data() is incorrectly guessing the alphabet, and then LogoData.from_seqs() ignores the non-DNA letters.
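
To make the suspected failure mode concrete, here is a minimal, standard-library sketch of how an alphabet guess can misclassify a very short peptide. The scoring rule (letters covered, divided by the log of the alphabet size), the candidate alphabets, and the names `guess_alphabet` and `plottable` are assumptions made for illustration; they are not taken from WebLogo's source and may differ from what read_seq_data() actually does.

```python
import math

# Illustrative candidate alphabets (unambiguous letters only). Assumption:
# these stand in for whatever candidates a real guesser would consider.
CANDIDATES = {
    "dna": "ACGT",
    "rna": "ACGU",
    "protein": "ACDEFGHIKLMNPQRSTVWY",
}


def guess_alphabet(sequence):
    """Pick the candidate that covers the most letters, penalising larger alphabets."""
    def score(letters):
        covered = sum(1 for ch in sequence.upper() if ch in letters)
        return covered / math.log(len(letters))

    return max(CANDIDATES, key=lambda name: score(CANDIDATES[name]))


def plottable(sequence, alphabet_name):
    """Return the letters of `sequence` that belong to the named alphabet."""
    letters = CANDIDATES[alphabet_name]
    return [ch for ch in sequence.upper() if ch in letters]


guessed = guess_alphabet("GYTFTDQT")
print(guessed, plottable("GYTFTDQT", guessed))  # dna ['G', 'T', 'T', 'T']
```

Under this toy rule, half of the eight residues happen to be valid DNA letters, which is enough for the smaller alphabet to win, and only G and T survive. Passing the alphabet explicitly, as suggested above, removes the guess altogether.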

laurapspector commented 11 months ago

Thanks, this works initially, but when I opt to use the 'chemistry' color scheme (logooptions.color_scheme = std_color_schemes['chemistry']), I run into this error: KeyError: "Colored symbol 'Z' does not exist in alphabet." I would like to have at least X, *, and - in my alphabet. In the web version these default to black because they are not part of the 'chemistry' color scheme. How can I achieve that in the API?

gecrooks commented 11 months ago

Try also setting your custom alphabet on the color scheme. Something like...

cs = std_color_schemes['chemistry']
cs.alphabet = my_alphabet
logooptions.color_scheme = cs

Symbols not in the chemistry color scheme will default to black.
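
The fallback behaviour described here can be sketched without WebLogo at all. The class below is a hypothetical, simplified stand-in (`SimpleColorScheme`, `color_for`, and the rule table are names invented for this example, not WebLogo's API): it rejects rules that color a symbol missing from the alphabet, and returns a default color for any symbol that no rule covers.

```python
# Illustrative rule table: groups of symbols mapped to a colour.
# Assumption: the groupings only approximate a chemistry-style scheme.
CHEMISTRY_LIKE_RULES = {
    "GSTYC": "green",
    "NQ": "purple",
    "KRH": "blue",
    "DE": "red",
    "PAWFLIMV": "black",
}


class SimpleColorScheme:
    """Hypothetical stand-in for a colour scheme bound to an alphabet."""

    def __init__(self, rules, alphabet, default_color="black"):
        # Every coloured symbol must exist in the alphabet the scheme is bound to.
        for symbols in rules:
            for symbol in symbols:
                if symbol not in alphabet:
                    raise KeyError(
                        f"Colored symbol '{symbol}' does not exist in alphabet."
                    )
        self.rules = rules
        self.alphabet = alphabet
        self.default_color = default_color

    def color_for(self, symbol):
        """Return the colour of the first matching rule, else the default."""
        for symbols, color in self.rules.items():
            if symbol in symbols:
                return color
        return self.default_color


my_alphabet = "ACDEFGHIKLMNOPQRSTUVWYBJZX*-"
scheme = SimpleColorScheme(CHEMISTRY_LIKE_RULES, my_alphabet)
colors = {symbol: scheme.color_for(symbol) for symbol in "GDX*-"}
print(colors)
```

In this sketch G and D pick up their rule colors, while X, * and - fall through to the default. One design point worth noting about the snippet in the thread: assigning to cs.alphabet modifies the object stored in std_color_schemes, so other code that later reads std_color_schemes['chemistry'] will see the changed alphabet as well.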

laurapspector commented 11 months ago

That works, thanks!