epam / Indigo

Universal cheminformatics toolkit, utilities and database search tools
http://lifescience.opensource.epam.com
Apache License 2.0

Fasta: All Peptides should be saved to FASTA format #1822

Closed NadezhdaPeskun closed 6 months ago

NadezhdaPeskun commented 6 months ago

Steps to Reproduce

  1. Macromolecules mode
  2. Add one Peptide from Peptides tab
  3. Try to Save to FASTA format

Result: not all Peptides are saved to FASTA.

Current implementation for non-standard monomers: save the natural analog if present. If there is no natural analog:

  - PEPTIDE: X
  - RNA: N
  - all other cases: the monomer breaks the sequence
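The fallback rule above can be sketched in a few lines of Python. This is only an illustration of the rule as described in this issue, not Indigo's actual code: the names `fasta_symbol` and `to_fasta_chunks`, the `(monomer_class, natural_analog)` tuple shape, and the reading of "breaks the sequence" as "ends the current sequence and starts a new one" are all assumptions.

```python
from typing import List, Optional, Tuple

# Placeholder letters used when a monomer has no natural analog (from the issue text).
FALLBACK = {"PEPTIDE": "X", "RNA": "N"}


def fasta_symbol(monomer_class: str, natural_analog: Optional[str]) -> Optional[str]:
    """Return the FASTA letter for a monomer, or None if it breaks the sequence."""
    if natural_analog:
        return natural_analog
    # No natural analog: X for peptides, N for RNA, None for everything else.
    return FALLBACK.get(monomer_class)


def to_fasta_chunks(monomers: List[Tuple[str, Optional[str]]]) -> List[str]:
    """Split a chain into sequence strings, cutting at monomers that have no letter.

    Assumption: a "breaking" monomer ends the current sequence and is dropped.
    """
    chunks: List[str] = []
    current: List[str] = []
    for monomer_class, analog in monomers:
        symbol = fasta_symbol(monomer_class, analog)
        if symbol is None:
            if current:
                chunks.append("".join(current))
            current = []
        else:
            current.append(symbol)
    if current:
        chunks.append("".join(current))
    return chunks
```

Under these assumptions, a peptide without a natural analog still yields a letter (X), so it should never be the reason a peptide chain fails to save.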

Expected behavior

All Peptides should be saved to FASTA format.

Expected result from comment: Sequences are expected to be represented in the standard nucleic acid codes, with these exceptions: N or n as unknown nucleic acid residue will be translated into X for unknown amino acid residue. The nucleic acid codes supported are:

  A --> adenosine
  C --> cytidine
  G --> guanine
  T --> thymidine
  N --> A G C T U (any)

Symbols for translated amino acids:

  A  alanine                    P  proline
  B  aspartate or asparagine    Q  glutamine
  C  cystine                    R  arginine
  D  aspartate                  S  serine
  E  glutamate                  T  threonine
  F  phenylalanine              U  selenocysteine
  G  glycine                    V  valine
  H  histidine                  W  tryptophan
  I  isoleucine                 Y  tyrosine
  K  lysine                     Z  glutamate or glutamine
  L  leucine                    X  any
  M  methionine                 *  translation stop
  N  asparagine
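For testing saved files, the two code tables above can be turned into a simple alphabet check. This is a minimal sketch, assuming the tables are exhaustive and that codes are case-insensitive (the text mentions "N or n"); `NUCLEIC_CODES`, `AMINO_CODES` and `invalid_symbols` are hypothetical names, not part of Indigo or Ketcher.

```python
# Alphabets copied from the tables quoted in this issue.
NUCLEIC_CODES = set("ACGTN")
AMINO_CODES = set("ABCDEFGHIKLMNPQRSTUVWXYZ*")


def invalid_symbols(sequence: str, alphabet: set) -> list:
    """Return the sorted list of symbols in `sequence` that are not in `alphabet`.

    Assumption: lower-case letters are accepted and compared as upper-case.
    """
    return sorted(set(sequence.upper()) - alphabet)
```

For example, `invalid_symbols("MKV*J", AMINO_CODES)` reports `J`, which is not among the listed amino acid symbols, while a saved peptide sequence made only of listed letters returns an empty list.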

Actual behavior

Not all Peptides can be saved to FASTA format.

Environment details:

Version 2.19.0-rc.2 Build at 2024-03-13; 13:51:33

Indigo Toolkit Version 1.19.0-dev.1.0-g9fa8cfe30-wasm32-wasm-clang-12.0.0

Win10 Chrome Version 122.0.6261.129 (Official Build) (64-bit)

AlexeyGirin commented 6 months ago

@even1024 says it is fixed.

NadezhdaPeskun commented 6 months ago

[3/19/2024 1:23 PM] Olga Nazarenko

Sequences are expected to be represented in the standard nucleic acid codes, with these exceptions: N or n as unknown nucleic acid residue will be translated into X for unknown amino acid residue. The nucleic acid codes supported are:

  A --> adenosine
  C --> cytidine
  G --> guanine
  T --> thymidine
  N --> A G C T (any)

Symbols for translated amino acids:

  A  alanine                    P  proline
  B  aspartate or asparagine    Q  glutamine
  C  cystine                    R  arginine
  D  aspartate                  S  serine
  E  glutamate                  T  threonine
  F  phenylalanine              U  selenocysteine
  G  glycine                    V  valine
  H  histidine                  W  tryptophan
  I  isoleucine                 Y  tyrosine
  K  lysine                     Z  glutamate or glutamine
  L  leucine                    X  any
  M  methionine                 *  translation stop
  N  asparagine

NadezhdaPeskun commented 6 months ago

"Sequences are expected to be represented in the standard nucleic acid codes, with these exceptions: N or n as unknown nucleic acid residue will be translated into X for unknown amino acid residue."

It will be implemented later. New bugs were added.

This issue is verified as fixed.

Version 2.20.0-rc.2 Build at 2024-03-27; 08:25:13
Indigo Toolkit Version 1.19.0-rc.2.0-g6c0e3fecf-wasm32-wasm-clang-12.0.0