epam / Indigo

Universal cheminformatics toolkit, utilities and database search tools
http://lifescience.opensource.epam.com
Apache License 2.0

Line breaks should be ignored for three letter sequence codes #2617

Closed AlexeyGirin closed 2 weeks ago

AlexeyGirin commented 2 weeks ago

Steps to Reproduce

  1. Go to Macro - Flex mode
  2. Load the following text using Paste from clipboard:
    AlaAl
    aCysCys


Actual behavior

The system throws an error: Unsupported symbols: Convert error! Given string could not be loaded as (query or plain) molecule or reaction, see the error messages

Expected behavior

The content should load correctly (line breaks should be ignored).

As per requirement:

1.6. Within one sequence every n*3+1 letter symbol has to be uppercase.

AlaAlaCysCys is valid

AlaAl
aCysCys is valid (ignoring line breaks - requirement 1.4)

AlaAla CysCys is valid (two sequences - requirement 1.5)

AlAalaCysCys is not valid (third letter is uppercase and the fourth one is not)
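For illustration only, the rules quoted above can be sketched as a small standalone check. This is not Indigo's or Ketcher's actual parser. The function names `split_sequences`, `is_valid_three_letter` and `validate` are hypothetical. The sketch also assumes that a space separates sequences (as in the requirement 1.5 example) and that the two letters following each uppercase letter must be lowercase, which the quoted requirement 1.6 does not state explicitly.

```python
import re

# Assumption: one three-letter code is an uppercase letter followed by
# two lowercase letters; a sequence is one or more such codes.
_THREE_LETTER_SEQUENCE = re.compile(r"(?:[A-Z][a-z]{2})+")


def split_sequences(text: str) -> list[str]:
    """Drop line breaks (requirement 1.4), then split on spaces (assumed 1.5)."""
    joined = text.replace("\r", "").replace("\n", "")
    return [part for part in joined.split(" ") if part]


def is_valid_three_letter(sequence: str) -> bool:
    """Check that every (n*3+1)-th letter is uppercase (requirement 1.6)."""
    return _THREE_LETTER_SEQUENCE.fullmatch(sequence) is not None


def validate(text: str) -> bool:
    """True if the text holds at least one sequence and all of them are valid."""
    sequences = split_sequences(text)
    return bool(sequences) and all(is_valid_three_letter(s) for s in sequences)


if __name__ == "__main__":
    for sample in ("AlaAlaCysCys", "AlaAl\naCysCys", "AlaAla CysCys", "AlAalaCysCys"):
        print(repr(sample), split_sequences(sample), validate(sample))
```

Because line breaks are removed before any case check, the input from the reproduction steps is treated as the single sequence AlaAlaCysCys.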

Versions

Found while testing - https://github.com/epam/ketcher/issues/5556, https://github.com/epam/Indigo/issues/2472

AlexeyGirin commented 1 week ago

Verified.