epam / Indigo

Universal cheminformatics toolkit, utilities and database search tools
http://lifescience.opensource.epam.com
Apache License 2.0

Export of modified RNA to IDT notation (modified IDT monomers) #1900

olganaz closed this issue 1 month ago

olganaz commented 3 months ago

Background

Users may need to generate IDT notation for modified RNA so that they can order such molecules through the IDT website.

Requirements

  1. Known modified monomers (monomers from the Ketcher library). The system should have both an IDT alias and a structure for these monomers. They can be CHEMs or nucleotides (either split into submonomers or kept as a single unsplit unit). For these monomers, the IDT name should be exported in the form /<pos><Identifier>/[*], where pos is the position of the nucleotide in the chain/fragment:
    • 5 - at the 5' end (the first monomer in a chain)
    • i - inside the chain
    • 3 - at the 3' end (the last monomer in a chain)

     Identifier is an alphanumeric string that uniquely identifies the monomer in the IDT registry. * is an optional modified-phosphate indicator: if present, it indicates that a phosphorothioate (sP) is included in the nucleotide; otherwise, a standard phosphate (P) is implied.

     For monomers that have multiple IDT names, the appropriate name should be chosen according to the monomer's position in the chain.

  2. Unknown modified monomers. The system has an IDT name but no structure for them. Such monomers are unknown CHEMs that carry an IDT name, so that IDT name should be exported.
  3. Modified monomers without an IDT name. If a monomer has no IDT name, the system should display the error message "This molecule has unsupported monomer and couldn't be exported to IDT notation".
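Taken together, these three cases form a small decision procedure. The Python sketch below illustrates it under explicit assumptions: `Monomer`, `position_code`, `export_monomer`, `export_chain` and `IdtExportError` are hypothetical names, not part of the Indigo or Ketcher APIs; a known monomer is assumed to store one IDT identifier per position code ("5", "i", "3"); the IDT name of an unknown CHEM is assumed to be stored as a complete token and emitted unchanged; and a missing name for the required position is assumed to count as unsupported.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Error text copied verbatim from the requirements.
UNSUPPORTED_MESSAGE = ("This molecule has unsupported monomer and "
                       "couldn't be exported to IDT notation")


class IdtExportError(Exception):
    """Hypothetical error raised when a monomer has no usable IDT name."""


@dataclass
class Monomer:
    """Hypothetical monomer model, not the Indigo/Ketcher data structure."""
    # Known monomers: IDT identifier per position code ("5", "i" or "3").
    idt_names: Dict[str, str] = field(default_factory=dict)
    # Unknown CHEMs: IDT name kept without a structure, assumed to be a
    # complete token such as "/iName/".
    raw_idt_name: Optional[str] = None
    # True if this nucleotide carries a phosphorothioate (sP) instead of P.
    phosphorothioate: bool = False


def position_code(index: int, length: int) -> str:
    """Map a monomer's index in a chain to the IDT position prefix."""
    if index == 0:
        return "5"  # first monomer; a one-monomer chain also lands here
    if index == length - 1:
        return "3"  # last monomer
    return "i"  # anywhere inside the chain


def export_monomer(monomer: Monomer, index: int, length: int) -> str:
    if monomer.idt_names:
        # Known monomer: pick the IDT name valid at this position.
        pos = position_code(index, length)
        identifier = monomer.idt_names.get(pos)
        if identifier is None:
            # Assumption: no name for this position counts as unsupported.
            raise IdtExportError(UNSUPPORTED_MESSAGE)
        suffix = "*" if monomer.phosphorothioate else ""
        return f"/{pos}{identifier}/{suffix}"
    if monomer.raw_idt_name is not None:
        # Unknown CHEM: export the stored IDT name unchanged.
        return monomer.raw_idt_name
    # No IDT name at all: report the unsupported monomer.
    raise IdtExportError(UNSUPPORTED_MESSAGE)


def export_chain(monomers: List[Monomer]) -> str:
    """Concatenate the IDT tokens of all monomers in one chain."""
    n = len(monomers)
    return "".join(export_monomer(m, i, n) for i, m in enumerate(monomers))
```

As a usage example with made-up identifiers, a chain built from `Monomer(idt_names={"5": "ModA", "i": "ModB", "3": "ModC"})`, then a phosphorothioate monomer with `{"i": "ModB"}`, then the first monomer again, exports as `/5ModA//iModB/*/3ModC/`. Keeping the position lookup separate from the name table means a monomer with a single IDT name can simply map all three position codes to the same identifier. The spec does not say how a one-monomer chain should be labelled; the sketch arbitrarily treats it as the 5' end.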

Examples TBD

Zhirnoff commented 1 month ago

Tested. We have the bugs mentioned above. Desktop:

Ketcher version 2.22.0-rc.2
Indigo version 1.21.0-rc.1