Background
In IDT notation, modified sequences are represented as plain strings that combine standard and modified monomers.
A standard monomer, [s]<Base>[*], is a nucleotide with the same configuration as supported in Ketcher.
A modified monomer, /<pos><Identifier>/[*], can be recognized as one of the following:
Known nucleotide with defined submonomers (RNA preset) from the Ketcher library
Known nucleotide with defined structure, but with undefined submonomers (unsplit nucleotide) or CHEM.
CHEM with unknown structure (unresolved monomer)
This task covers only import of modified IDT monomers.
Requirements
The system should interpret /<pos><Identifier>/[*] as the IDT alias of a monomer from the Ketcher library and import the corresponding monomer.
If there is no monomer with the corresponding alias in the library, the system should import the IDT monomer as a monomer with the IDT alias only (no structure).
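The lookup-with-fallback rule above can be sketched as follows. This is a minimal illustration, not the Ketcher implementation: the `Monomer` class, the `LIBRARY` dictionary and its single entry, and the `import_modified` function are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Monomer:
    idt_alias: str
    # None marks an unresolved monomer: IDT alias only, no structure.
    structure: Optional[str]


# Hypothetical stand-in for the Ketcher library, keyed by IDT alias.
# The entry and its structure string are placeholders for illustration.
LIBRARY: Dict[str, Monomer] = {
    "5Phos": Monomer("5Phos", "<structure placeholder>"),
}


def import_modified(alias: str, library: Dict[str, Monomer]) -> Monomer:
    """Return the library monomer for the alias, or an alias-only monomer."""
    found = library.get(alias)
    if found is not None:
        return found
    # No match in the library: keep the alias, leave the structure undefined.
    return Monomer(alias, None)
```

An alias missing from the library still produces a monomer, so import does not fail at this step.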
The system should check the position of the monomer in the chain against the pos value in its IDT alias:
5- at the 5' end (the first monomer in a chain)
i- inside the chain
3- at the 3' end (the last monomer in a chain)
If the position indicator in the IDT code contradicts the actual position of the monomer in the chain, this should be treated as a format error, and import should fail with an appropriate error message:
IDT alias <IDT id> cannot be used at five prime
(The message was previously "Position of monomer <IDT id> in sequence contradicts its code"; the change was approved by @olganaz.)
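The position check can be sketched as below. This is a sketch under stated assumptions: `IdtFormatError` and `check_position` are hypothetical names, the "three prime" and "internal" wordings are assumed by analogy with the "five prime" message given above, and treating a one-monomer chain as both the 5' and the 3' end is also an assumption that this issue does not specify.

```python
class IdtFormatError(ValueError):
    """Raised when an IDT string violates the format (hypothetical name)."""


def check_position(alias: str, index: int, chain_length: int) -> None:
    """Raise IdtFormatError if the alias prefix contradicts the chain position."""
    pos = alias[0]  # first character of the alias: '5', 'i' or '3'
    is_first = index == 0
    is_last = index == chain_length - 1

    allowed = set()
    if is_first:
        allowed.add("5")
    if is_last:
        allowed.add("3")
    if not allowed:
        allowed.add("i")

    if pos not in allowed:
        # Only the "five prime" wording comes from the requirement;
        # the other two are assumed by analogy.
        if is_first:
            where = "five prime"
        elif is_last:
            where = "three prime"
        else:
            where = "internal"
        raise IdtFormatError(f"IDT alias {alias} cannot be used at {where}")
```

For instance, `check_position("3Phos", 0, 3)` raises an error, because a 3' alias appears as the first monomer of the chain.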
When * is applied to a modified IDT monomer, the system should also check whether an RNA preset with the IDT alias /<pos><Identifier>/ exists in the library:
If there is an RNA preset with the IDT alias /<pos><Identifier>/, then /<pos><Identifier>/* should be imported as that RNA preset with the phosphate (P) replaced by phosphorothioate (sP).
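A minimal sketch of the phosphorothioate substitution follows. The `RnaPreset` class, the `PRESETS` dictionary, and the `import_with_star` function are hypothetical, and the sugar and base values of the sample entry are illustrative assumptions rather than data taken from the Ketcher library.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class RnaPreset:
    sugar: str
    base: str
    phosphate: str


# Hypothetical preset table keyed by IDT alias; component names are illustrative.
PRESETS: Dict[str, RnaPreset] = {
    "52MOErA": RnaPreset(sugar="MOE", base="A", phosphate="P"),
}


def import_with_star(alias: str, presets: Dict[str, RnaPreset]) -> Optional[RnaPreset]:
    """Return a copy of the preset with P swapped for sP, or None if no preset exists."""
    preset = presets.get(alias)
    if preset is None:
        # Handling of '*' on non-preset monomers is outside this sketch.
        return None
    # The library entry itself is left unchanged; only the imported copy gets sP.
    return replace(preset, phosphate="sP")
```

Returning a modified copy keeps the library preset intact, so the same alias without * still imports with a regular phosphate.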
The bonds between monomers should be established from the R2 attachment point of the first monomer to the R1 attachment point of the second monomer.
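The bonding rule can be illustrated with a short sketch. The `chain_bonds` function and the tuple representation of a bond are assumptions made for this example; they only show that each monomer's R2 connects to the R1 of the monomer that follows it.

```python
from typing import List, Sequence, Tuple

# A bond end is (monomer index in the chain, attachment point name).
BondEnd = Tuple[int, str]


def chain_bonds(monomers: Sequence[object]) -> List[Tuple[BondEnd, BondEnd]]:
    """Connect R2 of each monomer to R1 of the next monomer in the chain."""
    return [((i, "R2"), (i + 1, "R1")) for i in range(len(monomers) - 1)]
```

A chain of n monomers yields n - 1 bonds, and a single monomer yields none.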
Examples
/52MOErA/*/i2MOErC/*/32MOErT/
/5Phos/ACG/3Phos/
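As an illustration, the example strings above can be split into tokens with a small regular-expression tokenizer. This is a sketch, not the actual parser: the `tokenize` function is hypothetical, and the accepted standard-monomer prefixes (r, m, +) and bases (A, C, G, T, U) are assumptions about the [s]<Base> part of the notation.

```python
import re
from typing import List, Tuple

# Modified monomer: /<alias>/ with optional '*'.
# Standard monomer: optional prefix, one base letter, optional '*'
# (the prefix and base sets are assumptions for this sketch).
TOKEN = re.compile(
    r"/(?P<alias>[^/]+)/(?P<mstar>\*)?"
    r"|(?P<prefix>[rm+])?(?P<base>[ACGTU])(?P<sstar>\*)?"
)


def tokenize(idt: str) -> List[Tuple[str, str, bool]]:
    """Split an IDT string into (kind, name, has_star) tuples."""
    tokens = []
    pos = 0
    while pos < len(idt):
        m = TOKEN.match(idt, pos)
        if m is None:
            raise ValueError(f"Unexpected character at offset {pos}: {idt[pos]!r}")
        if m.group("alias") is not None:
            tokens.append(("modified", m.group("alias"), m.group("mstar") is not None))
        else:
            name = (m.group("prefix") or "") + m.group("base")
            tokens.append(("standard", name, m.group("sstar") is not None))
        pos = m.end()
    return tokens
```

With this sketch, the first example yields three modified tokens, the first two carrying *, and the second example yields /5Phos/, three standard bases, and /3Phos/.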