glycoinfo / WURCS

Web3 Unique Representation of Carbohydrate Structures
https://www.wurcs-wg.org/

What substituents are suitable as a part of glycans? #1

Open MasaakiMatsubara opened 1 year ago

MasaakiMatsubara commented 1 year ago

WURCS has rules for representing various chemical structures.
However, it does not say which chemical structures are suitable as part of a glycan.
For substituents in particular, there is often no clear rule for identifying them, so they can become too large and complicated.

Thus, in this issue, we discuss how to filter out unsuitable substituents.
Through this discussion, we also try to determine which substituents are suitable as part of a glycan.

MasaakiMatsubara commented 1 year ago

Structural features for defining glycan substituents

Because so many structural features could be used for substituent filtering, the discussion can easily become complicated.

To simplify the discussion, we focus on the following structural features for now:

  1. Whether non-organic atoms are contained or not
  2. The number of branches
  3. Maximum ring size

Whether non-organic atoms are contained or not

In general, chemical modifications make glycan structures diverse and complicated.
Thus we consider that chemical modifications should not be contained in the glycan substituents.
What counts as a "chemical modification" is itself debated, but for now we define chemical modifications as those that contain non-organic atoms.
Since "organic atoms" is also open to discussion, we define the following elements as organic atoms:

These are basically elements contained in natural organic compounds.
Some elements, e.g. F, could be considered part of a substituent because many compounds registered in chemical compound databases contain them, but in many cases they are introduced by chemical modification, e.g. labeling.
Therefore, we limit the organic elements more strictly.
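
The element check described above could be sketched as follows. This is only an illustration: the function name and the input format (a list of element symbols per substituent) are hypothetical, and the allowed set used here is a deliberately minimal placeholder, not the list of organic atoms defined in this issue.

```python
def has_only_allowed_elements(elements, allowed):
    """Return True if every element symbol of the substituent is in `allowed`."""
    return all(symbol in allowed for symbol in elements)


# Placeholder set for illustration only; NOT the list defined in this issue.
PLACEHOLDER_ALLOWED = frozenset({"C", "H", "O"})

# Element symbols of two toy substituents (hypothetical inputs).
acetyl = ["C", "C", "O", "H", "H", "H"]
fluoro_label = ["C", "F", "H", "H"]

print(has_only_allowed_elements(acetyl, PLACEHOLDER_ALLOWED))        # True
print(has_only_allowed_elements(fluoro_label, PLACEHOLDER_ALLOWED))  # False
```

Passing the allowed set as a parameter keeps the check independent of whichever element list is finally agreed on.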

The number of branches

This feature measures the complexity of a chemical structure: the more branches, the more complex.
Here, we consider a "branched atom" to be an atom with three or more connections to heavy atoms. The number of branches is counted per branched atom: an atom with three connections contributes one branch and an atom with four connections contributes two.
Note that the element of a branched atom is currently not limited to carbon. For example, the P of a phosphate and the S of a sulfate are also counted as branched atoms.
At this point, we have decided to allow up to four branches in a substituent. This threshold was chosen from the list of substituents in the SNFG document, so that the major substituents are kept.
On the other hand, we deliberately do not use the number of atoms for filtering, because some substituents, e.g. lipids, can be large but have few branches.
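
The branch count above can be written down as a minimal sketch, assuming the substituent is given as a heavy-atom adjacency map (atom label to list of bonded heavy atoms). The names `count_branches` and `passes_branch_filter` are hypothetical, and generalizing "three connections: one branch, four connections: two branches" to `connections - 2` is an assumption for atoms with more than four connections.

```python
MAX_BRANCHES = 4  # threshold stated in this comment


def count_branches(adjacency):
    """Sum (heavy-atom connections - 2) over all atoms with 3+ connections."""
    return sum(max(0, len(neighbors) - 2) for neighbors in adjacency.values())


def passes_branch_filter(adjacency, max_branches=MAX_BRANCHES):
    """Return True if the substituent has at most `max_branches` branches."""
    return count_branches(adjacency) <= max_branches


# Phosphate: P is bonded to four O atoms, so it contributes two branches.
phosphate = {
    "P": ["O1", "O2", "O3", "O4"],
    "O1": ["P"], "O2": ["P"], "O3": ["P"], "O4": ["P"],
}

# A lipid-like unbranched chain of 16 heavy atoms: large, but no branches.
chain = {i: [j for j in (i - 1, i + 1) if 0 <= j < 16] for i in range(16)}

print(count_branches(phosphate))  # 2
print(count_branches(chain))      # 0
```

The chain example shows why atom count is not used: a long unbranched substituent still passes this filter.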

Maximum ring size

This feature filters out substituents with a large ring.
Here, the "maximum ring size" means the size of the largest ring in the SSSR (smallest set of smallest rings).
Using the SSSR lets us distinguish single rings from fused rings. As mentioned above, the number of branches is used for filtering as feature 2, and it also filters out polycyclic compounds because they have many branches. Therefore, we do not have to consider the ring size of polycyclic compounds.
Returning to the "maximum ring size", we need to consider how large a ring is too large.
Basically, we consider that at least macrocyclic compounds should be excluded as a part of glycans.
"Macrocycle" also has various definitions, but many of them put the borderline at a ring of ten or twelve atoms.
Thus, we determined that no substituent in a glycan may have a ring of ten or more atoms in the SSSR.
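
As a rough illustration of the ring-size rule, the sketch below does not compute a full SSSR. Instead, for every bond it finds the smallest ring passing through that bond and takes the maximum. This value is a lower bound on the largest SSSR ring and is exact for a single (monocyclic) ring; a real implementation would rely on a cheminformatics toolkit's SSSR routine. All function names and the adjacency-map input are hypothetical.

```python
from collections import deque

RING_SIZE_LIMIT = 10  # rings of this size or larger are rejected


def smallest_ring_through_bond(adjacency, u, v):
    """Size of the smallest ring containing bond u-v, or 0 if it is not in a ring."""
    # Breadth-first search from u to v that may not use the bond u-v itself.
    dist = {u: 0}
    queue = deque([u])
    while queue:
        a = queue.popleft()
        for b in adjacency[a]:
            if a == u and b == v:
                continue  # skip the direct bond
            if b not in dist:
                dist[b] = dist[a] + 1
                if b == v:
                    return dist[b] + 1  # path bonds plus the closing bond
                queue.append(b)
    return 0


def largest_ring_lower_bound(adjacency):
    """Maximum over all bonds of the smallest ring through that bond."""
    return max(
        (smallest_ring_through_bond(adjacency, u, v)
         for u, neighbors in adjacency.items() for v in neighbors),
        default=0,
    )


def passes_ring_filter(adjacency, limit=RING_SIZE_LIMIT):
    return largest_ring_lower_bound(adjacency) < limit


def build_adjacency(n_atoms, bonds):
    adjacency = {i: [] for i in range(n_atoms)}
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)
    return adjacency


perimeter = [(i, (i + 1) % 10) for i in range(10)]
ten_ring = build_adjacency(10, perimeter)                 # one 10-membered ring
fused_six = build_adjacency(10, perimeter + [(4, 9)])     # two fused 6-rings

print(largest_ring_lower_bound(ten_ring), passes_ring_filter(ten_ring))    # 10 False
print(largest_ring_lower_bound(fused_six), passes_ring_filter(fused_six))  # 6 True
```

The two examples have the same ten-atom perimeter, but only the single ten-membered ring is rejected; the fused system is left to the branch filter.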