egonw / semanticchemistry

Automatically exported from code.google.com/p/semanticchemistry

PubChem chemical properties and software descriptions #12

Closed GoogleCodeExporter closed 7 years ago

GoogleCodeExporter commented 9 years ago
Please incorporate the semantics of PubChem chemical properties and software 
libraries in CHEMINF.

Please see the attachment for a description.

Original issue reported on code.google.com by GangFu1...@gmail.com on 9 Dec 2012 at 10:27

Attachments:

GoogleCodeExporter commented 9 years ago
The software libraries used by PubChem have been updated recently (2012-11-26).

Here is the table with the new versions of the software libraries.

Please replace the old one.

Original comment by GangFu1...@gmail.com on 11 Dec 2012 at 3:19

Attachments:

GoogleCodeExporter commented 9 years ago
More requests from Gang:

1.  Can you add an object property called 
“has_PubChem_normalized_counterpart”, with semantics similar to 
“has_OPS_normalized_counterpart”? The PubChem normalization specification is 
in the first attachment.
2.  Can you add another object property called “has_component”, with domain 
“chemical substance” and range “molecular entity”? This relation can be used 
to connect a composition (mixture) to its components. For most composition 
entities, PubChem has corresponding component entity records.
3.  Regarding the relation “has_uncharged_counterpart”, which is similar to 
the relation “has_parent_compound” we proposed: it should also be applicable 
to “chemical substance”, such as mixtures. This relation can be used to group 
the neutralized and ionized forms of a chemical structure. Because the ionized 
forms are usually mixtures (with or without specified counter-ions) while the 
neutralized forms always have a single covalent unit, I suggest distinguishing 
the domain and range of this relation: the domain is “chemical entity” and the 
range is “molecular entity”.
4.  Regarding the 2D and 3D similarity relations, here is the specification:
The 2D similarity is determined by the 2D Tanimoto score, calculated as 
$\text{Tanimoto} = \frac{AB}{A + B - AB}$, where AB is the number of bits set 
after a bitwise AND of the two fingerprints, and A and B are the numbers of bits 
set in fingerprints A and B, respectively 
(see reference: http://pubchem.ncbi.nlm.nih.gov/help.html#tanimoto).
If the 2D Tanimoto score is greater than 0.9, the two molecular entities are 
considered similar.
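A minimal Python sketch of the 2D rule, assuming fingerprints are held as Python integers used as bit sets (the actual PubChem fingerprint encoding is not specified in this thread). `tanimoto_2d` and `similar_2d` are hypothetical names, and scoring two empty fingerprints as 0.0 is an assumption.

```python
def popcount(x: int) -> int:
    """Number of bits set in a non-negative integer."""
    return bin(x).count("1")


def tanimoto_2d(fp_a: int, fp_b: int) -> float:
    """2D Tanimoto = AB / (A + B - AB) for two binary fingerprints."""
    a = popcount(fp_a)           # bits set in fingerprint A
    b = popcount(fp_b)           # bits set in fingerprint B
    ab = popcount(fp_a & fp_b)   # bits set after bitwise AND
    denom = a + b - ab
    # Assumption: two empty fingerprints score 0.0 rather than raising.
    return ab / denom if denom else 0.0


def similar_2d(fp_a: int, fp_b: int, cutoff: float = 0.9) -> bool:
    """Entities are considered similar if the 2D score exceeds the cutoff."""
    return tanimoto_2d(fp_a, fp_b) > cutoff


if __name__ == "__main__":
    print(tanimoto_2d(0b1111, 0b0111))  # 3 / (4 + 3 - 3) = 0.75
    print(similar_2d(0b1111, 0b0111))   # False
```

The cutoff is a parameter so the 0.9 threshold can be revisited without changing the scoring function.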
The 3D similarity is determined by 3D Tanimoto scores, calculated from shape 
volume overlap and pharmacophore feature overlap:
3D Shape Tanimoto score: $ST = \frac{V_{AB}}{V_{AA} + V_{BB} - V_{AB}}$, where 
$V_{AA}$ and $V_{BB}$ are the self-overlap volumes of conformers A and B, and 
$V_{AB}$ is the common overlap volume between them.
3D Feature Tanimoto score: 
$CT = \frac{\sum_f V^f_{AB}}{\sum_f V^f_{AA} + \sum_f V^f_{BB} - \sum_f V^f_{AB}}$, 
where the superscript $f$ ranges over the six independent fictitious feature 
atom types, $V^f_{AA}$ and $V^f_{BB}$ are the self-overlap volumes of conformers 
A and B for feature atom type $f$, and $V^f_{AB}$ is the overlap volume of 
conformers A and B for feature atom type $f$ (see reference: 
http://www.jcheminf.com/content/pdf/1758-2946-3-32.pdf).
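The two 3D scores can be sketched in Python, assuming the overlap volumes have already been computed (the volume-overlap calculation itself is not shown). The function names and the per-feature-type dictionary inputs are hypothetical, and returning `None` when the feature denominator is zero is an assumption so that the feature-less case can be handled separately.

```python
from typing import Mapping, Optional


def shape_tanimoto(v_aa: float, v_bb: float, v_ab: float) -> float:
    """ST = VAB / (VAA + VBB - VAB) from precomputed overlap volumes."""
    return v_ab / (v_aa + v_bb - v_ab)


def feature_tanimoto(
    vf_aa: Mapping[str, float],
    vf_bb: Mapping[str, float],
    vf_ab: Mapping[str, float],
) -> Optional[float]:
    """CT = sum_f VfAB / (sum_f VfAA + sum_f VfBB - sum_f VfAB).

    Each mapping is keyed by feature atom type f; missing types count as 0.
    Returns None when the denominator is zero, e.g. when neither conformer
    has any feature volume (assumption).
    """
    s_aa = sum(vf_aa.values())  # sum over f of VfAA
    s_bb = sum(vf_bb.values())  # sum over f of VfBB
    s_ab = sum(vf_ab.values())  # sum over f of VfAB
    denom = s_aa + s_bb - s_ab
    return s_ab / denom if denom > 0 else None


if __name__ == "__main__":
    print(shape_tanimoto(10.0, 10.0, 8.0))  # 8 / (10 + 10 - 8)
    print(feature_tanimoto({"f1": 2.0, "f2": 1.0},
                           {"f1": 2.0, "f2": 2.0},
                           {"f1": 1.5, "f2": 1.0}))  # 2.5 / (3 + 4 - 2.5)
```

The keys `f1` and `f2` are placeholders, not PubChem's feature type names.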
If the 3D shape score is greater than 0.795 and the 3D feature score is greater 
than 0.495, the two molecular entities are considered similar; if the two 
entities lack pharmacophore features, the cutoff is instead a 3D shape score 
greater than 0.925.
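The cutoff rule can be expressed as a small Python function. This is a sketch: `similar_3d` is a hypothetical name, and representing "lacks pharmacophore features" by passing no feature score (`None`) is an assumption about how that case would be encoded.

```python
from typing import Optional

# Cutoffs as stated in this specification.
SHAPE_CUTOFF = 0.795
FEATURE_CUTOFF = 0.495
SHAPE_ONLY_CUTOFF = 0.925


def similar_3d(st: float, ct: Optional[float] = None) -> bool:
    """Apply the PubChem 3D similarity cutoffs.

    st: 3D shape Tanimoto score.
    ct: 3D feature Tanimoto score, or None when the entities lack
        pharmacophore features (encoding of that case is an assumption).
    """
    if ct is None:
        # No features: decide on shape alone, with a stricter cutoff.
        return st > SHAPE_ONLY_CUTOFF
    # Both the shape and the feature score must exceed their cutoffs.
    return st > SHAPE_CUTOFF and ct > FEATURE_CUTOFF


if __name__ == "__main__":
    print(similar_3d(0.80, 0.50))  # True
    print(similar_3d(0.90))        # False
```

Keeping the cutoffs as named constants separates the decision rule from the score calculation.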

For the 2D and 3D similarity relations, in addition to the CHEMINF relations 
linking two molecular entities, we can use the RDF statements in the second 
attachment to specify the similarity (Tanimoto) scores. The neighboring 
relation and the score definition can be defined in SIO by Michel. 
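A minimal sketch of emitting the entity-to-entity link statements as N-Triples in plain Python, with no RDF library. It assumes CHEMINF IRIs under http://semanticscience.org/resource/ and PubChemRDF compound IRIs under http://rdf.ncbi.nlm.nih.gov/pubchem/compound/, and uses CHEMINF_000482, the identifier this thread assigns to the 2D relation. The score statements from the second attachment are not reproduced, and `similarity_triples` is a hypothetical helper.

```python
from typing import Iterable, Iterator, Tuple

# Assumed namespaces: check against the published CHEMINF and PubChemRDF IRIs.
CHEMINF = "http://semanticscience.org/resource/"
PUBCHEM_COMPOUND = "http://rdf.ncbi.nlm.nih.gov/pubchem/compound/"

# 'similar to by PubChem 2D similarity algorithm', as assigned in this thread.
SIMILAR_2D = CHEMINF + "CHEMINF_000482"


def similarity_triples(
    pairs: Iterable[Tuple[int, int, float]], cutoff: float = 0.9
) -> Iterator[str]:
    """Yield one N-Triples line per compound pair whose 2D score exceeds cutoff.

    pairs holds (cid_a, cid_b, tanimoto_score) tuples (hypothetical input format).
    The score itself is not emitted; its RDF form is given in the attachment.
    """
    for cid_a, cid_b, score in pairs:
        if score > cutoff:
            yield (
                f"<{PUBCHEM_COMPOUND}CID{cid_a}> <{SIMILAR_2D}> "
                f"<{PUBCHEM_COMPOUND}CID{cid_b}> ."
            )


if __name__ == "__main__":
    for line in similarity_triples([(1, 2, 0.95), (1, 3, 0.5)]):
        print(line)
```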

Original comment by batchelorc@rsc.org on 5 Mar 2013 at 1:35

GoogleCodeExporter commented 9 years ago
Have taken ownership of this.

Original comment by batchelorc@rsc.org on 5 Mar 2013 at 5:09

GoogleCodeExporter commented 9 years ago
'has PubChem normalized counterpart' CHEMINF_000477
'has component' CHEMINF_000478
'has component with uncharged counterpart' CHEMINF_000480
'similar to by PubChem 2D similarity algorithm' CHEMINF_000482
'similar to by PubChem 3D similarity algorithm' CHEMINF_000483

and with that I think we're done.

Original comment by batchelorc@rsc.org on 5 Mar 2013 at 5:45