egonw / semanticchemistry

Automatically exported from code.google.com/p/semanticchemistry
10 stars 1 forks source link

Investigate use of FOL to describe fused cycles of arbitrary length. #1

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
During the telecon of 04082009, it was suggested that FOL may be able to 
accurately represent fused cycles, each composed at least 3 different atoms, 
but potentially of arbitrary length.

This action item requires a formalization of the FOL description that can be 
subsequently tested using a FOL reasoner.

Original issue reported on code.google.com by michel.dumontier on 4 Aug 2009 at 4:35

GoogleCodeExporter commented 9 years ago
From first principles, one must infer
1 - the existence of rings from the connectivity of atoms.
2 - the existence of at least 3 different rings (they do not exactly share the 
same set 
of atoms)
3 - one of the rings is *directly* connected to the other two OR that each ring 
is 
*directly* connected to each of the other two.

Original comment by michel.dumontier on 4 Aug 2009 at 4:41

GoogleCodeExporter commented 9 years ago
Here is an attempt at a FOL-*like* formalisation, syntax needs a lot of work:

We specify that a molecule is composed of atoms and bonds, and then
1) forall(A) forall(B): edge(A,B) -> atom(A) and atom(B) and ( bond (A,B) or 
bond
(B,A) ).

2) forall(A) forall(B): path(A,B) -> atom(A) and atom(B) and path(A,C) and 
edge(C,B).

A cycle is a path of connected atoms:

3) forall(C): cycle(C) -> forall(A element of C): atom(A) and path(A,B) and 
path(B,C)
and edge(C,A). 

Now we need the edge count (for a particular atom) and the cycle length (for a
particular cycle). I'm not quite sure about the syntax for FOL and numeric 
values:

4) forall(A) exists(N): atom(A) and (count(B:edge(A,B))=N).

5) forall(C) exists(N): cycle(C) and (count(A elementof C)=N).

Now a naive formulation of a polycyclic cage structure is that it consists of 
all
atoms participating in cycles of the same length, and all atoms participating 
in the
same number of bonds, and each atom participating in more than 2 bonds. Here P 
is a
molecule consisting of atoms and bonds. 

6) forall(P): polycyclic_cage(P) -> exists(N):{forall(A) ( atom(A) and
(count(B:edge(A,B))=N) )} and exists(N2):forall(A)exists(C):{ (atom(A) and 
cycle(C)
and element(A,C) and (count(A elementof C)=N2) }.

Right. I've implemented this in Prolog in the attached file, and it finds 
butane and
tetrahedrane :-) and would, I assume, work for any other polycyclic cages that 
follow
the same pattern of regularity, including Fullerene.

It doesn't, however, work for diamantane, which is connected differently: 
although
every atom participates in a cycle of the same length, not every atom 
participates in
the same number of bonds. Perhaps there is a way to relax this particular 
requirement
while still capturing the "cage"-ness of the structure?

Original comment by janna.ha...@gmail.com on 11 Aug 2009 at 10:02

Attachments:

GoogleCodeExporter commented 9 years ago
It seemed that detecting the minimal set of cycles in a molecular graph was not 
a
task well suited to automated reasoning. However, algorithms to do just this are
already well developed and implemented in the CDK. Thus, I implemented a new 
version
which uses the CDK SSSR implementation to detect rings. A molecule is then 
specified
as a tuple (I,N,A,B,R) where I=Id, N=Name, A=AtomList, B=BondList, and R=RingSet
(generated using the CDK SSSRFinder). I use Java to parse the molecules beneath 
a
certain parent (polycyclic cage) in the ChEBI ontology and extract those 
children
with structures in this logical format. I also included in the knowledge base 
the
children of 'fused compound' (CHEBI:35293) which are all highly polycyclic and 
thus
provide likely candidates for false positives. Also added the children of 
'macrocycles'.

The generated knowledgebase is attached (knowledge_base.pl).

Chris's suggestion was that each ring had to share bonds with at least three 
other
rings, or maybe four. However, this doesn't work for tetrahedrane, for which 
the ring
set only consists of three rings, each of which have three atoms. I therefore 
propose
a new definition: each ring in the ring set which has size n shares at least 
(n-1)
bonds with other rings. OK! This seems to work now. (polycyclic_cage.pl).

Observations:
(1) checking that something IS a polycyclic cage is fast, checking that 
something is
NOT a polycyclic cage is sloooooooooow. Talk about combinatorial explosion. All
different bonds have to be tested against all different rings. Logic-based 
reasoners
are really not ideal for this task.
(2) we do not really use the atom list nor the bond list in our molecular
formalisation. However, these could be used for checking that the molecule 
consists
of *nothing but* the relevant cage that has been detected. Already, we check 
the ring
set that it forms a cage. However, we do not check for the following situations 
-
something is included within the cage, something is included outside the cage,
something is attached to the cage. In the former two situations we would 
actually
have a disconnected graph. In the latter situation, should the thing that is
connected outside the cage contain a ring, the present reasoner will fail.

Original comment by janna.ha...@gmail.com on 10 Sep 2009 at 2:53

Attachments:

GoogleCodeExporter commented 9 years ago
The results of this investigation formed the input into the Hastings, Kutz and 
Mossakowski 2011 DKR Workshop paper. 

Original comment by janna.ha...@gmail.com on 3 May 2012 at 7:57