COMCIFS / TopoCif

Development of the Topology CIF Dictionary

bond types #19

Closed BobHanson closed 6 years ago

BobHanson commented 6 years ago

There are actually more bond types. For example, https://www.sciencedirect.com/science/article/pii/S0009261410004008 describes a possibility of a hex-bond. Jmol can represent this; I would like to see 5- and 6-bond options here.

In addition, there are all sorts of partial bonding possibilities. In Jmol we can represent aromatic, partial, or partialDouble among many other possibilities.

Blatov commented 6 years ago

We have relied on the types already present in the CIF format. But of course any other reasonable types are welcome. So you could just propose a list of additional types and their symbols.

jamesrhester commented 6 years ago

@BobHanson, can you give a simple list of additional bond types and definitions as a reply to this issue? Don't worry about taking the time to be exhaustive, just include what you would like to see (hex, penta, aromatic, partial, partialdouble - is that the lot?)

BobHanson commented 6 years ago

Here is Jmol's repertoire. Many, I am sure, are irrelevant. I can't remember the difference between Partial23 and Partial32.

private enum EnumBondOrder {

SINGLE(BOND_COVALENT_SINGLE,"1","single"),
DOUBLE(BOND_COVALENT_DOUBLE,"2","double"),
TRIPLE(BOND_COVALENT_TRIPLE,"3","triple"),
QUADRUPLE(BOND_COVALENT_QUADRUPLE,"4","quadruple"),
QUINTUPLE(BOND_COVALENT_QUINTUPLE,"5","quintuple"),
sextuple(BOND_COVALENT_sextuple,"6","sextuple"),
AROMATIC(BOND_AROMATIC,"1.5","aromatic"),
STRUT(BOND_STRUT,"1","struts"),
H_REGULAR(BOND_H_REGULAR,"1","hbond"),
PARTIAL01(BOND_PARTIAL01,"0.5","partial"),
PARTIAL12(BOND_PARTIAL12,"1.5","partialDouble"),
PARTIAL23(BOND_PARTIAL23,"2.5","partialTriple"),
PARTIAL32(BOND_PARTIAL32,"2.5","partialTriple2"),
AROMATIC_SINGLE(BOND_AROMATIC_SINGLE,"1","aromaticSingle"),
AROMATIC_DOUBLE(BOND_AROMATIC_DOUBLE,"2","aromaticDouble"),
ATROPISOMER(TYPE_ATROPISOMER, "1", "atropisomer"),
UNSPECIFIED(BOND_ORDER_UNSPECIFIED,"1","unspecified");

-- Robert M. Hanson Professor of Chemistry St. Olaf College Northfield, MN http://www.stolaf.edu/people/hansonr

If nature does not answer first what we want, it is better to take what answer we get.

-- Josiah Willard Gibbs, Lecture XXX, Monday, February 5, 1900

jamesrhester commented 6 years ago

Great - can you give a one-sentence description of the ones that aren't obvious? (I have no idea about 'strut', any of the partials, or atropisomer.) Also, I forgot to say that it is desirable that these bond types don't overlap, so that we can make it easy on the data miners. For example, when does a partial bond become a full bond? Is there a clear demarcation? If not, we may want to simply say 'no bond' instead of partial and write in the definition that 'no bond' includes partial bonds.

Blatov commented 6 years ago

If we want to adjust the format for any bond type, we could admit numbers (1, 2, ...) in the _topol_bond.type item and introduce two additional items:

_topol_bond.type_id, which must be equal to the number specified in _topol_bond.type
_topol_bond.type_description, which holds any description of this type of bond

BobHanson commented 6 years ago

You can ignore the non-obvious ones. I was just showing the (roughly) full range of what Jmol can do. Actually, Jmol can draw any partial bond (solid and dashed both) using a binary number description -- for example, 5 (binary 101) means "solid,dashed,solid" -- but I would not foist that on anyone. I would say the most useful would be (with possible rendering interpretations):

single through hex (up to six, just for completeness)
partial-0.5 (a dashed line, probably, when rendered)
partial-1.5 (a solid and a dashed line)
partial-2.5 (two lines and a dash)
aromatic (probably like partial-1.5, but possibly a circle in an n-gon)
aromatic-single (single, but tagged as aromatic)
aromatic-double (double, but tagged as aromatic)

The problem with all of these is that they are subjective interpretations. But, I guess, so is all topology (?)

Bob
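To illustrate the binary description mentioned in this comment, here is a minimal Python sketch. It is not Jmol's code: the function name `decode_bond_pattern` is hypothetical, and reading the bits from most significant to least significant is an assumption (the example value 5, binary 101, reads the same in both directions, so the comment does not settle the order).

```python
def decode_bond_pattern(code):
    """Map each bit of `code` to a line style: 1 -> solid, 0 -> dashed.

    Assumption: bits are read from most significant to least significant.
    """
    if code <= 0:
        raise ValueError("pattern code must be a positive integer")
    bits = bin(code)[2:]  # e.g. 5 -> "101"
    return ["solid" if bit == "1" else "dashed" for bit in bits]


print(decode_bond_pattern(5))
```

One limitation of this sketch: because leading zeros are dropped, a pattern that starts with a dashed line cannot be expressed unless the number of lines is stored separately.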

jamesrhester commented 6 years ago

We can either go in the direction suggested by @Blatov , and introduce a separate table of bond types that can be expanded as needed, and referred to in _topol_bond:

loop_
_topol_bond_type.id
_topol_bond_type.description
sh     'A single hex bond'
p25    'A 2.5 partial bond'

or we can just add some extras to the current list. Are there any preferences? Part of this depends on the purpose of this data name - e.g. from @BobHanson 's point of view, it provides information on how links could be displayed but is not used further (I assume). Is there another use for this information that might be stricter (e.g. characterisation of lattice edges as strong or weak that can't otherwise be carried out based on distance etc.)?
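For readers who want to see what consuming such a per-file table might involve, here is a rough Python sketch. It is a toy reader, not a real CIF parser: the function name `parse_bond_type_loop` is hypothetical, and using `shlex.split` to handle the quoted descriptions is a simplifying assumption, since CIF quoting rules differ from shell quoting.

```python
import shlex


def parse_bond_type_loop(text):
    """Read a two-column bond-type table into a dict of id -> description."""
    names, rows = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line == "loop_":
            continue  # skip blank lines and the loop keyword
        if line.startswith("_"):
            names.append(line)  # a data name (column header)
        else:
            rows.append(shlex.split(line))  # a row of values
    id_col = names.index("_topol_bond_type.id")
    desc_col = names.index("_topol_bond_type.description")
    return {row[id_col]: row[desc_col] for row in rows}


example = """
loop_
_topol_bond_type.id
_topol_bond_type.description
sh   'A single hex bond'
p25  'A 2.5 partial bond'
"""

table = parse_bond_type_loop(example)
print(table["p25"])
```

The ids come out machine-readable, but the descriptions remain free text that software can only display, not interpret.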

merkys commented 6 years ago

A predefined set of the most prominent bond types plus the means to extend it would be the best solution, IMO. I think that machine-parsing the natural language definitions from _topol_bond_type.description is the last thing one would want to do. However, for really exotic cases there is not much else to do. In any case it would be great if novel bond types could be accepted into the enumeration in the dictionary.

Blatov commented 6 years ago

So I think we should accept the _topol_bond_type.id and _topol_bond_type.description items.

dmproserpio commented 6 years ago

I agree with Blatov, we should accept the _topol_bond_type.id and _topol_bond_type.description items.

BobHanson commented 6 years ago

Excellent. Thank you.


BobHanson commented 6 years ago

Only question I have is this:

Does this mean that, say, I could define "p25" to mean one thing in one CIF, and you could define it to be another in a different CIF? Is that a problem?

Bob


BobHanson commented 6 years ago

sorry -- hit send too quickly.

Also,

"single hex" is an oxymoron. A bond can't be both a single bond and a hex bond. Wouldn't "sh" just be "6"?

Bob


Blatov commented 6 years ago

If the bond is user-defined, the name and definition are up to the user. So, yes, the same _topol_bond_type.id could be assigned to different kinds of bonds in different CIFs. But I do not see any real problem here - for example, the same atom type can also mean different things to different authors.

And indeed we should correct 'A single hex bond' to just 'A hex bond'. We can use 6 or hx, for example, for _topol_bond_type.id in this case.

jamesrhester commented 6 years ago

I didn't think through the full implications of @Blatov 's proposal, but fortunately @BobHanson is alert. It is essentially unworkable to have a custom table for bond types in every CIF. The reason it is unworkable is that the CIF dictionary is supposed to be defining data names that allow automated processing of files. Any value that depends on free-form text is useless for machine processing, so, for example, @BobHanson couldn't display any bonds by bond type in his software because his software would have to understand arbitrary text strings first.

A more philosophical objection is that the standard represents an agreement on meaning between two parties that are not otherwise in contact. If a data file can contain arbitrary bond classifications, then there is essentially no agreement on bond types and the information is not suitable for the standard. That is why any list of bond types should be in the TopoCif dictionary. We can add to this list in the future (but never subtract).

So I am in favour of the original scheme, with the subset of bonds that we do agree on and understand explained in the dictionary (not the data file).
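The ambiguity raised here can be shown with a small Python sketch. Both tables and the helper `conflicting_ids` are hypothetical, and the second description of "p25" is invented purely for illustration.

```python
# Two hypothetical per-file bond-type tables that reuse the same id.
file_a = {"p25": "A 2.5 partial bond"}
file_b = {"p25": "Some other author's meaning of p25"}  # invented example


def conflicting_ids(*tables):
    """Return the ids whose descriptions differ between the given tables."""
    first_seen = {}
    conflicts = set()
    for table in tables:
        for bond_id, description in table.items():
            if bond_id in first_seen and first_seen[bond_id] != description:
                conflicts.add(bond_id)
            first_seen.setdefault(bond_id, description)
    return sorted(conflicts)


print(conflicting_ids(file_a, file_b))
```

A program reading only the id "p25" has no way to know which meaning applies; with a list fixed in the dictionary, this situation cannot arise.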

Blatov commented 6 years ago

OK, I agree that we should not give the user too much freedom here. But at the same time it would be good if the list had some flexibility. We could keep the _topol_bond_type.id and _topol_bond_type.description items but allow only predefined bond types for _topol_bond_type.id. This could be important if the user wants to provide some additional information on the bonding. And for flexibility we could predefine one more type, ud, meaning user defined; this type would designate some special bond, and the program could output its description if required.

merkys commented 6 years ago

I strongly agree with @jamesrhester. We need to have machine-readable topology descriptions.

BobHanson commented 6 years ago

My suggestion for a starting point is:

bonds: single-hex
partial bonds: 0.5 1.5 2.5 3.5 4.5 5.5 (just for completeness)
hydrogen bonds

That would certainly cover anything of general interest in Jmol.

These could be coded any way you want. Anything wrong with actually using numbers there for all those 1-6 and 0.5, 1.5, ...? Or does everything have to be a code like s,d,t,..?

Then the only special one would be hydrogen bonds, which might be special anyway, because they can have energies attached -- though perhaps that is outside the scope of this CIF format.

Bob


Blatov commented 6 years ago

I think that instead of introducing many types for different bond orders, it would be better to introduce just one more item, _topol_link.order, which would be a real number [0..infinity]. We could then keep special types only for single, double, triple and quadruple bonds, so the list of types stays as it is in the current version of the dictionary. We can forbid user types for bonds, but I think we still need a field _topol_link.description for an arbitrary description of the bond features. There is a similar item in the ATOM category: _atom_site_description. I also noticed that there is one mention of _topol_bond instead of _topol_link in the dictionary, where we explain the TOPOL_REPRES subcategory. This should be fixed.

jamesrhester commented 6 years ago

I don't understand how _topol_link.order would interact with _topol_link.type. If I have a double bond, should I include both _topol_link.type of db and _topol_link.order of 2? If I have an aromatic bond, or a Van der Waals link, what would _topol_link.order of 2 mean? I'm not necessarily against the idea, but perhaps @Blatov could write a definition for _topol_link.order explaining the precise usage. Is the intention that this is purely for display purposes or is there chemical or topological significance?

Meanwhile, I agree that we should add _topol_link.special_details for any user-specific information.

Blatov commented 6 years ago

_topol_link.order should be understood in its common chemical meaning as the bond order (the number of electron pairs per bond). It should not be a mandatory field, though. It could extend the bond description and supplement the _topol_link.type information. If we want to describe the bond order, we should use the general type of valence bond (v). These two constructions are equivalent:

_topol_link.type v _topol_link.order 2
_topol_link.type db

The construction _topol_link.type sg _topol_link.order 2 is, strictly speaking, conflicting, but the program can ignore the _topol_link.order field if the order is predetermined by _topol_link.type (sg, db, tr, qd). For a van der Waals interaction, the following data could be feasible: _topol_link.type vw _topol_link.order 0.01
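The precedence rule proposed in this comment can be sketched in a few lines of Python. The function name `effective_order` and the table `FIXED_ORDER` are hypothetical, and mapping sg, db, tr and qd to orders 1 to 4 is an assumption based on the single, double, triple and quadruple types named in this thread.

```python
# Assumed mapping: type codes whose bond order is fixed by the type itself.
FIXED_ORDER = {"sg": 1.0, "db": 2.0, "tr": 3.0, "qd": 4.0}


def effective_order(link_type, order=None):
    """Return the bond order implied by a (type, order) pair.

    If the type predetermines the order, any supplied order is ignored.
    Otherwise the optional order is returned unchanged (None if absent).
    """
    if link_type in FIXED_ORDER:
        return FIXED_ORDER[link_type]
    return order


print(effective_order("v", 2) == effective_order("db"))  # equivalent forms
print(effective_order("sg", 2))       # conflicting order is ignored
print(effective_order("vw", 0.01))    # order taken from the data
```

Silently ignoring the conflicting value keeps readers tolerant, but it also hides inconsistent files; a stricter reader might report the conflict instead.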

jamesrhester commented 6 years ago

The scheme suggested by @Blatov is a bit messy in that it allows contradictory information to be easily presented. I suggest that we stick with bond types that are qualitatively different (v, vw, pi, hb and ar?) and then _topol_link.order can be used to distinguish the different numbers of electron pairs involved. If a bond has no concept of strength, _topol_link.order can be ignored. How does that sound?

Blatov commented 6 years ago

This is exactly what I meant; sorry if I was not clear enough. _topol_link.order should be optional and used only if we want to specify the bond strength.

jamesrhester commented 6 years ago

See update e14c81b and let me know if this is satisfactory. Feel free to make your own edits to the dictionary.

Blatov commented 6 years ago

I think we should leave specific bond in the list of bond types. This type could be used for any bonding intermediate between valence and van der Waals, like halogen, chalcogen or other recently proposed types of weak bonding.

jamesrhester commented 6 years ago

How should specific bond be used? Is it just the value sb for _topol_link.type and the details given in special_details? Or is there somewhere else that halogen, chalcogen etc. should be specified?

Blatov commented 6 years ago

Yes this is just an additional value for _topol_link.type. We used s in the previous dictionary edition, but sb is probably better. The point is that we need a special type for contacts, which are neither valence nor van der Waals. There can be a lot of subtypes for them, like for valence contacts, but the details can be described in special_details. Actually we already use one subtype for valence bonds (ar), and one subtype for specific bonds (hb), but we need also values for general types (valence and specific). Halogen, chalcogen etc. are subtypes of specific bonds, but no need to use special values for them at the moment.

jamesrhester commented 6 years ago

I've added sb and _topol_link.special_details to the list of bond types in 6ad75db

jamesrhester commented 6 years ago

Closing issue as there seem to be no further suggestions or objections.