sauliusg closed this issue 6 years ago
The structure is not valid; I've relaxed the constraint recently. If you rebuild from source you will get the behavior.
Which repo/commit did you relax the constraints in? Pulling ff5ee4e from https://github.com/cdk/depict.git and the recent pulls+builds from https://github.com/johnmay/cdk.git or https://github.com/cdk/cdk.git do not change the behaviour.
The structure is not valid
The 'n1cccc1' is admittedly not valid, although the web version infers aromaticity correctly for some reason :)
The problems come with metal organics like '[Cu]12(Oc3c(C(=[N]2N=C(O1)c1ccc(O)cc1)C)cc(Br)cc3)[n]1ccccc1' or 'c1(cc(c2c3cccc(n3[Cu]3(n12)P(c1ccccc1Oc1c(P3(c2ccccc2)c2ccccc2)cccc1)(c1ccccc1)c1ccccc1)c1ccccc1)C(F)(F)F)C(F)(F)F' (the main species in http://www.crystallography.net/cod/4106494.html), where aromaticity assumptions should take metal into account.
Here, the behaviour of the web site http://www.simolecule.com/cdkdepict/depict.html would be handy; unfortunately, the 'git cloned' version behaves differently. Is there a possibility to obtain .jars and sources running on http://www.simolecule.com/cdkdepict/depict.html ?
Regards, S.
The dashed lines indicate that the aromaticity state cannot be determined properly as the SMILES is invalid, shown here: http://www.simolecule.com/cdkdepict/depict/bow/svg?smi=c1(cc(c2c3cccc(n3%5bCu%5d3(n12)P(c1ccccc1Oc1c(P3(c2ccccc2)c2ccccc2)cccc1)(c1ccccc1)c1ccccc1)c1ccccc1)C(F)(F)F)C(F)(F)F&abbr=on&hdisp=bridgehead&showtitle=false&zoom=1.6&annotate=none
See also discussion here before the rcdk depiction was updated: https://github.com/rajarshi/cdkr/issues/49
Here’s another depiction service, where CDK doesn’t depict (not sure what version they are using): https://apps.ideaconsult.net/ambit2/depict?search=c1(cc(c2c3cccc(n3%5bCu%5d3(n12)P(c1ccccc1Oc1c(P3(c2ccccc2)c2ccccc2)cccc1)(c1ccccc1)c1ccccc1)c1ccccc1)C(F)(F)F)C(F)(F)F+&smarts=#
If you use e.g. OpenBabel to convert your SMILES to SMILES, you get these, which depict fine: c1(cc(c2C3CCCC(N3[Cu]3(n12)P(c1ccccc1Oc1c(P3(c2ccccc2)c2ccccc2)cccc1)(c1ccccc1)c1ccccc1)c1ccccc1)C(F)(F)F)C(F)(F)F [nH]1cccc1
http://www.simolecule.com/cdkdepict/depict/bow/svg?smi=c1(cc(c2C3CCCC(N3%5bCu%5d3(n12)P(c1ccccc1Oc1c(P3(c2ccccc2)c2ccccc2)cccc1)(c1ccccc1)c1ccccc1)c1ccccc1)C(F)(F)F)C(F)(F)F&abbr=on&hdisp=bridgehead&showtitle=false&zoom=1.6&annotate=none
http://www.simolecule.com/cdkdepict/depict/bow/svg?smi=%5bnH%5d1cccc1&abbr=on&hdisp=bridgehead&showtitle=false&zoom=1.6&annotate=none
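The depiction links above percent-encode the square brackets of the SMILES (`[` as `%5b`, `]` as `%5d`). A minimal Python sketch of building such a link is given here; the base URL and the query parameters are simply copied from the links in this thread, and the helper name `depict_url` is made up for illustration. Python emits upper-case hex digits (`%5B`), which is equivalent.

```python
from urllib.parse import urlencode

# Assumption: endpoint and parameters are taken verbatim from the links in this thread.
DEPICT_SVG = "http://www.simolecule.com/cdkdepict/depict/bow/svg"


def depict_url(smiles: str) -> str:
    """Build a depiction URL with the SMILES safely percent-encoded."""
    params = {
        "smi": smiles,
        "abbr": "on",
        "hdisp": "bridgehead",
        "showtitle": "false",
        "zoom": "1.6",
        "annotate": "none",
    }
    # urlencode() percent-encodes reserved characters such as '[', ']' and '+'.
    return DEPICT_SVG + "?" + urlencode(params)


print(depict_url("[nH]1cccc1"))
```

Encoding matters most for charged atoms: a literal `+` in a query string is usually read as a space, so `[n+]` has to be sent as `%5Bn%2B%5D`.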
The dashed lines indicate that the aromaticity state cannot be determined properly as the SMILES is invalid, shown here: ...
Thanks for the answer! Good to know, I thought dashed lines were just a funny way to display aromatic bonds :)
Here’s another depiction service, where CDK doesn’t depict (not sure what version they are using): https://apps.ideaconsult.net/ambit2/depict?search=c1(cc(c2c3cccc(n3%5bCu%5d3(n12)P(c1ccccc1Oc1c(P3(c2ccccc2)c2ccccc2)cccc1)(c1ccccc1)c1ccccc1)c1ccccc1)C(F)(F)F)C(F)(F)F+&smarts=#
I have checked that service; its behaviour is consistent with the stock CDK 2.* behaviour (SMILES that cannot be kekulised throw an exception in SmilesParser).
Which version of CDK are you using in http://www.simolecule.com/cdkdepict/depict.html?
If you use e.g. OpenBabel to convert your SMILES to SMILES, you get these, which depict fine: c1(cc(c2C3CCCC(N3[Cu]3(n12)P(c1ccccc1Oc1c(P3(c2ccccc2)c2ccccc2)cccc1)(c1ccccc1)c1ccccc1)c1ccccc1)C(F)(F)F)C(F)(F)F [nH]1cccc1
This is where the problem is: obabel's SMILES are depicted but they are wrong – obabel "converts" the pyridine ring moiety to piperidine (apparently because of the perceived N-Cu bond), which the structure does not have. In the X-ray structure both N-containing rings are flat: http://www.crystallography.net/cod/4106494.html . Moreover, the SMILES you cite have an H added to the five-membered ring which is not there (should be Cu-n1cccc1, not [nH]1cccc1). Open Babel 2.3.2 on my Ubuntu-16.04 does not make this change. Which version was yours?
The proper way to encode the cod/4106494 structure in SMILES is, IMHO, the following: '[Cu]12(P(c1ccccc1)c1ccccc1)n1c(cc(c1c1[n]2c(ccc1)c1ccccc1)C(F)(F)F)C(F)(F)F'. I'm looking for a way to parse this SMILES string in CDK and to depict it :)
There are often problems with aromatic Ns, can you try getting SMILES in “non-aromatic” notation? e.g. caffeine CN1C=NC2=C1C(=O)N(C(=O)N2C)C vs c1(=O)c2c(n(C)c(=O)n1C)ncn2C (both of these are depicted as they are both valid, I just want to demonstrate what I mean with the “non-aromatic” notation …). As to your other question, as far as I am aware the CDK Depict uses the latest and greatest version. At least I use it to test the latest functionality…
There are often problems with aromatic Ns, can you try getting SMILES in “non-aromatic” notation? e.g. caffeine CN1C=NC2=C1C(=O)N(C(=O)N2C)C vs c1(=O)c2c(n(C)c(=O)n1C)ncn2C (both of these are depicted as they are both valid, I just want to demonstrate what I mean with the “non-aromatic” notation …).
I see what you mean. Thanks for the tip! I'll try to "kekulize SMILES manually" and see what happens.
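To make the "aromatic" versus "non-aromatic" notation distinction concrete, here is a rough Python sketch that only checks whether a SMILES string uses lower-case (aromatic) atom symbols. It is a plain string scan, not a parser or a validity check, and the function name is hypothetical.

```python
# Aromatic atoms of the organic subset, written outside square brackets.
AROMATIC_ORGANIC = set("bcnops")


def has_aromatic_atoms(smiles: str) -> bool:
    """Return True if the string contains lower-case (aromatic) atom symbols."""
    in_bracket = False
    symbol_seen = False
    for ch in smiles:
        if ch == "[":
            in_bracket, symbol_seen = True, False
        elif ch == "]":
            in_bracket = False
        elif in_bracket:
            # Inside brackets only the first letter decides:
            # [nH] is aromatic, [Cu] or [Co] is not.
            if ch.isalpha() and not symbol_seen:
                symbol_seen = True
                if ch.islower():
                    return True
        elif ch in AROMATIC_ORGANIC:
            # 'l' in Cl and 'r' in Br are not in the set, so they are ignored.
            return True
    return False


print(has_aromatic_atoms("CN1C=NC2=C1C(=O)N(C(=O)N2C)C"))  # Kekulé caffeine
print(has_aromatic_atoms("c1(=O)c2c(n(C)c(=O)n1C)ncn2C"))  # aromatic caffeine
```

A string for which this returns False leaves no aromaticity for the toolkit to resolve, which is the point of trying the Kekulé form first.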
I have very little experience with organometallics, but there are two simple examples on the depict website, you may need to define the metal centre?
ClPt@SP1([NH3])[NH3] cis-platin
O=NCo@([NH3])([NH3])([NH3])N(=O) trans-[Co(NH3)4(NO)2]
You may have to go to extended SMILES but John may be more help there than me.
As to your other question, as far as I am aware the CDK Depict uses the latest and greatest version. At least I use it to test the latest functionality…
I tried the leading edge :) CDK (2.1 bundle compiled from https://github.com/cdk/cdk.git master, commit affc8d4), but it also throws an exception when parsing invalid SMILES, whereas your Web version does not :). Could you e-mail or post me the CDK jar bundle from the server (maybe privately), for a test? I'd like to see if I get the same SMILES parsing as on your server; then I will know where to look for a difference...
I've pushed the changes now, but be warned: here be dragons.
Taking a step back, what toolkit did you use to generate the SMILES? As per Noel's (@baoilleach) talk linked earlier some toolkits don't understand the rules and delocalise structures that should not be delocalised.
Valence model != reality - your structure should probably be:
[Cu--]12(Oc3c(C(=[N+]2N=C(O1)c1ccc(O)cc1)C)cc(Br)cc3)[n+]1ccccc1
or, if you do not want to have charges, the pyridine cannot be delocalised (see Noel's talk: is it aromatic in real life = yes, can it be in SMILES = no).
[Cu]12(Oc3c(C(=[N]2N=C(O1)c1ccc(O)cc1)C)cc(Br)cc3)N1=CC=CC=C1
I've pushed the changes now, but be warned: here be dragons.
Great many thanks, you saved my day! Now I see the code and can reproduce the behaviour!
PS. Dragons are OK, we're working to domesticate them :)
I've pushed the changes now
BTW could it be that cdk.version 2.2-SNAPSHOT is not yet on the snapshot repo? My 'mvn compile' complains: "Failure to find org.openscience.cdk:cdk-depict:jar:2.2-SNAPSHOT in https://oss.sonatype.org/content/repositories/snapshots", while it works when taking CDK 2.1-SNAPSHOT as a fall-back.
Taking a step back, what toolkit did you use to generate the SMILES?
For the 'c1(cc(c2C3=CCC=C(N3[Cu]3(n12)P(c1ccccc1Oc1c(P3(c2ccccc2)c2ccccc2)cccc1)(c1ccccc1)c1ccccc1)c1ccccc1)C(F)(F)F)C(F)(F)F' string from a COD CIF, I ran 'cif_molecule' from https://github.com/cod-developers/cod-tools, took the largest molecule, then used Open Babel 2.3.2 -- Dec 18 2015 -- 10:48:26 from the Ubuntu-16.04 apt repo to get SMILES (obabel -iCIF -oSMI filter). That SMILES is IMHO wrong ('...c2C3=CCC=C(N...' instead of '...=C2C3=CC=CC(=[N]...'), so I then either a) edited the string manually, or b) loaded it into Avogadro, graphically edited the pyridine ring (deleting H, changing bonds to "aromatic"), saved CML, used obabel (obabel -iCML -oSMI) to get SMILES, and removed the charges from P. Both procedures yield the identical SMILES 'c1(cc(c2c3cccc(n3[Cu]3(n12)P(c1ccccc1Oc1c(P3(c2ccccc2)c2ccccc2)cccc1)(c1ccccc1)c1ccccc1)c1ccccc1)C(F)(F)F)C(F)(F)F' for the http://www.crystallography.net/cod/4106494.html main structure.
I've pushed 2.2-SNAPSHOT to OSSRH so it should pick it up from the snapshot repo now.
Sorry for butting in, but I'd recommend (and I think John would too) that you use the development version of Open Babel for better handling of aromaticity (thanks to @johnmay). Prior to a rewrite around August of last year, we were clearly getting things wrong.
On Ubuntu 16.04, you can "snap install openbabel --channel=edge" to get this. See https://baoilleach.blogspot.co.uk/2017/12/open-babel-in-snap-ii.html and https://baoilleach.blogspot.co.uk/2017/10/open-babel-in-snap.html for some background.
Any idea when those updates are going to make it into the stable release of open babel? How far behind is 2.4.1? Thanks!
I've pushed 2.2-SNAPSHOT to OSSRH so it should pick it up from the snapshot repo now.
Thanks a lot! It now works as expected (had to run 'rm -rf ~/.m2/repository/; cd depict; rm -rf .extract/; mvn clean; mvn package; java -jar target/cdkdepict-0.3.jar', though).
There are often problems with aromatic Ns, can you try getting SMILES in “non-aromatic” notation? e.g. caffeine CN1C=NC2=C1C(=O)N(C(=O)N2C)C vs c1(=O)c2c(n(C)c(=O)n1C)ncn2C (both of these are depicted as they are both valid, I just want to demonstrate what I mean with the “non-aromatic” notation …).
I see what you mean. Thanks for the tip! I'll try to "kekulize SMILES manually" and see what happens.
Just in case you are interested: I have manually "kekulised" the aromatic http://www.crystallography.net/cod/4106494.html SMILES, 'c1(cc(c2c3cccc(n3[Cu]3(n12)P(c1ccccc1Oc1c(P3(c2ccccc2)c2ccccc2)cccc1)(c1ccccc1)c1ccccc1)c1ccccc1)C(F)(F)F)C(F)(F)F', as 'C1=(CC(=C2C3=CC=CC(=[N]3[Cu]3([N]12)PC=CC=C1)(C1=CC=CC=C1)C1=CC=CC=C1)C1=CC=CC=C1)C(F)(F)F)C(F)(F)F'. Now both CDK-depict and obabel display them as expected. But... on the https://apps.ideaconsult.net/ambit2/depict both PubChem and Chemical Identifier Resolver fail (empty windows), although they work on the original string (and PubChem I would say produces a reasonable depiction, the one I would expect looking at the crystal structure). Truly metal-organics are tricky :)
Correction:
'C1(=CC(=C2C3=CC=CC(=[N]3[Cu]3([N]12)PC=CC=C1)(C1=CC=CC=C1)C1=CC=CC=C1)C1=CC=CC=C1)C(F)(F)F)C(F)(F)F'
The problem was to write 'C1=(CC(=C2C3...' instead of 'C1(=CC(=C2C3...'.
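The slip described above, a bond symbol placed directly before an opening branch parenthesis, is easy to catch mechanically. As far as I understand the OpenSMILES grammar, a bond symbol has to be followed by an atom or a ring-closure digit, so '=(' cannot occur in a valid string. The following Python sketch is only a string scan under that assumption, not a SMILES parser, and the function name is made up.

```python
from typing import List

# Bond symbols that may appear outside square brackets.
BOND_SYMBOLS = set("-=#$:/\\")


def bond_before_branch(smiles: str) -> List[int]:
    """Return the positions of bond symbols directly followed by '('."""
    hits = []
    in_bracket = False
    for i, ch in enumerate(smiles[:-1]):
        if ch == "[":
            in_bracket = True
        elif ch == "]":
            in_bracket = False
        elif not in_bracket and ch in BOND_SYMBOLS and smiles[i + 1] == "(":
            # Inside brackets '-' is a charge, so only unbracketed symbols count.
            hits.append(i)
    return hits


wrong = "C1=(CC(=C2C3=CC=CC(=[N]3[Cu]3([N]12)PC=CC=C1)(C1=CC=CC=C1)C1=CC=CC=C1)C1=CC=CC=C1)C(F)(F)F)C(F)(F)F"
fixed = "C1(=CC(=C2C3=CC=CC(=[N]3[Cu]3([N]12)PC=CC=C1)(C1=CC=CC=C1)C1=CC=CC=C1)C1=CC=CC=C1)C(F)(F)F)C(F)(F)F"
print(bond_before_branch(wrong), bond_before_branch(fixed))
```

Running such a check on hand-edited strings before passing them to a toolkit separates plain syntax slips from genuine kekulisation failures.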
Sorry for butting in, but I'd recommend (and I think John would too) that you use the development version of Open Babel for better handling of aromaticity (thanks to @johnmay). Prior to a rewrite around August of last year, we were clearly getting things wrong.
On Ubuntu 16.04, you can "snap install openbabel --channel=edge" to get this.
Thanks for the hint, @baoilleach! I'm not used to snap and do not trust it enough to run it under 'sudo', but I have a git clone compiled, for tests. It works fine, actually; for the SMILES discussed here it puts radicals either on phosphorus or on the pyrrole residue carbon C1; both seem "too radical", to my taste. The stock version (obabel 2.3.2) does not do that. Sorry if this is off-topic here...
For production, it is nice to have a referenceable version, so we usually use the latest release in the APT repos.
Hi, I'm trying to reproduce the http://www.simolecule.com/cdkdepict/depict.html server action locally on my host, but the local version, both from pre-compiled jars and compiled from sources, fails to parse non-kekulisable SMILES, such as 'n1cccc1' or '[Cu]12(Oc3c(C(=[N]2N=C(O1)c1ccc(O)cc1)C)cc(Br)cc3)[n]1ccccc1' ;). My locally installed version fails "... with root cause org.openscience.cdk.exception.InvalidSmilesException: could not parse 'n1cccc1', a valid kekulé structure could not be assigned". Compiling CDK and/or Depict from sources works exactly the same (i.e. raises exception), as does a command-line wrapper around CDK 2.1 or 2.2. While this behaviour apparently stems from the underlying CDK SmilesParser, and seems to be a feature, not a bug, the on-line 'cdkdepict' version mentioned above does parse these SMILES and depicts them nicely ;) (see the attached screen-shot):
Does this mean that the server uses different/newer CDK libraries? If so, would it be possible to have them pushed into the GitHub repo (maybe as an experimental branch)?
Exception messages would also be helpful to debug the case, if presented next to the broken image icon.
The platform for running the 'cdkdepict' 0.3 was:
saulius@koala depict-0.3/ $ java -version
openjdk version "1.8.0_151"
OpenJDK Runtime Environment (build 1.8.0_151-8u151-b12-0ubuntu0.16.04.2-b12)
OpenJDK 64-Bit Server VM (build 25.151-b12, mixed mode)
saulius@koala depict-0.3/ $ uname -a
Linux koala 4.13.0-32-generic #35~16.04.1-Ubuntu SMP Thu Jan 25 10:13:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
saulius@koala depict-0.3/ $ osname
Ubuntu-16.04
Sincerely, Saulius (join("@", ( "grazulis", join(".", ("ibt","lt")))) to mail me directly ;)