cclib / cclib

Parsers and algorithms for computational chemistry logfiles
https://cclib.github.io/
BSD 3-Clause "New" or "Revised" License

cclib & jumbo #1114

Open ltalirz opened 2 years ago

ltalirz commented 2 years ago

I recently came across the "Jumbo converters" in ioChem-BD, a parallel effort (Java codebase) that converts quantum chemistry code outputs to CML.

As mentioned here, there is significant overlap in the codes supported (ADF, Gaussian, Molcas, MOPAC, Orca, Turbomole), so there may be an opportunity to share some of the work involved in keeping up with the atomistic software space.

At the same time, I realize of course that this is not always straightforward (programming language differences aside), and I noticed some previous discussion in https://github.com/cclib/cclib/issues/163#issuecomment-70946410.

Just intended as "food for thought" - feel free to close.

berquist commented 2 years ago

I remember reading about J-C when first implementing the CML writer. A couple of things:

ghutchis commented 2 years ago

I doubt the OB implementation (or the Avogadro2 implementation, for that matter) validates against the CML spec anymore. At one point Peter Murray-Rust worked with the OB code and made sure it validated, but then he started changing the spec so frequently that it was impossible to keep up.

IIRC the Jumbo Converters haven't been touched in years - the official repo is now here: https://github.com/BlueObelisk/jumbo-converters

While CML exists "in the wild", I think it's better to push for a community standard - whether that's QCSchema or something similar outside of MolSSI is worth discussing.

From my perspective, cclib is the de facto standard for parsing comp chem files and other projects should help or leverage it.

berquist commented 2 years ago

(The remainder is my view and opinion, not necessarily that of the cclib project, so it's a separate comment.) The field that cclib lives in suffers from a substantial amount of redundant work, at least in Python. It shows up in the work of graduate students who write one-off, ad hoc solutions for parsing their files, and the data they need may or may not be parsed by cclib. In this case, it's a matter of awareness and discoverability: sure, we are Googleable, but we don't do much active outreach. This is probably the group we reach the most and the one that benefits the most.

Collaborating across the language boundary will be tough. The best we could do at the moment is to produce a Chemical JSON or CML file that the Java process could pick up after calling out to CPython (I have no idea whether cclib works under Jython). My vision is that much of cclib's functionality gets subsumed by a package that compiles to a portable library exposing a C API, so that interfacing across languages becomes reasonable. In a perfect world we would no longer rely on parsing output files meant for human eyes as a form of data interchange, but it's not clear when that will stop.
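To make the "call out to CPython and read back a file" idea concrete, here is a minimal sketch of the process handoff, assuming the interchange document is written to the child's stdout rather than to disk. Python stands in for the parent process here; a Java caller would do the same thing with something like `ProcessBuilder`. The function name `fetch_json_from_child` and the stub payload are hypothetical, and the stub only mimics a couple of cclib attribute names (`natom`, `atomnos`). It is not the actual Chemical JSON layout.

```python
import json
import subprocess
import sys


def fetch_json_from_child(cmd: list) -> dict:
    """Run a child process and parse the JSON document it prints to stdout.

    Hypothetical helper: it stands in for the non-Python side of the handoff,
    which only needs to launch an interpreter and read text back.
    """
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


# Hypothetical child command. In the setup described above, this would be a small
# CPython script that parses a logfile with cclib and prints Chemical JSON, e.g.
# (assuming a recent cclib whose ccwrite accepts outputtype and returnstr):
#   data = cclib.io.ccread("calc.log")
#   print(cclib.io.ccwrite(data, outputtype="cjson", returnstr=True))
# Here a stub prints a hand-written payload so the sketch runs without cclib.
STUB_CHILD = [
    sys.executable,
    "-c",
    "import json; print(json.dumps({'natom': 3, 'atomnos': [8, 1, 1]}))",
]

if __name__ == "__main__":
    payload = fetch_json_from_child(STUB_CHILD)
    print(payload["natom"], payload["atomnos"])
```

Passing text over a pipe keeps the two runtimes fully decoupled, which sidesteps the question of whether cclib runs under Jython, at the cost of one interpreter launch per conversion.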