Request for machine-readable output

mkhorton commented 5 years ago

Hi @aoterodelaroza, I've written a parser for the critic2 standard output so that we can store the results of an analysis in our database. Unfortunately a recent change in output format broke the parser -- this is fine and not a problem, but I was wondering if you'd thought about providing a machine-readable output like JSON in addition to the standard out?

A machine readable output as JSON could look something like:

{
    unique_critical_points: [
        {
         'field': 93848.0413,
         'field_gradient': 0.0,
         'field_hessian': [[-2593274446000.0, -3.873587547e-19, -1.704530713e-08],
                           [-3.873587547e-19, -2593274446000.0, 1.386877485e-18],
                           [-1.704530713e-08, 1.386877485e-18, -2593274446000.0]],
         'frac_coords': [0.333333, 0.666667, 0.213295],
         'index': 0,
         'multiplicity': 1.0,
         'point_group': 'D3h',
         'type': "nucleus"
         },
         { ... }
     ],
     all_critical_points: [ ... ],
     critical_point_connections: [ ... ]
}

aoterodelaroza commented 5 years ago

Sure, that shouldn't be very hard to do. I'll try to get it done by the end of the week.

mkhorton commented 5 years ago

Fantastic, thank you :-) Happy to beta test.

aoterodelaroza commented 5 years ago

This is implemented now. You can do CPREPORT bleh.json to write a JSON file that contains the structure, the details for the reference field, and the CP list for that field, with all the information. Let me know how it goes (and sorry for the wait).

mkhorton commented 5 years ago

Excellent, will test ASAP and report back any issues.

bjoe2k4 commented 5 years ago

a few nits:

ellipticity is missing for bcps
nuclear cps should have neither repulsor nor attractor properties
adding graph to cpreport does not add bond and ring paths to the json file

just tested with a crystal structure calculated by elk:

{
  "id": 3,
  "name": "b1",
  "fractional_coordinates": [0.94984749593165,0.94984749593165,0.25000000000000],
  "cartesian_coordinates": [6.15300445807440,3.55243877991604,1.69328910124997],
  "multiplicity": 6,
  "point_group": "C2v",
  "rank": 3,
  "signature": -1,
  "field": 4.19146137796299E-02,
  "field_valence": 4.19146137796299E-02,
  "gradient": [-9.49778626950174E-18,7.12352803300226E-17,-1.22191192836356E-17],
  "hessian": [[-8.20597270293350E-04,5.86985401464478E-03,-7.74409668572255E-18],[5.86985401464478E-03,-7.59852086266068E-03,-1.42836808754198E-15],[-7.74409668572255E-18,-1.42836808754198E-15,-3.43919730780550E-03]],
  "gradient_norm": 7.28970505835850E-17,
  "gradient_norm_valence": 0.00000000000000E+00,
  "laplacian": -1.18583154407595E-02,
  "laplacian_valence": -1.18583154407595E-02,
  "hessian_eigenvalues": [-1.09874826577573E-02,-3.43919730780550E-03,2.56836452480327E-03],
  "number_of_pointprops": 0,
  "repulsor_angle": 1.80000000000000E+02,
  "repulsor_eigenvec": [0.86602540380759,0.49999999995990,-0.00000000000012],
  "repulsors": [{
    "cell_id": "1",
    "lvec": [-1043708348,32752,-1043708372],
    "distance": 1.62181159719652E+00,
    "path_length": 1.62181159719652E+00
  },{
    "cell_id": "2",
    "lvec": [32752,-1043708360,32752],
    "distance": 8.71528204816697E-01,
    "path_length": 8.71534118355477E-01
  }],
  "is_nucleus": false
}

How are cell_id and lvec to be interpreted? Also it would be helpful to add the cpid for repulsors and attractors. Otherwise happy to see this implemented :-)

mkhorton commented 5 years ago

Ellipticity is straight forward enough to calculate if you have the Hessian / Hessian eigenvalues (which are reported), so I'm personally not as concerned about that.

The graph info is important however (if not the path itself, at least the connectivity information, i.e. which ncps are connected to a given bcp).

Forgive the basic question, but what do the repulsor/attractor properties mean?

samblau commented 5 years ago

including YT info (if YT is run) in the JSON would also be greatly appreciated

aoterodelaroza commented 5 years ago

Sorry for the wait. Let's see:

Ellipticity: I agree with @mkhorton that if you have the Hessian eigenvalues, it doesn't make much sense, because it can be recalculated on the fly as lambda_1/lambda_2 - 1. Unless there is a specific reason to have it in there, I view it as redundant.
Only rings (signature +1) and bonds (signature -1) have repulsor and attractor keys. A bcp has two attractors, which are the nuclei you arrive at when you trace the gradient path following the Hessian eigenvector with the positive eigenvalue. Likewise, an rcp has two repulsors: the two cages you arrive at by following the eigenvector of the Hessian with negative eigenvalue. Perhaps the nomenclature could be improved - that's what they are called internally. It should certainly be documented somewhere.
Bond paths and ring paths when graph is used: those are not stored internally, so they were not available to the JSON writer. If you want them, I can put them in the JSON but I'll have to change a few things. Wouldn't be the end of the world, though, so let me know. Otherwise, I'm with @mkhorton that having the connectivity is more important.
A bcp has two attractors, that can be either nuclei or non-nuclear maxima. The cell_id of each of the attractors is the nucleus/ncp ID from the complete CP list, which, in the case of an atom, always coincides with the ID from the complete atom list. The lvec is the lattice vector that, combined with the cell_id gives the exact location of the attractor. Say, for instance, that the atom with ID 3 is at (0,0,1/2) and your bcp has the atom located at (1,0,1/2) as an attractor. Then, the cell_id of that attractor would be 3 and lvec would be (1,0,0). Clearly, there is something wrong with how the lvec is written because those numbers do not make sense. I'll take a look. Same for the rcps but with cages/repulsors instead of atoms/attractors.
Note that, from the preceding discussion, the connectivity of each bcp and rcp, and the molecular graph, can be extracted from the cell_id and lvec. critic makes an effort to present this in a nicer way, but it is all generated from the contents of the current JSON file.
YT results in the JSON file: that could be done, but it'd have to be somewhere else because in my mind CPREPORT should be used only for CP lists. How about adding a keyword to YT/BADER to have them write a JSON file? Or maybe a more general keyword that would dump both? I'd rather go with the former solution because the latter would imply keeping the integration results around, and they aren't used anywhere else in the program.

bjoe2k4 commented 5 years ago

This was the first test of the json cpreport on elemental beryllium from elk (repulsors present, signature wrong for nuclear cps):

{
      "id": 1,
      "name": "Be",
      "fractional_coordinates": [0.16666667000000,0.16666667000000,0.25000000000000],
      "cartesian_coordinates": [1.07964780442629,0.62333495046670,1.69328910125000],
      "multiplicity": 2,
      "point_group": "D3h",
      "rank": 3,
      "signature": -1,
      "field": 3.55518737889062E+01,
      "field_valence": 3.55518737889062E+01,
      "gradient": [0.00000000000000E+00,0.00000000000000E+00,0.00000000000000E+00],
      "hessian": [[-9.71529893156452E+11,-2.00367926788645E-04,-8.38056666004389E-09],[-7.99803206535014E-05,-9.71529893156453E+11,-1.85118959031637E-06],[-8.38056666004389E-09,-1.85118959031637E-06,1.93805965463654E+12]],
      "gradient_norm": 0.00000000000000E+00,
      "gradient_norm_valence": 0.00000000000000E+00,
      "laplacian": -5.00013167636694E+09,
      "laplacian_valence": -5.00013167636694E+09,
      "hessian_eigenvalues": [-9.71529893156453E+11,-9.71529893156452E+11,1.93805965463654E+12],
      "number_of_pointprops": 0,
      "repulsor_angle": 0.00000000000000E+00,
      "repulsor_eigenvec": [0.00000000000000,0.00000000000000,0.00000000000000],
      "repulsors": [{
        "cell_id": "-1025312208",
        "lvec": [-1025312252,32764,0],
        "distance": 0.00000000000000E+00,
        "path_length": 0.00000000000000E+00
      },{
        "cell_id": "32764",
        "lvec": [0,0,0],
        "distance": 0.00000000000000E+00,
        "path_length": 0.00000000000000E+00
      }],
      "is_nucleus": true
    }

The normal output seems ok:

# ncp   pg  type   CPname         position (cryst. coords.)

       mult  name            f             |grad|           lap
  1    D3h (3,-3) nucleus   0.16666667   0.16666667   0.25000000    2     Be      3.55518738E+01  0.00000000E+00 -5.00013168E+09

aoterodelaroza commented 5 years ago

Yeah, that's also a bug. I'll fix that too. Stay tuned.

mkhorton commented 5 years ago

Regarding the YT, I think adding a JSON output to the YT/BADER keyword sounds cleanest / would make the most sense, rather than mixing information between different keywords. Thank you for the clarification on repulsors/attractors! I wasn't familiar with that naming.

aoterodelaroza commented 5 years ago

No worries! Okay, I've pushed a commit to move the connectivity info from the non-equivalent CP list to the complete CP list in the JSON file. I had also mixed up attractors (for bcps) and repulsors (for rcps) so those are correct now. The combination of cell_id and lvec is basically the contents of the Complete CP list, bcp and rcp connectivity table in the output. The distance and path_length are the {r1,r2} and {p1,p2} in Analysis of system bonds and Analysis of system rings.

aoterodelaroza commented 5 years ago

I believe the problem with the rank and signature at the nuclei is fixed now.

aoterodelaroza commented 5 years ago

The JSON output for YT/BADER is done. You can now do:

YT JSON bleh.json

to write a JSON file containing the results of the integration. Works also for BADER.

mkhorton commented 5 years ago

Hi @aoterodelaroza, I had two questions for clarification:

The standard output seems to be in Ångström, while the JSON seems to be in Bohr. Is this always the case? It might be useful to include a top-level units keyword in the output to avoid any ambiguity.
When running on a molecule from a Gaussian Cube file vs say a crystal CHGCAR, my interpretation is that the analysis is essentially the same (minus the periodicity), and that the co-ordinates reported for the molecule are the co-ordinates relative to the Gaussian Cube origin/axes. I saw there were centering vectors reported also. Would these centering vectors just need to be subtracted from the nuclei co-ordinates to get back the original molecule atomic co-ordinates?

aoterodelaroza commented 5 years ago

The JSON file is basically a dump of critic's internal variables. Units are always atomic units, but I've added a "units" key in the JSON output as you suggested, just in case.

Regarding molecules: the analysis is the same but there are a few details to consider, (briefly) explained here. If you use MOLECULE to read the geometry, critic simulates the periodicity using a "molecular cell". Internally, however, it is still periodic. The output units are angstrom and referred to the molecular origin but bohr and the cube origin are used internally (and in the JSON file). The way to recover the coordinates of the original molecule is by doing molecule_centering_vector + cartesian_coordinates.

mkhorton commented 5 years ago

If units are always Bohr, I think that's ok, I wasn't certain whether critic2 always used the same units internally or whether it switched to e.g. Angstroms if the appropriate keyword was used. Thanks for clarifying!

aoterodelaroza commented 3 years ago

Closing this too, as it seems resolved. Please, let me know if there are any other JSON issues.

aoterodelaroza / critic2

Request for machine-readable output #28