HISKP-LQCD / sLapH-contractions

Stochastic LapH contraction program
GNU General Public License v3.0

Move to a standard configuration file format #78

Closed martin-ueding closed 5 years ago

martin-ueding commented 6 years ago

We currently use a configuration file format that resembles INI but is not quite INI. We express lists by specifying the same key multiple times, which a real INI parser would reject. We also have more intricate structure that is currently expressed in an ad-hoc syntax.
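To illustrate the duplicate-key problem, here is a minimal sketch using Python's standard `configparser` as one example of a strict INI parser. The `[main]` section header and the second quark line are made up for this illustration; the contraction code itself does not use this parser.

```python
import configparser

# Hypothetical input: the section header and the "s" quark are invented here.
SAMPLE = """\
[main]
quark = u:5:TB:2:EI:2:DF:4:light
quark = s:5:TB:2:EI:2:DF:4:strange
"""


def try_parse(text):
    """Return a short message saying whether a strict INI parser accepts text."""
    parser = configparser.ConfigParser()  # strict=True is the default
    try:
        parser.read_string(text)
    except configparser.DuplicateOptionError as error:
        return "rejected: duplicate key '{}'".format(error.option)
    return "accepted"
```

With `strict=False` the parser would accept the input but keep only the last value, so the list would silently shrink to one entry.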

A quark is defined like this:

quark = u:5:TB:2:EI:2:DF:4:light

And a correlator like that:

correlator_list = C40C:Op0:Q0:Op4:Q0:Op7:Q0:Op9:Q0:P0,1,2,3,4

I would like to move away from this brittle construction to a data format that is flexible enough for our needs but also standardized, so that we do not have to write a parser for it. YAML is my favorite choice. With YAML, the configuration could look like this:

quarks:
- name: u
  num_rnd: 5
  dilution:
    dirac: {type: F}
    eigen: {type: I, count: 2}
    time: {type: B, size: 2}
  path: light

correlators:
- name: C40C
  operators: [0, 4, 7, 9]
  quarks: [0, 0, 0, 0]
  momentum: [0, 1, 2, 3, 4]

I already have a Python script that parses the old data format using a proper parsing library and proper definition of that ad-hoc syntax. It can then emit YAML like the above.
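The script itself is not shown in this issue. As a rough sketch of what such a conversion could look like, the following parses the two example lines from above into nested dictionaries. All function and table names are hypothetical, and the mapping of the dilution codes (`TB`, `EI`, `DF`) to keys is inferred only from the single example here, so the real format may have more cases. It uses plain string splitting rather than a parsing library and emits JSON because the Python standard library has no YAML writer.

```python
import json

# Assumed mappings, inferred from the one quark example in this issue.
AXIS_BY_LETTER = {"T": "time", "E": "eigen", "D": "dirac"}
PARAMETER_BY_TYPE = {"B": "size", "I": "count", "F": None}


def parse_quark(spec):
    """Parse a line value such as 'u:5:TB:2:EI:2:DF:4:light'."""
    fields = spec.split(":")
    if len(fields) != 9:
        raise ValueError("expected 9 colon-separated fields: " + spec)
    dilution = {}
    # Fields 2..7 alternate between a two-letter code and a number.
    for code, number in zip(fields[2:8:2], fields[3:8:2]):
        entry = {"type": code[1]}
        parameter = PARAMETER_BY_TYPE[code[1]]
        if parameter is not None:  # the YAML example has no number for type F
            entry[parameter] = int(number)
        dilution[AXIS_BY_LETTER[code[0]]] = entry
    return {
        "name": fields[0],
        "num_rnd": int(fields[1]),
        "dilution": dilution,
        "path": fields[8],
    }


def parse_correlator(spec):
    """Parse a value such as 'C40C:Op0:Q0:Op4:Q0:P0,1,2'."""
    fields = spec.split(":")
    pairs, momentum = fields[1:-1], fields[-1]
    if len(pairs) % 2 != 0 or not momentum.startswith("P"):
        raise ValueError("malformed correlator: " + spec)
    return {
        "name": fields[0],
        "operators": [int(f[len("Op"):]) for f in pairs[0::2]],
        "quarks": [int(f[len("Q"):]) for f in pairs[1::2]],
        "momentum": [int(p) for p in momentum[1:].split(",")],
    }


def to_json(quark_specs, correlator_specs):
    """Convert lists of old-style values into one JSON document."""
    document = {
        "quarks": [parse_quark(s) for s in quark_specs],
        "correlators": [parse_correlator(s) for s in correlator_specs],
    }
    return json.dumps(document, indent=2)
```

Calling `to_json(["u:5:TB:2:EI:2:DF:4:light"], ["C40C:Op0:Q0:Op4:Q0:Op7:Q0:Op9:Q0:P0,1,2,3,4"])` produces a document with the same structure as the YAML example above.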

There is just one caveat: there is no YAML parser in Boost, so we would have to add an external library. The yaml-cpp library is available in Fedora and Ubuntu Cosmic, but not in Ubuntu Xenial, which we use on Travis CI. It also won't be available on the systems that we run on. We already need to install LIME manually, so it does not get much worse, but it is yet another dependency.

Boost has the property tree module, which includes a JSON parser, but its internal representation is not exactly like JSON, so it is a bit cumbersome to use. JSON syntax is also not as nice to write by hand:

A quark:

{
  "name": "u",
  "dilution": {
    "dirac": { "type": "F" },
    "eigen": { "type": "I", "count": 2 },
    "time": { "type": "B", "size": 2 }
  },
  "num_rnd": 5,
  "path": "light"
}

And a correlator:

{
  "name": "C40C",
  "operators": [ 0, 4, 7, 9 ],
  "quarks": [ 0, 0, 0, 0 ],
  "momentum": [ 0, 1, 2, 3, 4 ]
}

So I see the following options:

  1. Add yaml-cpp as another dependency and use YAML as the input format.
  2. Use JSON and Boost's property tree even though JSON is not as pretty and Boost property tree is not a perfect fit for working with JSON.
  3. Just stay with the current setup and continue to add structure with ad-hoc kludges like the shifting syntax.

None of these options seems very appealing at the moment; perhaps I need to think about it for a while.

kostrzewa commented 6 years ago

yaml-cpp works well, I've used it for nyom and the various eigenvector test codes that I wrote and talked about. When installed locally using CMake, it is also discoverable.

martin-ueding commented 6 years ago

Okay, this is very good news! Then I will incorporate this into the contraction code.

kostrzewa commented 6 years ago

See https://github.com/kostrzewa/nyom/blob/8faf5cff7246415eb49a860a9ffd35e6b5ea14e6/CMakeLists.txt#L107, which finds yaml-cpp built and installed locally:

cmake \
    -DCMAKE_INSTALL_PREFIX=$HOME/local \
    $HOME/code/yaml-cpp

kostrzewa commented 6 years ago

For the quarks and specification of correlators, since you're modifying that anyway, can the quark lines be specified via names rather than numbers? That way, one doesn't have to enter the quarks in a particular order.

i.e.

correlators:
- name: C40C
  operators: [0, 4, 7, 9]
  quarks: [0, 0, 0, 0]
  momentum: [0, 1, 2, 3, 4]

becomes

correlators:
- name: C40C
  operators: [0, 4, 7, 9]
  quarks: [u, u, u, u]
  momentum: [0, 1, 2, 3, 4]

martin-ueding commented 6 years ago

Sure, the same applies to the operators: we could allow giving them names as well. But then the name attribute of the correlator should become type, I'd say.
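A minimal sketch of how name-based quark references could be resolved to positions in the quark list, so that the order of the quark entries no longer matters. The function name and the dictionary layout are assumptions that follow the YAML examples above, not code from the contraction program, and the `s` quark in the usage note is made up.

```python
def resolve_quark_indices(quarks, correlator):
    """Return a copy of correlator with quark names replaced by list indices."""
    index_by_name = {}
    for index, quark in enumerate(quarks):
        name = quark["name"]
        if name in index_by_name:
            raise ValueError("duplicate quark name: {}".format(name))
        index_by_name[name] = index

    indices = []
    for name in correlator["quarks"]:
        if name not in index_by_name:
            raise ValueError("unknown quark name: {}".format(name))
        indices.append(index_by_name[name])

    resolved = dict(correlator)  # shallow copy, the input stays untouched
    resolved["quarks"] = indices
    return resolved
```

For example, with a quark list `[{"name": "s"}, {"name": "u"}]`, a correlator with `quarks: [u, u, u, u]` resolves to `[1, 1, 1, 1]` regardless of where `u` is defined.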

martin-ueding commented 5 years ago

We have removed many of the complicated entries from the configuration file. Therefore we mostly have a standard format, namely INI, and I do not see a point in changing this any more. The correlator list is JSON, and that's fine.