rprospero commented 2 years ago

The input file is currently a bespoke file format that requires a custom parser. Using an existing file format (e.g. JSON, TOML) would eliminate a large amount of boilerplate code from our maintenance burden. Additionally, it would allow users to tap into an existing tooling ecosystem for creating, inspecting, and modifying the input files.

Whatever format is chosen for the input file should be the same format as the one chosen for the restart file in #832.

IVlaD17 commented 2 years ago

List of file formats to look at:

[x] XML
[x] JSON
[x] YAML
[x] TOML
[x] Dhall

After looking at all of those options, we have decided to go with TOML as the file format and toml11 as the library we'll be using.

rprospero commented 2 years ago

Just as a lark, we could look at Dhall. It does have some advantages compared to the other formats and it can be trivially converted down to any of the other formats. On the other hand, I can't imagine that it has good library support.

IVlaD17 commented 2 years ago

Ideally, we'd like to not have to add another dependency in the shape of a parser, but if we have to, we should aim to get something widely used, well-supported and robust.

rprospero commented 2 years ago

A good set of criteria to look at when evaluating the formats:

Library Support

Is there a mature, cross-platform library for parsing and generating these files?
Does this library support a modern C++ coding style or is it largely raw pointers?

Hand Editing

Can a user easily create or edit a new file in a text editor?
Is a hand-edited file likely to be free of syntax errors?
Can the user put comments in the file? Will those comments persist after Dissolve edits the file?

Tooling Support

Will common text editors provide support for editing this file format?
Are tools available to extract information from these files? e.g. using jq to pull a subset out of a JSON file.
Are tools available to transform these files? e.g. using XSLT to update XML files from an old schema to a new one.
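
To make the extraction question concrete, here is a minimal sketch of pulling a subset out of a JSON file with nothing but Python's standard `json` module, roughly what jq would do on the command line. The key names (`species`, `pairPotentials`, and so on) and the `extract` helper are hypothetical and are not Dissolve's actual schema.

```python
import json

# Hypothetical input file content; the key names are illustrative only
# and are NOT Dissolve's real schema.
EXAMPLE = """
{
  "species": {
    "water": {"atoms": 3},
    "benzene": {"atoms": 12}
  },
  "pairPotentials": {"range": 15.0}
}
"""


def extract(document: dict, path: str):
    """Follow a dot-separated path such as 'species.water' into nested dicts."""
    node = document
    for key in path.split("."):
        node = node[key]  # raises KeyError if the path does not exist
    return node


doc = json.loads(EXAMPLE)
water = extract(doc, "species.water")
```

With a standard format, users get this kind of access from any language with a parser, without us having to ship a dedicated tool.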

Future Proofing

Can we easily update the schema without invalidating every old file?
Is it possible to easily support multiple schema versions?
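
One possible way to support multiple schema versions is to store a version number in the file and apply a chain of small upgrade steps on load. The sketch below works on a plain dictionary, as any JSON/TOML/YAML parser would produce. The `version` key, the renamed `temp` key, and the `units` default are all invented for illustration.

```python
from typing import Callable, Dict

CURRENT_VERSION = 3


def upgrade_1_to_2(doc: dict) -> dict:
    # Hypothetical change: "temp" was renamed to "temperature" in version 2.
    doc = dict(doc)
    doc["temperature"] = doc.pop("temp")
    doc["version"] = 2
    return doc


def upgrade_2_to_3(doc: dict) -> dict:
    # Hypothetical change: version 3 added a "units" key with a default value.
    doc = dict(doc)
    doc.setdefault("units", "kelvin")
    doc["version"] = 3
    return doc


# Maps a schema version to the function that upgrades it by one step.
UPGRADES: Dict[int, Callable[[dict], dict]] = {
    1: upgrade_1_to_2,
    2: upgrade_2_to_3,
}


def migrate(doc: dict) -> dict:
    """Upgrade a parsed document step by step until it matches CURRENT_VERSION."""
    version = doc.get("version", 1)  # treat unversioned files as version 1
    if version > CURRENT_VERSION:
        raise ValueError(f"file version {version} is newer than supported")
    while version < CURRENT_VERSION:
        doc = UPGRADES[version](doc)
        version = doc["version"]
    return doc
```

Old files stay readable as long as each upgrade step is kept, and a file that is already at the current version passes through unchanged.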

Performance

Is reading and writing of the file format reasonably efficient?
Will this still apply for the (significantly larger) restart files?
Are the files small enough to easily send via e-mail?
Is it possible to put the restart files in a binary format for efficiency?
Is that binary format necessary for efficiency?
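
A quick way to get a feel for the text-versus-binary question is to serialise the same block of numbers both ways and compare size and time. This is only a rough sketch using Python's standard library: JSON stands in for whichever text format is chosen, and a raw array of doubles stands in for a hypothetical binary restart format. Real numbers would have to come from benchmarking the actual C++ library on real restart data.

```python
import json
import time
from array import array


def compare_encodings(values):
    """Serialise a list of floats as JSON text and as raw doubles, and report
    the size in bytes and the time taken for each."""
    start = time.perf_counter()
    text = json.dumps(values)
    text_seconds = time.perf_counter() - start

    start = time.perf_counter()
    binary = array("d", values).tobytes()
    binary_seconds = time.perf_counter() - start

    return {
        "text_bytes": len(text.encode("utf-8")),
        "text_seconds": text_seconds,
        "binary_bytes": len(binary),
        "binary_seconds": binary_seconds,
    }


# Stand-in data; a real restart file would hold coordinates, histograms, etc.
result = compare_encodings([0.1 * i for i in range(1000)])
```

If the text encoding turns out to be only a small factor larger and slower on realistic data, a separate binary format may not be worth the extra code.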

IVlaD17 commented 2 years ago

XML

Library Support

There are multiple libraries for XML, but since we're already using pugixml there is not much point in considering others: integrating a different one would take extra time, which would negate the advantage of already having a library in place.

There is one other library that should be mentioned: RapidXML.

IVlaD17 commented 2 years ago

JSON

JSON doesn't look friendly to read or edit by hand, so we'll be focusing on XML, YAML and TOML.

IVlaD17 commented 2 years ago

YAML

Library Support

There seem to be only four YAML parsing libraries available, all of which are listed at yaml.org. There's no clear-cut answer as to which is best, so some experimentation might be in order.

IVlaD17 commented 2 years ago

TOML

Library Support

Regarding TOML, the only fully featured library I could find while investigating is toml++.

Hand Editing

I'm going to link this opinion piece on why its author thinks TOML is not a solid choice for what we're doing, because he also discusses readability and maintainability from the perspective of both a developer and a user. He's promoting his own StrictYAML format instead, so take everything there with a grain of salt.

IVlaD17 commented 2 years ago

Dhall

It appears that the links at How to integrate Dhall are broken. I personally consider this a bad sign for implementing it, so I will not investigate it any further and will instead focus on the other formats, unless there's a very specific desire to look into it.

rprospero commented 2 years ago

The following system test files can be run while only parsing the Species, Master, and PairPotential blocks:

tests/input/benzene.txt
tests/input/py5-ntf2.txt
tests/input/water.txt

disorderedmaterials / dissolve

Standard file format for input file #899
