Closed by PKua007 1 year ago
As an example, we will look at specifying two bulk observables.
bulkObservables = pairAveragedCorrelation 5 100 S110 primary layerwiseRadial 6.0.0 o , \
densityHistogram n_bins 0 100 100 tracker fourierTracker 0 2 1 primaryAxis x
Currently, observables are separated using a comma (,). First comes the name, then the arguments. The arguments come in different formats.
For pairAveragedCorrelation, first is the maximal distance 5, then the number of bins 100, then the averaged function S110 primary (which is the $S_{110}$ correlation for the primary axes) and finally the binning specification layerwiseRadial 6.0.0 o (where 6.0.0 are the Miller indices describing the layers and o is the geometric origin used as the focal point).
densityHistogram uses a slightly different syntax. It is densityHistogram n_bins ... tracker ..., where n_bins and tracker are named fields, which are followed by their arguments and can be given in an arbitrary order. The n_bins arguments are the numbers of bins along x, y and z. The tracker is fourierTracker, whose arguments are 0 2 1 (wavenumbers) and primaryAxis x (the x coordinate of the primary axis as the function).
It could be improved by enforcing the densityHistogram syntax for all multivalued fields:
bulkObservables = pairAveragedCorrelation max_r 5 n_bin 100 function S110 primary binning layerwiseRadial hkl 6.0.0 focal_point o , \
densityHistogram n_bins 0 100 100 tracker fourierTracker wavenumbers 0 2 1 function primaryAxis x
but the problem of ambiguous nesting persists.
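To make the nesting problem concrete, here is a minimal Python sketch (Python is used only for illustration; the helper `flatten` and both tuple structures are hypothetical and not part of the project). It writes out two different nestings of the densityHistogram fields in the space-separated format and shows that they produce exactly the same tokens, so a reader of the flat line cannot tell whether wavenumbers and function belong to fourierTracker or to densityHistogram.

```python
def flatten(name, fields):
    """Serialize (name, [(key, value), ...]) into space-separated tokens.

    A value is either a plain string or a nested (name, fields) tuple.
    """
    tokens = [name]
    for key, value in fields:
        tokens.append(key)
        if isinstance(value, tuple):
            tokens.extend(flatten(*value))  # nested object: no delimiters are emitted
        else:
            tokens.extend(value.split())
    return tokens


# Intended reading: wavenumbers and function are fields of fourierTracker.
nested = ("densityHistogram", [
    ("n_bins", "0 100 100"),
    ("tracker", ("fourierTracker", [
        ("wavenumbers", "0 2 1"),
        ("function", "primaryAxis x"),
    ])),
])

# Alternative reading: they are fields of densityHistogram itself.
flat = ("densityHistogram", [
    ("n_bins", "0 100 100"),
    ("tracker", "fourierTracker"),
    ("wavenumbers", "0 2 1"),
    ("function", "primaryAxis x"),
])

line = ("densityHistogram n_bins 0 100 100 tracker fourierTracker "
        "wavenumbers 0 2 1 function primaryAxis x")

# Both structures serialize to the very same line.
same_tokens = flatten(*nested) == flatten(*flat) == line.split()
```

Only knowledge of which keys each object accepts can resolve this, which is why explicit delimiters are attractive.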
The syntax can be improved by introducing nesting inspired by Python functions. Take as an example:
def func(a, b, c, d):
    pass
# valid invocations:
func(0, 1, 2, 3)
func(a=0, b=1, c=2, d=3)
func(0, 1, d=3, c=2)
Using it as guidance, we can devise an improved syntax:
bulkObservables = [
pairAveragedCorrelation{max_r=5, bin_n=100, function=S110{axis=primary}, binning=layerwiseRadial{hkl=6.0.0, focal_point=o}},
densityHistogram{n_bins=0 100 100, tracker=fourierTracker{n=0 2 1, function=primaryAxis{coord=x}}}
]
Key names may be omitted when the arguments are given in the correct order, or only some of them may be kept:
bulkObservables = [
pairAveragedCorrelation{5, 100, S110{primary}, layerwiseRadial{6.0.0, o}},
densityHistogram{0 100 100, fourierTracker{0 2 1, primaryAxis{x}}}
]
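The rule for omitting key names is the same one Python applies when binding call arguments, so it can be sketched with the standard `inspect.Signature` class. The parameter list below is an assumption taken from the keys used in the example above, and `bind_arguments` is a hypothetical helper; a real parser for the input file would need its own implementation of this step.

```python
from inspect import Parameter, Signature

# Assumed parameter order for pairAveragedCorrelation (from the example keys).
PAIR_AVERAGED_CORRELATION = Signature([
    Parameter(name, Parameter.POSITIONAL_OR_KEYWORD)
    for name in ("max_r", "bin_n", "function", "binning")
])


def bind_arguments(signature, positional, named):
    """Map positional and named values onto parameter names.

    Raises TypeError for missing, surplus, unknown or duplicated arguments.
    """
    bound = signature.bind(*positional, **named)
    return dict(bound.arguments)


# All keys given explicitly.
all_named = bind_arguments(
    PAIR_AVERAGED_CORRELATION, [],
    {"max_r": 5, "bin_n": 100, "function": "S110", "binning": "layerwiseRadial"},
)

# First two keys skipped, the remaining ones named and in a different order.
mixed = bind_arguments(
    PAIR_AVERAGED_CORRELATION, [5, 100],
    {"binning": "layerwiseRadial", "function": "S110"},
)
```

Both calls yield the same mapping, and malformed input (for example a missing binning or max_r given twice) is rejected with a TypeError, which is the kind of automatic error reporting a predefined parameter list makes possible.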
pairAveragedCorrelation can be predefined for the parser, which will then automatically report some of the errors.
YAML can be used:
bulkObservables:
  - pairAveragedCorrelation:
      max_r: 5
      bin_n: 100
      function:
        S110:
          axis: primary
      binning:
        layerwiseRadial:
          hkl: 6.0.0
          focal_point: o
  - densityHistogram:
      n_bins: 0 100 100
      tracker:
        fourierTracker:
          n: 0 2 1
          axis:
            primaryAxis:
              coord: x
If changes are needed, I would go with some well-known format (YAML, JSON, etc.). Backward compatibility can be kept, since a separate parser can be chosen according to the file extension (or there could be a mechanism for converting the present format to the new one).
A conversion mechanism is a good idea; I will integrate it if breaking changes are introduced.
I am a bit worried that formats like YAML will be quite verbose, leading to worse readability anyway. Can you propose a more concise syntax for the example parameters above using YAML, or perhaps you know a format better suited to our needs?
And maybe this INI extension, which looks like Python, can be considered well known? It can be made even more Python-like by replacing {...} with (...) and rewriting all space-separated fields like 0 100 100 as Python-like arrays [0, 100, 100].
To make it clear, it will then look like this:
bulkObservables = [
pairAveragedCorrelation(max_r=5, bin_n=100, function=S110(axis=primary), binning=layerwiseRadial(hkl="6.0.0", focal_point="o")),
densityHistogram(n_bins=[0, 100, 100], tracker=fourierTracker(n=[0, 2, 1], function=primaryAxis(coord="x")))
]
Things like pairAveragedCorrelation(...) look just like class constructors, which actually describes them well: each one translates to creating a PairAveragedCorrelation, which implements BulkObservable. It can even be made 100% identical with future Python bindings.
//edit: Actually, the above code is 100% valid Python code. The custom parser could be replaced in the future by invoking the Python interpreter.
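Because the listing is syntactically valid Python, it can also be read without executing it. The sketch below uses the standard `ast` module to parse the expression and convert a small whitelisted subset (constants, bare names, lists and calls by name) into plain dictionaries. The output layout with "name", "args" and "kwargs" keys is an assumption made for illustration, not an existing interface of the project.

```python
import ast

SOURCE = """[
    pairAveragedCorrelation(max_r=5, bin_n=100, function=S110(axis=primary),
                            binning=layerwiseRadial(hkl="6.0.0", focal_point="o")),
    densityHistogram(n_bins=[0, 100, 100],
                     tracker=fourierTracker(n=[0, 2, 1], function=primaryAxis(coord="x")))
]"""


def convert(node):
    """Turn an AST node into plain data; reject anything outside the subset."""
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        return node.id  # bare identifier such as `primary`
    if isinstance(node, ast.List):
        return [convert(item) for item in node.elts]
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        if any(keyword.arg is None for keyword in node.keywords):
            raise ValueError("**kwargs unpacking is not supported")
        return {
            "name": node.func.id,
            "args": [convert(arg) for arg in node.args],
            "kwargs": {kw.arg: convert(kw.value) for kw in node.keywords},
        }
    raise ValueError(f"unsupported syntax: {type(node).__name__}")


def parse_observables(source):
    """Parse one Python expression and convert it; nothing is evaluated."""
    tree = ast.parse(source.strip(), mode="eval")
    return convert(tree.body)


observables = parse_observables(SOURCE)
```

Any construct outside the whitelist, such as arithmetic or attribute access, raises ValueError, which is the kind of extra AST validation a full Python parser makes necessary.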
In my opinion, the Python-like convention looks best, especially if a parser is available. Otherwise I don't think it is worth the effort.
There are Python parsing libraries for C++. There are also general libraries for parsing with a BNF grammar. I am not sure which option is more convenient. A Python parser accepts any Python code, so additional AST validation and a lot of conversion are needed. The other option requires devising a BNF grammar and, once again, converting the AST, but less extensively.
There is also a manual option: writing a recursive descent parser for a simple grammar isn't that hard, and it gives full control over error reporting, etc.
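As a rough idea of the size of the manual option, here is a minimal recursive descent parser for the Python-like syntax, written in Python for brevity. The grammar (numbers, double-quoted strings, bare identifiers, lists and calls with positional and named arguments), the class name `Parser` and the output layout are all assumptions for this sketch; the real parser would have to follow whatever grammar is finally chosen.

```python
import re

TOKEN_RE = re.compile(r"""
    \s*(?:
        (?P<number>-?\d+(?:\.\d+)?)
      | (?P<string>"[^"]*")
      | (?P<ident>[A-Za-z_]\w*)
      | (?P<punct>[()\[\],=])
    )""", re.VERBOSE)


def tokenize(text):
    """Split text into (kind, text) pairs; raise SyntaxError on stray input."""
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected input near offset {pos}")
        tokens.append((match.lastgroup, match.group(match.lastgroup)))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, text):
        self.tokens, self.pos = tokenize(text), 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else ("end", "")

    def take(self, kind=None, value=None):
        tok_kind, tok_value = self.peek()
        if (kind and tok_kind != kind) or (value and tok_value != value):
            raise SyntaxError(f"expected {value or kind}, got {tok_value!r} at token {self.pos}")
        self.pos += 1
        return tok_value

    def parse(self):
        result = self.value()
        self.take("end")  # reject trailing tokens
        return result

    def value(self):
        kind, text = self.peek()
        if kind == "number":
            self.take()
            return float(text) if "." in text else int(text)
        if kind == "string":
            self.take()
            return text[1:-1]  # strip the quotes
        if (kind, text) == ("punct", "["):
            self.take()
            return self.sequence("]", named=False)[0]
        if kind == "ident":
            self.take()
            if self.peek() != ("punct", "("):
                return text  # bare identifier
            self.take()
            args, kwargs = self.sequence(")", named=True)
            return {"name": text, "args": args, "kwargs": kwargs}
        raise SyntaxError(f"unexpected {text!r} at token {self.pos}")

    def sequence(self, closer, named):
        """Parse comma-separated items up to `closer`; `key=value` only if named."""
        args, kwargs = [], {}
        while self.peek() != ("punct", closer):
            is_named = (named and self.peek()[0] == "ident"
                        and self.tokens[self.pos + 1:self.pos + 2] == [("punct", "=")])
            if is_named:
                key = self.take("ident")
                self.take("punct", "=")
                if key in kwargs:
                    raise SyntaxError(f"duplicate key {key!r}")
                kwargs[key] = self.value()
            elif kwargs:
                raise SyntaxError("positional argument after a named one")
            else:
                args.append(self.value())
            if self.peek() != ("punct", closer):
                self.take("punct", ",")
        self.take("punct", closer)
        return args, kwargs


histogram = Parser(
    'densityHistogram(n_bins=[0, 100, 100], '
    'tracker=fourierTracker(n=[0, 2, 1], function=primaryAxis(coord="x")))'
).parse()
```

Every error is raised from a known position in the token stream, so messages can be made as specific as needed, which is the main advantage over a generic parsing library.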
A consistent and convenient interface for the input file should be introduced. This also includes a generic parser in the source code.