Closed Brakjen closed 2 years ago
TL;DR: this is so by design. The "contract" between parselglossy and its users is that parselglossy won't touch what's between $<name> and $end.
This is where and how the parsing token is defined for those kinds of parameters: https://github.com/dev-cafe/parselglossy/blob/master/parselglossy/grammars/atoms.py#L89-L95
I cannot find the issue where we discussed this (it might be in a thread on some Zulip channel), but the $<name>/$end parameters are by design escape hatches to pass untyped information verbatim past the input parser and into the final dictionary. The idea was to keep the grammar simple and avoid type-checking for things that the developers using parselglossy know how to read better than we could. Preserving indentation might be one of the use cases for this: it is a weird requirement in the context of parsing molecular geometries, but it might be essential somewhere else.
Description
Upon parsing coordinate sections, such as those used when specifying atomic coordinates or solvation cavity spheres, the parser ignores all whitespace and newlines between $start and the actual content, but does not ignore whitespace and newlines between the content and $end. As a result, the following sections are not parsed identically:
These result in the following strings, respectively:
The expected output for all three is (at least to me) the last one. This could become a bit problematic when the user indents these sections (very common to do) and some type of sanity checking is performed on the data. Consider the following:
which, for the three examples, results in:
The middle example has resulted in an empty list element.
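To make the failure mode concrete, here is a minimal sketch of how a trailing newline in the raw string can produce an empty list element. The strings and the helper `naive_lines` are hypothetical illustrations, not actual parselglossy output or API.

```python
# Hypothetical raw strings, standing in for what a $<name>/$end section
# might hand back verbatim. They are NOT actual parselglossy output.
raw_tight = "H 0.0 0.0 0.0\nH 0.0 0.0 0.74"
raw_trailing = "H 0.0 0.0 0.0\nH 0.0 0.0 0.74\n"  # newline left before $end


def naive_lines(raw):
    """Split the raw block on newlines without any cleanup."""
    return raw.split("\n")


print(naive_lines(raw_tight))
print(naive_lines(raw_trailing))  # the last element is an empty string
```

Any sanity check that expects four fields per line would trip on that last element.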
strip()ing before split()ing fixes the issue, but parselglossy should probably strip all extra whitespace under the hood.
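On the consumer side, the workaround could look roughly like the sketch below. The function name `parse_coords` and the four-column "label x y z" layout are assumptions made for illustration; they are not part of parselglossy.

```python
def parse_coords(raw):
    """Parse a raw coordinate block into (label, x, y, z) tuples.

    Hypothetical helper: assumes each non-blank line holds
    "label x y z" separated by whitespace.
    """
    atoms = []
    # strip() drops leading/trailing whitespace and newlines of the block;
    # splitlines() then yields no trailing empty element.
    for line in raw.strip().splitlines():
        fields = line.split()  # also tolerates per-line indentation
        if not fields:
            continue  # skip blank lines inside the block
        label, x, y, z = fields
        atoms.append((label, float(x), float(y), float(z)))
    return atoms


# Made-up, indented input with a trailing newline and spaces before $end.
indented = "\n    H 0.0 0.0 0.0\n    H 0.0 0.0 0.74\n  "
print(parse_coords(indented))
```

Because the cleanup happens in the consumer, the raw string stays verbatim in the parsed dictionary, which is consistent with the "escape hatch" contract described above.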