Rewrite DefaultConfigFileParser.parse()

macfreek commented 3 years ago

This allows for empty values: key= or key="". This is different from specifying key without a value (with still sets key to "true"). Closes #103 Closes #136

This allows for negative values, or values that start with a dash: key=-10, key=-A38E-8 Closes #137 (though that most likely was already resolved by #160)

This allows to specify special characters, by putting them in quotes (either ' or " quotes): key=":", key="=", key="#", key="'", key='"', etc.

Also adds test suites to explicitly test for the different values and syntaxes. Beware that by adding these test suites some explicity choices are made in particular to the allowed syntax of keys. Previously keys may contain more or less any character (at least according to the parser). ~~Now, allowed characters are: 0-9A-Za-z_-. Thus, e.g. key$name or key^name is no longer allowed.~~ Edit: This remains so in this version.

macfreek commented 3 years ago

I'm happy with this improvement, but beware of some changes in behaviour. For example:

Line in config file	current behaviour	new behaviour
key =	key: "="	key: ""
key = '#'	key: "'#'"	key: "#"

I think the new behaviour is what user likely intended, but you can argues it is backward incompatible for users who used key = to really intended that key equals the string "=".

Also, key names are slightly more restricted, just like they are on the command line. It is no longer possible to use other characters than a-zA-Z0-9_- in a key.

Thus, key_with_underscore is still valid, and so is _key and key_, as well as key-name. But key%name, key^name, key$name or key@name are no longer valid. I don't think it was possible to specify these kind of keys with add_arg, so I suspect this is not an issue. However, let me know if you like me to loosen this restriction.

My intention is that it should behave like on the command line, where I don't think it is possible to specify those, and I suspect that brining this in line with each other may safe some headaches in the future.

bw2 commented 3 years ago

Thanks very much for fixing multiple issues and adding these tests!

Re. restricting which characters are allowed in keys, you're right there's the backward compatibility issue. In addition, I gravitate towards "allow unless there's a good reason to restrict". I feel that it's impossible to know why/when a user might have a reason to use unusual key names, and it would be unfortunate to outlaw this unless it's strictly necessary. For example, I can imagine @ or . being useful in some keys, etc. If there are specific chars that break parsing, then I'd prefer to explicitly add them to [^:=;#\s]+, rather than outlawing a whole bunch of chars via [\w\-]+.

macfreek commented 3 years ago

I'll make a commit that loosens the keyword restriction.

I'm currently wondering why I made this change in 2018 (in the comment of #103), and to what extend this can and should be tested.

bw2 commented 3 years ago

ok sounds good

macfreek commented 3 years ago

I added a commit to this PR to revert the keyword syntax. Validity checks for key@word, key$word, and key.word are added to the unit test.

My personal preference was not to allow this syntax, and "reserve" special characters for future new behaviour. I guess that's why my original regexp behaved this way. However, as you said in #185, this can be resolved by adding an option to the ArgParser constructor. More important is that this PR is more atomic by retaining the current behaviour. So I gladly made the change. Thanks for taking the time to converse about the desired behaviour.

bw2 commented 3 years ago

This is great. Thanks again for putting this together.

bw2 commented 3 years ago

I'll probably push a new release this weekend.

bw2 / ConfigArgParse

Rewrite DefaultConfigFileParser.parse() #215