boostorg / program_options

Boost.org program_options module
http://boost.org/libs/program_options
110 stars 107 forks source link

Possible implementation of limited comments #101

Open xloem opened 4 years ago

xloem commented 4 years ago

Provides for # characters in configuration file values. See #100

codecov[bot] commented 4 years ago

Codecov Report

Merging #101 into develop will increase coverage by 0.00%. The diff coverage is 25.00%.

Impacted file tree graph

@@           Coverage Diff            @@
##           develop     #101   +/-   ##
========================================
  Coverage    49.89%   49.89%           
========================================
  Files           23       23           
  Lines         1385     1387    +2     
  Branches       707      709    +2     
========================================
+ Hits           691      692    +1     
  Misses         111      111           
- Partials       583      584    +1     
Impacted Files Coverage Δ
...clude/boost/program_options/detail/config_file.hpp 38.88% <0.00%> (ø)
include/boost/program_options/parsers.hpp 40.00% <ø> (ø)
src/parsers.cpp 45.61% <0.00%> (ø)
src/config_file.cpp 38.02% <33.33%> (+0.34%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 0414abe...e47c6bd. Read the comment docs.

xloem commented 4 years ago

I got an email mentioning that it might be possible to use std::quoted to resolve this, too. I see the comment is deleted so I infer that they discerned it might not be the right choice, but I'm mentioning it here because I was unaware of std::quoted.

If somebody else wants to try adding some more options to this for maintainers to choose from in case they have concerns, using escape characters is another option.

There might have also been a concern around the interface for enabling or disabling it.

FloopCZ commented 4 years ago

Hi, yes, sorry about that, that was my fault. As I have written previously, I believe the proposed technique calls for error-prone config files as I am not aware of any interpreter that behaves like this. If you also consider escape character instead (e.g. similarly to \ in Bash), std::quoted will probably not be suited for that case (as I wrongly believed before). However, you can use a regular expression to parse the line, e.g., ([^=]+)=((?:\\.|[^\\#])*)#{0,1}(.*) with whitespace trimming should work by a quick test:

#!/usr/bin/env python3
import re
cases = [
    (r"key=abcd",                    (r"key", r"abcd", r"")),
    (r" key = abcd # com",           (r" key ", r" abcd ", r" com")),
    (r" key =abcd#comment",          (r" key ", r"abcd", r"comment")),
    (r"key =abcd #comment",          (r"key ", r"abcd ", r"comment")),
    (r" key=abcd\#nocomment",        (r" key", r"abcd\#nocomment", r"")),
    (r" key = abcd\#nocomment#com",  (r" key ", r" abcd\#nocomment", r"com")),
    (r" key = abcd\#nocomment###",   (r" key ", r" abcd\#nocomment", r"##")),
    (r" key = abcd\\#comment",       (r" key ", r" abcd\\", r"comment")),
    (r" key = abcd\\\#nocomment",    (r" key ", r" abcd\\\#nocomment", r"")),
    (r" key = abcd\nocomment",       (r" key ", r" abcd\nocomment", r"")),
]
rex=r"([^=]+)=((?:\\.|[^\\#])*)#{0,1}(.*)"
for s, v in cases:
    assert re.fullmatch(rex, s).groups() == v
print("OK")

That could also be the default and only behavior so you would not need to pass an extra argument. Explanation of the regex:


([^=]+)=            key and `=` character
(?:                 value, i.e., either
    \\.                any escaped character (including the escape character itself)
    [^\\#]             or a normal non-escaped non-comment character
)
#{0,1}(.*)          comment
xloem commented 4 years ago

Thanks so much for your feedback.

I'm worried adding it as the default behavior could break existing programs when distributions and developers upgrade. There are a lot of users of programs linked to boost out there, a huge config file space.

Do you think there'd be a solution that preserves the behavior for people who happen to have the edge case of 'a=b\#comment in a pre-existing config file?

Developers wouldn't usually be able to predict if their users were affected by this change, to know whether to require an earlier version or not when upgrading, and it's a small change that would be hard to get everyone to spread preparation for.

FloopCZ commented 4 years ago

You have a good point. However, in my opinion, the edge case a=b\#comment (i.e., "escaping" a comment sign while still treating the comment as a comment) is against widely known practices and therefore it will be so very rare that this breaking change will probably not affect anyone.

If it bothers you anyway, the next version of program_arguments could also print a warning that the behaviour regarding this edge case has changed (or will change in the future) if encountered, as some other programs do (e.g., git, gcc, etc.), but I would not bother.

I guess @vprus will have to share his opinion with us. :slightly_smiling_face:

xloem commented 4 years ago

I like the warning solution, although I'm thinking of GUI applications where the config files could even be autogenerated.

I won't likely be working further on this PR for quite some time, so it's mostly just for when the work hours are available from someone.

It seems to me a separate options class, or a separate member function to set the parsing options, might be a more robust solution.