ModelInference / synoptic

Inferring models of systems from observations of their behavior
Other
81 stars 25 forks source link

Regex capture groups silently require whitespace #397

Open ohmann opened 8 years ago

ohmann commented 8 years ago

Capture groups within the regex(es) used to parse log lines silently require whitespace to precede them. This is undocumented and is only detectable when using the --debugParse flag. In other words, it is likely to confuse users.

Example command: synoptic.sh -r "(?<ITIME>),(?<ip>),(?<TYPE>)" [... other args ...]

Example debug parse snippet:

INFO: input: (?<ITIME>),(?<ip>),(?<TYPE>) 
INFO: processed: (?:\s*(?<ITIME>\S+)\s*),(?:\s+(?<ip>\S+)\s*),(?:\s+(?<TYPE>\S+)\s*) 
INFO: standard: (?:\s*(\S+)\s*),(?:\s+(\S+)\s*),(?:\s+(\S+)\s*)

Note that the capture group (?<ip>) is transformed into (?:\s+(?<ip>\S+)\s*), i.e., the capture group won't match a log line unless it is preceded by 1+ whitespace characters. This is somewhat deceptive, since the regex the user passed does not reference whitespace at all. The user might think that a line like 123,4.4.4.4,event would be matched, but in fact it will not be.

This only affects default behavior. Manually specifying capture group formatting, e.g., (?<ip>\S+), works as expected and does not silently require whitespace before it.

bestchai commented 8 years ago

This is an important usability bug to fix. However, note that you'll have to fix existing args.txt files in the repository, as well. I think this is worthwhile, but the hardest part is to make sure that everything continues to work as before, at least for the examples we distribute. It might also be prudent to document this change on the main synoptic homepage, or in some form of release notes?

ohmann commented 8 years ago

One option I considered is to deprecate the current -r argument (and state so) and to add a new argument (e.g., -R) that becomes the preferred one and does not mess with whitespace. This might seem messy, however, so updating everything is perhaps the better option. Adding to the usage message a warning that the behavior has changed might be useful if any current Synoptic users currently rely on the old behavior, sometimes update Synoptic, but don't regularly check the online documentation.

On Thu, Sep 17, 2015 at 4:41 PM Ivan Beschastnikh notifications@github.com wrote:

This is an important usability bug to fix. However, note that you'll have to fix existing args.txt files in the repository, as well. I think this is worthwhile, but the hardest part is to make sure that everything continues to work as before, at least for the examples we distribute. It might also be prudent to document this change on the main synoptic homepage, or in some form of release notes?

— Reply to this email directly or view it on GitHub https://github.com/ModelInference/synoptic/issues/397#issuecomment-141222326 .