emancu / toml-rb

A parser for TOML using Citrus library.
MIT License

Problem with parsing literal string regex #75

Closed: funzoneq closed this issue 9 years ago

funzoneq commented 9 years ago
require 'toml'

stream = <<-EOS
# <28>Jan 14 14:49:55 6.14.2-4550-TEST craftd[1264]:  Minor alarm set, BGP Routing Protocol usage requires a license

[UdpInput]
address = ":514"
decoder = "syslog_transform_decoder"
test = 'liter\a/l'

[syslog_transform_decoder]
type = "PayloadRegexDecoder"
match_regex = '^<(?P<Pri>\d+)>(?P<Timestamp>\w{3}\s+\d+\s+\d+:\d+:\d+) (?P<Hostname>[^\s]+) (?P<Process>[\w\/]+)\[(?P<Pid>\d+)\]:\s+(?P<Message>[^\n]+)'
timestamp_layout = "Jan _2 15:04:05"
EOS

TOML.parse(stream)

Gives the following error:

TOML::ParseError: Failed to parse input on line 9 at offset 14
match_regex = '^<(?P<Pri>d+)>(?P<Timestamp>w{3} +d+ +d+:d+:d+) (?P<Hostname>[^ ]+) (?P<Process>[w/]+)[(?P<Pid>d+)]: +(?P<Message>[^

              ^
    from /Library/Ruby/Gems/2.0.0/gems/toml-rb-0.3.8/lib/toml/parser.rb:15:in `rescue in initialize'
    from /Library/Ruby/Gems/2.0.0/gems/toml-rb-0.3.8/lib/toml/parser.rb:11:in `initialize'
    from /Library/Ruby/Gems/2.0.0/gems/toml-rb-0.3.8/lib/toml.rb:30:in `new'
    from /Library/Ruby/Gems/2.0.0/gems/toml-rb-0.3.8/lib/toml.rb:30:in `parse'
    from (irb):78
    from /usr/bin/irb:12:in `<main>'
emancu commented 9 years ago

@funzoneq Thanks for reporting this.

I found the error: the \n in your regular expression is not escaped. I'm not sure yet whether this is a bug or not, so give me a little time to look into it.

FYI, there is a #toml-rb channel on freenode where you can find me.

emancu commented 9 years ago

If you read it from a file, it works. So it is probably either an issue or some odd behavior of Ruby's strings.

emancu commented 9 years ago

Could you tell me your expected hash? (Only the regular expression is needed.)

funzoneq commented 9 years ago
heka = { "syslog_transform_decoder" => { "match_regex" => '^<(?P<Pri>\d+)>(?P<Timestamp>\w{3}\s+\d+\s+\d+:\d+:\d+) (?P<Hostname>[^\s]+) (?P<Process>[\w\/]+)\[(?P<Pid>\d+)\]:\s+(?P<Message>[^\n]+)' }}

would output:

{
"syslog_transform_decoder"=> {
"match_regex"=>"^<(?P<Pri>\\d+)>(?P<Timestamp>\\w{3}\\s+\\d+\\s+\\d+:\\d+:\\d+) (?P<Hostname>[^\\s]+) (?P<Process>[\\w\\/]+)\\[(?P<Pid>\\d+)\\]:\\s+(?P<Message>[^\\n]+)"
}
}
emancu commented 9 years ago

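@funzoneq The issue is that you were using an unquoted heredoc (<<-EOS). Ruby treats its body like a double-quoted string, so it processes the backslash escapes (\d becomes d, \s becomes a space, \n becomes a real newline) before TOML ever sees the text.

Look at this example, which uses %q so the backslashes are kept: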


irb> a = %q(match_regex = '^<(?P<Pri>\d+)>(?P<Timestamp>\w{3}\s+\d+\s+\d+:\d+:\d+) (?P<Hostname>[^\s]+) (?P<Process>[\w\/]+)\[(?P<Pid>\d+)\]:\s+(?P<Message>[^\n]+)')
=> "match_regex = '^<(?P<Pri>\\d+)>(?P<Timestamp>\\w{3}\\s+\\d+\\s+\\d+:\\d+:\\d+) (?P<Hostname>[^\\s]+) (?P<Process>[\\w\\/]+)\\[(?P<Pid>\\d+)\\]:\\s+(?P<Message>[^\\n]+)'"

irb> TOML.parse a
=> {"match_regex"=>
  "^<(?P<Pri>\\d+)>(?P<Timestamp>\\w{3}\\s+\\d+\\s+\\d+:\\d+:\\d+) (?P<Hostname>[^\\s]+) (?P<Process>[\\w\\/]+)\\[(?P<Pid>\\d+)\\]:\\s+(?P<Message>[^\\n]+)"}
funzoneq commented 9 years ago

Ok, weird. Thanks for the clarification.
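
For later readers, here is a minimal sketch of the behavior discussed above, using only plain Ruby and a shortened version of the regex (the shortening and the variable names `interpolated` and `raw` are illustrative, not from the original report). It compares an unquoted heredoc with a single-quoted heredoc delimiter, which is one way to keep the backslashes intact:

```ruby
# Unquoted heredoc: the body behaves like a double-quoted string,
# so Ruby processes backslash escapes before any parser sees the text.
interpolated = <<-EOS
match_regex = '^<(?P<Pri>\d+)>:\s+(?P<Message>[^\n]+)'
EOS

# Single-quoted heredoc delimiter: the body is taken verbatim.
raw = <<-'EOS'
match_regex = '^<(?P<Pri>\d+)>:\s+(?P<Message>[^\n]+)'
EOS

# '\d' in single quotes is a literal backslash followed by "d".
puts interpolated.include?('\d')  # false: the backslash was dropped
puts raw.include?('\d')           # true: the backslash survived

# The \n escape became a real newline in the unquoted heredoc,
# splitting the TOML value across two lines.
puts interpolated.lines.count     # 2
puts raw.lines.count              # 1
```

Passing a string built like `raw` to TOML.parse should then behave like the %q example earlier in the thread, since both keep the backslashes as written; that has not been re-run here against toml-rb 0.3.8.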