cucumber-attic / gherkin2

A fast Gherkin parser in Ragel (The parser behind Cucumber)
MIT License

Javascript parser can't process files with a Byte Order Mark character #300

Closed daverubert closed 10 years ago

daverubert commented 10 years ago

Hi, I'm working with cucumberJS in a Windows environment, and .feature files saved with Visual Studio (which start with a BOM as the initial character) fail with this error:

Error: Lexing error on line 1: '@web ... blah blah...'. See http://wiki.github.com/cucumber/gherkin/lexingerror for more information.
    at Lexer.scan 
...

The same file without the BOM as the initial character works like a charm.
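To make the failure concrete, here is a minimal sketch, assuming a Node.js environment and a UTF-8 BOM (bytes EF BB BF), which is what Visual Studio typically writes when saving as UTF-8. It shows that the BOM survives UTF-8 decoding as the character U+FEFF, so a lexer that expects `@` or `Feature:` at position 0 sees an unexpected character on line 1. The helper name `hasUtf8Bom` is hypothetical and is not part of gherkin or cucumber-js.

```javascript
// Hypothetical helper: check whether raw file bytes start with the UTF-8 BOM.
const UTF8_BOM = [0xef, 0xbb, 0xbf];

function hasUtf8Bom(bytes) {
  return (
    bytes.length >= 3 &&
    bytes[0] === UTF8_BOM[0] &&
    bytes[1] === UTF8_BOM[1] &&
    bytes[2] === UTF8_BOM[2]
  );
}

// Simulate a .feature file saved with a BOM in front of the first tag.
const withBom = Buffer.from([...UTF8_BOM, ...Buffer.from('@web\nFeature: Example\n')]);

// Node's Buffer#toString('utf8') keeps the BOM as U+FEFF instead of dropping it,
// so the first character the lexer receives is not '@'.
const decoded = withBom.toString('utf8');
console.log(hasUtf8Bom(withBom));            // → true
console.log(decoded.charCodeAt(0).toString(16)); // → 'feff'
```

The same bytes without the first three (the "works like a charm" case) decode to a string that starts directly with `@web`.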

I think the latest version of the lexer handles these characters correctly: https://github.com/cucumber/gherkin/blob/master/ragel/lexer_common.rl.erb#L14

But this does not seem to apply to the JavaScript version. Could the JavaScript version of the lexer be updated to handle BOM characters as well? I have tried to generate the English parser myself by following the instructions on the main page, but I was unable to. Any hints? Thanks.

References: https://github.com/cucumber/gherkin/issues/100 https://github.com/cucumber/cucumber-js/issues/144 https://github.com/cucumber/cucumber-js/pull/158

aslakhellesoy commented 10 years ago

This is a dupe of #100. For some reason the BOM rules in the Ragel grammar are ignored, so it would be great if you could apply the same fix in gherkin (as a pre-parsing step) that you suggested in cucumber/cucumber-js#158.
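A pre-parsing fix of this kind could look roughly like the sketch below. It assumes the fix amounts to removing a single leading U+FEFF from the decoded source before it reaches the lexer; the helper name `stripBom` and the commented call site are hypothetical and may not match what cucumber/cucumber-js#158 or gherkin actually do.

```javascript
// Hypothetical pre-parsing helper: drop one leading U+FEFF (the BOM as it
// appears after UTF-8 decoding) so lexing starts at the real first token.
function stripBom(source) {
  return source.charCodeAt(0) === 0xfeff ? source.slice(1) : source;
}

// Assumed call site (the exact gherkin JS API may differ):
//   const source = stripBom(fs.readFileSync(featurePath, 'utf8'));
//   lexer.scan(source);
```

Only position 0 is checked on purpose: U+FEFF anywhere else in a file is a zero-width no-break space and is part of the content, not an encoding marker, so a global replace would be wrong.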

aslakhellesoy commented 10 years ago

I'm closing this since it's confusing to discuss this in many places. Please continue the discussion in #100 or send a PR.