Closed yruslan closed 5 years ago
I can probably help with this. I just rewrote a parser I build some time ago with boost.spirit to use ANTLR. This is loosely based on the grammar at https://github.com/uwol/proleap-cobol-parser. Here's a repo with the code -- it successfully parses all the copybooks I have access to.
Great, Thanks! We will take a look at that.
Taking a stab at this issue on this branch. If you have a chance, could you take a quick look for some early feedback?
Wow, there are a lot of changes already. Will take a look. Thank you very much!
It currently passes all the tests for the cobol-parser except for the PIC 9(n)9 one (is this a valid pic?). Still looking into a few failed tests on the spark-cobol side.
Great, Thanks! Not sure if PIC 9(n)9
is valid, it was added for completeness, probably. I plan to work extensively on Cobrix starting from about the end of the next week. Will look at the rewritten parser. This is so cool that Cobrix can finally have a parser with the proper grammer definition!
The parser on my branch doesn't recognize that kind of PIC, but should be easy enough to add it. I've never seen it before, but my COBOL knowledge comes from the copybooks and data files I have access to.
We haven't encountered such a PIC either. So let's declare it invalid for now to make things a little bit easier.
All the tests pass on my branch except for Test6TypeVarietySpec. It seems that my logic for PICs with signs is different than what we have right now.
I was expecting EX-NUM-INT02
to be
Integral(9(8)+,8,Some(Right),true,None,None,Some(EBCDIC())))
but getting from the current schema
Integral(9(8)+,8,None,true,None,None,Some(EBCDIC()))
Is it the case that the +
in 9(8)+
or +9(8)
is just for display and not for storage?
The location of the '+' symbol should be for storage. I think this didn't came out as a bug because Cobrix removes all '+' symbols when parsing DISPLAY numbers. So if a '+' is located to the left or to the right side of the number does not matter.
But for the clarity of the parser I think the parser should return what you are expecting, e.g.
Integral(9(8)+,8,Some(Right),true,None,None,Some(EBCDIC())))
The current parser does it's job, but it is written without conforming to the usual practice of parsing.
We need to write a grammar for the subset of COBOL that we are parsing and generate the parser using a tool like
ANTLR
,JavaCC
, etc.