aanastasiou / adl_ebnf

openEHR ADL 1.5 EBNF description
GNU General Public License v3.0

What is the state of this project? #1

Open jure opened 10 years ago

jure commented 10 years ago

What is the current state of this project and how can I help?

The end goal for me would be to have a Ruby ADL 1.5 parser based on the generated EBNF.

aanastasiou commented 10 years ago

Hi Jure

The project is pretty much alive, especially after the recent news that ANTLR now generates Python targets properly, which is what I am aiming for.

When I started working on this, my aspiration was to have a complete expression of ADL in EBNF. This part of the work is in: https://github.com/aanastasiou/adl_ebnf/tree/master/src/adl_ebnf

However, I soon realised that converting the EBNF to a (similar) language that a parser generator tool could use to produce the parser would take yet another layer of transformation. For this reason, I turned to ANTLR: its metalanguage is descriptive enough, and similar enough to EBNF, to express ADL completely, and it can produce parsers for many different targets.

But, working with ANTLR on ADL from scratch revealed some data types and other expressions that were common across the different syntax expressions required for ADL, cADL and ODIN. Therefore, a little bit of re-structuring had to also take place while translating the rules from the existing resources.

The ADL and cADL "outer" syntax files are relatively straightforward to work with and their relatively "mature" versions can be found at: https://github.com/aanastasiou/adl_ebnf/tree/master/src/antlrDefs/adl

At the moment, there are two "big" tasks concerning this project:

1) Translate ODIN to ANTLR's metalanguage. I have started this (odin.g4), but it's not going to be straightforward because of the complexity of ODIN and because some of these rules will need to be re-expressed to be compatible with the way ANTLR expects its definitions.

2) Finalise the definitions of ADL and cADL. There are some more optimisations that can be done there and there are some rules that can be re-used rather than be re-defined (as is the current practice).

I am not sure about the infrastructure that Ruby has to express parsers. Python for example has the excellent Pyparsing module that can be used to put together parsers using an object oriented interface. In fact, some of my early work was based on an ADL 1.4 parser built using pyparsing (https://github.com/aanastasiou/adl_ebnf/blob/master/src/adl_ebnf/adl_1_4.py). That would lead to a huge and difficult to maintain piece of code for ADL 1.5 though. Is Ruby similar or is there an automated generator available? Have you worked with ANTLR before?

jure commented 10 years ago

Thank you for the insight, Athanasios!

The EBNF -> ANTLR reasoning is interesting; it's my understanding that EBNF is a subset of ANTLR. Is ANTLR just easier to work with than pure EBNF, and/or does it give you some features that EBNF is missing? Happy to be educated on these points, as I don't have a good idea of how these two interact/fit together.

Translate ODIN to ANTLR's metalanguage

Isn't ODIN just another serialization format to express archetypes? Is it necessary, or is parsing ADL files sufficient, but parsing ODIN would be nice to have for future compatibility?

I haven't worked with ANTLR yet, but it looks like the Ruby support is quite solid: http://antlr.ohboyohboyohboy.org/

aanastasiou commented 10 years ago

No worries, glad you are finding this useful, please see below:

"...EBNF is a subset of ANTLR."

You could say this, effectively. But strictly speaking, they are two different concepts.

What I really liked about ANTLR (and a feature that pure EBNF would be lacking) is the ability to include files and this is great when you are working on a project like ADL because you can break a huge project down into more manageable chunks. Another useful thing that is not inherent in EBNF is the ability to do conditional parsing through "modes". So, you can have a set of rules (possibly overriding already existing ones) that are activated upon encountering a '{' and de-activated upon encountering a '}' or other set of delimiting symbols.
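To make the idea of "modes" concrete, here is a minimal sketch in plain Python. It is not ANTLR and not how ANTLR implements modes; the rule names (`WORD`, `NAME`, `NUMBER`), the mode names (`DEFAULT`, `INSIDE`) and the `tokenize` function are all made up for illustration and have nothing to do with the actual ADL or ODIN rules. It only shows the behaviour described above: one set of rules is in force until a '{' is seen, a different set applies until the matching '}', and then the first set takes over again.

```python
import re

# Hypothetical token rules for illustration only; these are not ADL or ODIN rules.
RULES = {
    "DEFAULT": [
        ("OPEN", r"\{"),
        ("WORD", r"[A-Za-z_]+"),
        ("WS", r"\s+"),
    ],
    "INSIDE": [
        ("CLOSE", r"\}"),
        ("NUMBER", r"\d+"),
        ("NAME", r"[A-Za-z_]+"),  # same text as WORD, but a different token in this mode
        ("WS", r"\s+"),
    ],
}


def tokenize(text):
    """Yield (mode, token_type, lexeme) tuples, switching rule sets on braces."""
    modes = ["DEFAULT"]  # stack of active modes; the top one is in force
    pos = 0
    while pos < len(text):
        mode = modes[-1]
        for token_type, pattern in RULES[mode]:
            match = re.compile(pattern).match(text, pos)
            if match:
                break
        else:
            raise SyntaxError(f"unexpected character {text[pos]!r} in mode {mode}")
        pos = match.end()
        if token_type == "WS":
            continue  # whitespace is skipped in both modes
        yield mode, token_type, match.group()
        if token_type == "OPEN":
            modes.append("INSIDE")  # activate the inner rule set
        elif token_type == "CLOSE":
            modes.pop()  # return to the enclosing rule set


tokens = list(tokenize("width {count 42} height"))
for token in tokens:
    print(token)
```

Note that the same kind of text ("width" outside the braces, "count" inside) comes out as a different token type depending on the active mode, and that digits are only accepted inside the braces.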

EBNF is the standard notation to represent a context-free language (see for example http://gun.teipir.gr/VRML-amgem/spec/part1/grammar.html or even http://www.w3.org/TR/1999/REC-rdf-syntax-19990222/ ). As a notation, it is as useful as a mathematical formula is. That's just that, the formula. The formula will not do integration or differentiation for you...you have to write code that does this.

ANTLR augments that with the ability to generate an LL recursive descent parser in some language and allows you to also insert inline code that modifies the parsing. ANTLR provides both the means to specify a language (i.e. the notation part, just like EBNF) and also produces a usable artifact from that specification.
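As a rough illustration of the gap between the notation and a working parser, below is a small hand-written recursive descent parser in Python for a toy arithmetic grammar. The grammar, the `Parser` class and the `evaluate` function are my own invention for this example and are unrelated to ADL; the point is only that every EBNF rule has to be turned into code by someone (or by a tool such as ANTLR), here as one method per rule.

```python
import re

# Toy grammar, written in EBNF (hypothetical, for illustration only):
#
#   expr   = term , { ( "+" | "-" ) , term } ;
#   term   = factor , { "*" , factor } ;
#   factor = number | "(" , expr , ")" ;

TOKEN = re.compile(r"\s*(\d+|[-+*()])")


def tokenize(text):
    """Split the input into number and operator tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected character {text[pos]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    """One method per grammar rule; each method consumes the tokens it matches."""

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        token = self.peek()
        if token is None or (expected is not None and token != expected):
            raise SyntaxError(f"expected {expected or 'a token'}, found {token!r}")
        self.pos += 1
        return token

    def expr(self):  # expr = term , { ( "+" | "-" ) , term } ;
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            right = self.term()
            value = value + right if op == "+" else value - right
        return value

    def term(self):  # term = factor , { "*" , factor } ;
        value = self.factor()
        while self.peek() == "*":
            self.take("*")
            value *= self.factor()
        return value

    def factor(self):  # factor = number | "(" , expr , ")" ;
        if self.peek() == "(":
            self.take("(")
            value = self.expr()
            self.take(")")
            return value
        token = self.take()
        if not token.isdigit():
            raise SyntaxError(f"expected a number, found {token!r}")
        return int(token)


def evaluate(text):
    """Parse the whole input as an expr and return its integer value."""
    parser = Parser(text)
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected trailing token {parser.peek()!r}")
    return value


print(evaluate("2 + 3 * (4 - 1)"))
```

Three grammar rules already need this much code by hand, which is why generating the parser from the specification is attractive for something the size of ADL.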

Isn't ODIN just another serialization format to express archetypes? Is it necessary, or is parsing ADL files sufficient, but parsing ODIN would be nice to have for future compatibility?

It's all ADL. If you want to simply read the structure of an archetype then (at the moment) there is no reason to get involved with parsing the ADL, you can simply use the XML format.

There are parts of ADL that employ ODIN. See, for example, http://htmlpreview.github.io/?https://github.com/openEHR/adl-tools/blob/master/components/adl_compiler/src/syntax/adl/parser/adl_15_parser.html, wherever V_ODIN_TEXT appears.

In any case, ADL can do things that XML can't, like re-using existing 'class' definitions for example, so, strictly speaking, ADL is a whole different language for which a parser is a valuable tool to have.

haven't worked with ANTLR yet, but it looks like the Ruby support is quite solid:

Ah! That's ANTLR3. The definitions in this project are for ANTLR4. There are a few subtle differences between the two specifications. In ANTLR3, parser code was laid out inline with the language specification but in ANTLR4 it is kept separate (worth checking out: http://www.antlr.org/).

It would also be worth checking two very good references on ANTLR:

http://pragprog.com/book/tpantlr2/the-definitive-antlr-4-reference (a must-read)

http://pragprog.com/book/tpdsl/language-implementation-patterns (not essential, but it makes some very good points)

I hope this helps

jure commented 10 years ago

Thank you again! I think I'll need a day or two to process this information, as this touches a lot of things I'm not familiar with.