[feature request] creating a repository for metasyntax notations

Foadsf commented 3 years ago

The more I dig into the Elmer code base and the more I use the tool, the more surprised I am that the development team is dealing with massive issues that take up a lot of their time and effort. For example, the Elmer team is building and maintaining a huge number of parsers:

ElmerGrid:
- .grd : ElmerGrid file format
- .mesh.* : Elmer input format
- .ep : Elmer output format
- .ansys : Ansys input format
- .inp : Abaqus input format by Ideas
- .fil : Abaqus output format
- .FDNEUT : Gambit (Fidap) neutral file
- .unv : Universal mesh file format
- .mphtxt : Comsol Multiphysics mesh format
- .dat : Fieldview format
- .node,.ele: Triangle 2D mesh format
- .mesh : Medit mesh format
- .msh : GID mesh format
- .msh : Gmsh mesh format
- .ep.i : Partitioned ElmerPost format
- .2dm : 2D triangular FVCOM format
ElmerSolver:
- .mesh.* : Elmer input format
- .sif : ElmerSolver input files
General:
- MATC
- Lua

This is really impressive, but maintaining all of these by hand is a very complicated effort, when the developers could instead be focusing on the core technology: numerically solving complex systems of PDEs. Elmer's parsers are also very fragile. A very small typo can cause a segmentation fault without any further error message, and a small syntax mistake might take hours to debug.

My proposal to solve the issue is to use parser generators. For example, ANTLR4 seems to be an industry standard at the moment. What we need to do:

- create a separate repository with .g4 lexer and parser grammar files for the above file formats / languages
- use GitHub CI/CD to run ANTLR4 and generate C++ AST generators
- develop an API to connect the above C++ parsers to the existing solver / algorithm
- surgically move the existing parsers into a separate repository and include them as Git submodules (.gitmodules) for the time being

A nice and easy tutorial for ANTLR can be seen here.

raback commented 3 years ago

I partly agree. The fact that so many parsers exist is basically a result of the long history. Historically there haven't been any libraries to support this, so we wrote a parser every time a new file format was needed. Often the parser has to be written by reverse engineering, because many formats are not documented anywhere. This, and the fact that formats may even change, adds to the challenge. Currently there is no major activity to write new parsers, hence no resources are being used that could be moved to this new strategy. It would be great to see an all-to-all-formats auxiliary program arise from the open source community. Right now there seem to be parsers for -to-Elmer, -to-foam etc., but no generic tool.
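To illustrate why a generic tool would scale better than pairwise converters: with one shared in-memory mesh representation, N formats need only N readers and N writers instead of a direct converter for every pair. Below is a minimal Python sketch of that idea. `Mesh`, `READERS`, `WRITERS`, `convert` and the two toy formats are hypothetical names invented for this illustration; none of them exist in Elmer or ElmerGrid.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Mesh:
    """Hypothetical neutral representation shared by all formats."""
    nodes: list[tuple[float, ...]]
    elements: list[tuple[int, ...]]  # node indices per element


# One reader and one writer per format, all going through Mesh.
READERS: dict[str, Callable[[str], Mesh]] = {}
WRITERS: dict[str, Callable[[Mesh], str]] = {}


def read_toy(text: str) -> Mesh:
    """Toy format (invented here): one 'x y z' node per line."""
    nodes = [tuple(float(v) for v in line.split())
             for line in text.splitlines() if line.strip()]
    return Mesh(nodes=nodes, elements=[])


def write_toy_csv(mesh: Mesh) -> str:
    """Toy format (invented here): one 'x,y,z' node per line."""
    return "\n".join(",".join(str(v) for v in node) for node in mesh.nodes)


READERS["toy"] = read_toy
WRITERS["toy-csv"] = write_toy_csv


def convert(text: str, src: str, dst: str) -> str:
    """Convert between any registered pair via the neutral Mesh."""
    if src not in READERS:
        raise ValueError(f"no reader registered for format {src!r}")
    if dst not in WRITERS:
        raise ValueError(f"no writer registered for format {dst!r}")
    return WRITERS[dst](READERS[src](text))
```

Supporting a new format in such a design means registering one reader and one writer; every existing format then becomes reachable from it without further code.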

Foadsf commented 3 years ago

Thanks Peter @raback for the response. You touched upon some important issues. Indeed, many of the file formats in use have changed, and keeping the parsers up to date is not easy. Using off-the-shelf parser generators, we could also read other formats such as ANSYS's APDL and ABAQUS/CalculiX input files.

We don't have to do this overnight or all at once. We may start with ElmerSolver's .sif files, as they, in my experience, cause the most headaches at the moment, and then move on to the others. I will help if you support the idea, and I believe it will lead to a much more stable and maintainable parser.
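As a rough illustration of the behaviour I have in mind for .sif files, here is a small hand-written Python sketch, not an ANTLR-generated parser. It assumes a heavily simplified subset of the format: sections opened by a name line and closed by `End`, containing only `key = value` entries, with `!` starting a comment. Real .sif files have more constructs than this, and `parse_sif`, `Section` and `SifSyntaxError` are hypothetical names. The point is only that a malformed line produces a message with a line number instead of a crash.

```python
import re
from dataclasses import dataclass, field


class SifSyntaxError(Exception):
    """Raised with a line number when the input is malformed."""


@dataclass
class Section:
    name: str   # e.g. "Simulation"
    line: int   # line on which the section starts
    entries: dict = field(default_factory=dict)


# Assumed subset: entries have the form "key = value".
ENTRY = re.compile(r"^(?P<key>[^=]+?)\s*=\s*(?P<value>.+)$")


def parse_sif(text: str) -> list[Section]:
    sections: list[Section] = []
    current = None
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("!", 1)[0].strip()  # assumption: "!" starts a comment
        if not line:
            continue
        if current is None:
            # Outside a section, only a section name is allowed.
            if "=" in line or line.lower() == "end":
                raise SifSyntaxError(
                    f"line {lineno}: expected a section name, got {line!r}")
            current = Section(name=line, line=lineno)
        elif line.lower() == "end":
            sections.append(current)
            current = None
        else:
            match = ENTRY.match(line)
            if match is None:
                raise SifSyntaxError(
                    f"line {lineno}: expected 'key = value' or 'End', got {line!r}")
            current.entries[match["key"].strip()] = match["value"].strip()
    if current is not None:
        raise SifSyntaxError(
            f"line {current.line}: section {current.name!r} is never closed with 'End'")
    return sections
```

For example, calling `parse_sif` on a `Simulation` section in which the `=` is missing from an entry raises `SifSyntaxError` naming the offending line. A generated parser would give us the same kind of diagnostics from the grammar, without maintaining this logic by hand.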

P.S. Here in this repository you may see a collection of lexer/parser grammar files that we can start with.

ElmerCSC / elmerfem

[feature request] creating a repository for metasyntax notations #266