MLanguage / mlang

Compiler for the M language, used to compute the income tax of French taxpayers
https://mlanguage.github.io/mlang/mlang/index.html
GNU General Public License v3.0
176 stars 9 forks source link

IRJ files parser as an Mlang sublibrary #222

Closed mdurero closed 10 months ago

mdurero commented 1 year ago

Parser for DGFiP test files, AKA "fip files", ".irj", "creneau files", "autotest files"… to load data in a straightforward OCaml record.

Compatible with both actually used file formats right now, primary and corrective computation, and cleaned from unidentified format variations.

Parser is based on original Mlang interpreter parser.

To be used by Mlang interpreter, Mlang OCaml backend, or any external OCaml application.

denismerigoux commented 10 months ago

@noeensarguet can you push your changes here so that I can review them?

noeensarguet commented 10 months ago

@noeensarguet can you push your changes here so that I can review them?

Done This was mostly reorganizing the directories and including the appropriate dune files to have an independent irj-parser opam package - however I'm not sure that this is very satisfactory as is, we're definitely reaching the limits of flattening the directory structure with the stanza (include_subdirs unqualified)

denismerigoux commented 10 months ago

So what you did here is pull all the dependencies of parser_utils.ml into irj_parser but I don't think we can merge this as it destructures the architecture of Mlang completely. The files :

Should remain where they are, hence irj_parser should not depend on them. Hence you should refactor the code of irj_parser/{main.ml, irj_ast.ml, irj_file.ml, irj_parser.mly} to not depend on anything in the list of files above, so that irj_parser can really be a standalone library that does not depend on Mlang. For instance, instead of throwing error messages with Cli.raise_error, the irj_parser library should throw an OCaml exception it defines. It should also embark its mini-AST of the test files.

Keryan-dev commented 10 months ago

This is a bit of a tangent.

I agree that depending on MAst is too much, and that errors should be handled in a more library-friendly way (i.e. exceptions or result types). Pos type and functions could be reused but a different minimal version should be enough for the current purpose.

This aside, I would strongly advocate for a much more librarified architecture for Mlang as we may use some of it to build other helpful tools.

denismerigoux commented 10 months ago

I agree with you Keryan, we should move to an Mlang architecture where basically each intermediate representation is its own separate library with clear dependence flow between them. That being said I think this PR is a first step toward this : if when this PR is merged irj_parser becomes its own separate library that doesn't depend on the rest of Mlang, we're progressing toward a more modular architecture no?

Keryan-dev commented 10 months ago

Yes of course. What I meant is that I would not be shocked to see some adjacent libraries depend on part of the compiler, this one included.

denismerigoux commented 10 months ago

Oh yes of course. For instance we'll surely have a utils library that all the intermediate representations, and maybe irj_parser if we want, could depend on.

noeensarguet commented 10 months ago

Should be better. Still quite unsatisfied with the (wrapped false) stanza in the irj_parser library, but this was necessary to open Irj_ast in module Test_interpreter. Also opened the definition of type Pos.t to be able to make it compatible with the type t in Irj_ast

Keryan-dev commented 10 months ago

I don't think you should need a (wrapped false) to be able to use those module elsewhere. Without it you'd have an additional indirection through the top module Irj_parser : open Irj_parser.Irj_ast didn't do the trick ?

noeensarguet commented 10 months ago

@Keryan-dev tried what you said and a couple other things to address the (wrapped false) - maybe I didn't quite wrap my head around this. If I take the stanza out, the only module I can open is Irj_file since it's the top module of the library Irj_parser (and open Irj_parser yields an Unbound module error). Neither open Irj_parser.Irj_ast or open Irj_file.Irj_ast are recognised as bound modules

Besides, if it is possible to add an indirection with open Irj_parser.Irjast, I don't quite see what the purpose of the stanza (wrapped ) would be in the first place - so if adding some indirection is an option, I would be down to have some clarifications about that :)