colis-anr / morbig

A static parser for POSIX Shell
Other
190 stars 8 forks source link

Split CST and derivings #126

Closed Niols closed 3 years ago

Niols commented 3 years ago

In the PR, we split the module containing the CST and the modules containing serialisers for the CST (to and from JSON) or the visitors for the CST.

Ideally, this would be done using something OCamly, like ppx_import. Sadly, it seems that it isn't maintained so much anymore and, anyway, it uses technologies that clash with the Dune way of building PPXes (used by visitors and ppx_deriving in the project). I am not aware of any other good solution for that. I therefore tried an approach based on the copy of the CST.ml file (which now only contains types and has been renamed CST.mli as well as some text manipulations. I am not necessarily pleased by this solution but it does seem to work. I would add that problems are mitigated by the fact that we compile the result of the text manipulations, which prevents us from breaking everything.

I believe it makes for a cleaner architecture of the Morbig library, because the CST module only contains types (as the name suggests) and the serialisers and visitors are hanging out in appropriately named modules. Additionally, it makes for a cleaner documentation and allows us to not build documentation for the visitors (which is unreadable anyways), therefore fixing #124 (the size of the documentation goes from ~500M to ~5M).

I keep this PR as a draft for now as I am pretty sure that the script would not work on Windows. I will test that shortly, once I fix the CI.

In the meantime, @yurug, do you have anything to say? Does the solution look reasonable to you or is it awful and should be avoided? Do you know of any other way to do the same thing?

Niols commented 3 years ago

I have now introduced #128 which is based on this PR but rewrites the splitter in OCaml instead of Shell. This does not make the compilation faster or clearer. It does allow for a better documentation of the process (in particular, the regexp can be split and documented in tiny bits). More importantly, it fixes the compilation on Windows, which we will see once we integrate the new CI (see #127).

We have to choose between this PR, #128 or none of the two if we don't validate the concept. @yurug, WDYT?

Niols commented 3 years ago

Closed in favour of #128.