BNFC / bnfc

BNF Converter
http://bnfc.digitalgrammars.com/
582 stars 161 forks source link

Add F# support via fslex and fsyacc #404

Open gdziadkiewicz opened 2 years ago

gdziadkiewicz commented 2 years ago

Five years ago I used bnfc (it's amazing, big thanks to all ppl involved in creating it) for the first time to generate frontend for my first compiler. Time issues didn't allow me to add F# support first and in the end, I manually adjusted the OCaml output to work with F#. For the five recent years, I made numerous attempts to finalize the F# support PR, but there was always something that made me fail and start from the beginning. Well, not anymore. Here I am sharing something that should have been shared a lot quicker.

As fslex is a port of ocamllex and fsyacc of ocamlyacc the code is based on OCaml code.

TODO list:

EDIT: Due to problems with building on GHC 9.2.1 I decided to remove interpolate. It (or string-interpolate) can always be added later, in separate PR, and used to refactor.

During the work, I plan to make notes of the differences between fslex/fsyacc and ocamllex/ocamlyacc for future reference.

gdziadkiewicz commented 2 years ago

@andreasabel what is your opinion on adding F# support via fslex and fsyacc?

From my point of view, it would bring back dotnet to the table for bnfc after removing C# and compared to gplex and gppg that were used in C# support, fslex and fsyacc are used to build F# compiler and as results are maintained by people maintaining the compiler, so the risk of them getting outdated, unsupported and hard to use with new dotnet versions is much smaller.

andreasabel commented 2 years ago

Hello @gdziadkiewicz, thank you for your initiative!

I was planning to evaluate your PR yesterday, but brew install nuget got me into a rabbit hole with brew occupying my CPU for 6 hours...

Let's start discussion anyway, here are some points:

  1. It would be nice to have a F# backend, together with a maintainer. For accepting a new backend I would require a maintenance pledge for at least 3 years. Some backends got removed short of a maintainer: C#, CNF, ... (In case of C# it was also the bad availability of the C# lex/yacc.)
  2. There used to be a F# backend, before my times, not sure how serious it was, but it got removed in 2.7: #16.
  3. Backends should share code to reduce the maintenance burden. BNFC has accumulated a lot of technical dept, due to careless forking of backend by cut-and-paste. Each change was very tedious to push through all the backends. I worked hard to undo these sins of the past, in particular by unifying C, C++/STL and C++/NoSTL as much as possible. This is still WIP, but for the parser, #351 accomplished it. (See also my requests on PR #389.) Cutting the story short, the direction I am heading is towards parametrized backends to share code where possible. In the case of F#, I think a lot of code could be shared with the OCaml backends. This could take different paths: Either to get F# by parametrizing the existing OCaml backend, or to get a new OCaml backend by parametrizing your F# backend, or to synthesize F# and OCaml to something new, taking the best of both worlds.
  4. With a student, I am working on a reimplementation, BNFC3. This project is still private. Backends will have to be ported to the new BNFC frontend (which is arguably better structured than the old one).
  5. There is a common testsuite for the backends in testing/ (start reading e.g. here). The F# backend would have to be added here.
  6. I might need some help with testing the PR, e.g. installing the prerequisites. What would be a good tutorial on FsLex/Yacc?

P.S.: Good you got rid of interpolate, its dependencies are too heavy, see https://github.com/sol/interpolate/issues/17.

gdziadkiewicz commented 2 years ago

@andreasabel

  1. I'm willing to join as a maintainer. How does a maintainer pledge look like?
  2. I kind of expect that it did make use of the fact that the early F# versions were able to compile a subset of OCaml. I will take a look and see if there is something worth salvaging there.
  3. That makes sense. I will check out the PRs you linked and investigate the possible approaches: a) parametrizing new F# backend b) parametrizing existing OCaml backed c) fusion Expect me to return here with input on them.
  4. Ok, if this (in one form or another) gets in you can expect me to work on/help with the porting when BNFC3 gets public.
  5. I will focus on that now (and on 6.). Before starting with the rewrite/refactor from 3. I would like to have "good" tests for the OCaml (or feel of them if they are already there) and the new F# thing.
  6. Sorry to hear that the experience of getting dotnet up and running on your machine turned out to be so painful. Expect me (tomorrow or a day after tomorrow) to provide you here with battle-tested instructions on how to get the necessary tool, compile the code and run it. The good part about it is that compared to the mono times it is a lot easier.

p.s. Having a reliable, maintained interpolation-providing package that's not deps heavy would be really nice. Did you like that neat-interpolation mentioned in the linked issue?

gdziadkiewicz commented 2 years ago

Differences between fslex/fsyacc and ocamllex/ocamlyacc:

  1. F# and OCaml have different std libs which means that boilerplate and usage of boilerplate (e.g. Hashmaps and using them) must be different (or unified by an adapter in one of them?)
  2. fslex does not support # (or did not in the past, confirmation with current version needed). Docs state that it does but real life and code strongly suggested that it does not. I will recheck that and open PR with an update to fslex docs if this is still the case.
  3. Current version of fsyacc requires some special dance (specifying module name as a parameter in fsproj file) around module declaration. This should be fixed in fsyacc and workaround in bnfc while it is still ongoing.
  4. Casing conventions for F# and OCaml are different, using OCaml conventions causes warnings and/or errors for F# (TODO elaborate)
andreasabel commented 2 years ago

@gdziadkiewicz worte:

1. I'm willing to join as a maintainer. How does a maintainer pledge look like?

Great! This pledge is nothing formal, it is the serious promise that you will maintain the F# backend for at least 3 years.

2. I kind of expect that it did make use of the fact that the early F# versions were able to compile a subset of OCaml. I will take a look and see if there is something worth salvaging there.

I don't really know if something can be salvaged, but I thought I point this out to you in case you want to do the archeology to see how things were done then.

3. That makes sense. I will check out the PRs you linked and investigate the possible approaches:
   a) parametrizing new F# backend
   b) parametrizing existing OCaml backed
   c) fusion
   Expect me to return here with input on them.

Great!

4. Ok, if this (in one form or another) gets in you can expect me to work on/help with the porting when BNFC3 gets public.

Excellent. I am hoping to make some progress on this early next year.

5. I will focus on that now (and on 6.). Before starting with the rewrite/refactor from 3. I would like to have "good" tests for the OCaml (or feel of them if they are already there) and the new F# thing.

Yes, the testsuite is the most important thing. This might also solve point 6, if you provide a script or instructions that install dotnet and the required tools.

To integrate the testsuite into CI is also a project I have been postponing for too long already, but maybe it can wait until BNFC3. I have not made any concrete plans how to migrate the testsuite, but I am resolved to migrate it one way or the other.

6. Sorry to hear that the experience of getting dotnet up and running on your machine turned out to be so painful. Expect me (tomorrow or a day after tomorrow) to provide you here with battle-tested instructions on how to get the necessary tool, compile the code and run it. The good part about it is that compared to the mono times it is a lot easier.

It is not the fault of dotnet, but the fault of my homebrew setup, so no need to apologize. But a pointer to get started would still be nice.

p.s. Having a reliable, maintained interpolation-providing package that's not deps heavy would be really nice. Did you like that neat-interpolation mentioned in the linked issue?

No, I passed by there, but did not find a "shopwindow" that would get me interested (see https://github.com/nikita-volkov/neat-interpolation/issues/32).

gdziadkiewicz commented 2 years ago
  1. I started working on adding f# backend to the existing tests and it looks like they already point out some problems with my implementation. WIP
  2. Requirements:

    dotnet cli tool and dotnet SDK for version defined in https://github.com/BNFC/bnfc/pull/404/files#diff-07ee170ba626d1934086f864c3ec7da48aa99c2de545fc2299e76298a50a13ffR122 which is net5.0

    How to get dotnet

    I recommend using https://docs.microsoft.com/en-us/dotnet/core/tools/dotnet-install-script for installing dotnet and the required SDK. It is portable, easy to use, does not require admin rights, works well with CIs. The only downside that I encountered is the lack of update tracking (which is something that for example installation via apt-get on Ubuntu provides).

    curl -sSL https://dot.net/v1/dotnet-install.sh | bash /dev/stdin --channel 5.0
    export PATH=~/.dotnet:$PATH

    Tested on my Ubuntu and Windows (in the dotnet-install.ps1 flavor). Hope to hear from you if it works on your OS.

    Usage:

    In the directory containing generated files:

    1. Run dotnet build (this compiles the code and from our perspective asserts that the generated code at least compiles)
    2. Run dotnet run -- <path-to-the-input-file> (this runs the compiled program (also recompiles if code changed), running it allows us the check if it terminates in reasonable time and if the output is as expected)
gdziadkiewicz commented 2 years ago

Work on getting all the existing tests passing still in progress. 23/30 of the parametrized tests are passing. During the process, I'm making notes on the fragile, easy-to-break parts of the code that should be replaced during the rewrite.

gdziadkiewicz commented 2 years ago

Differences between fslex/fsyacc and ocamllex/ocamlyacc:

  1. F# and OCaml have different std libs which means that boilerplate and usage of boilerplate (e.g. Hashmaps and using them) must be different (or unified by an adapter in one of them?)
  2. fslex does not support # (or did not in the past, confirmation with current version needed). Docs state that it does but real life and code strongly suggested that it does not. I will recheck that and open PR with an update to fslex docs if this is still the case.
  3. Current version of fsyacc requires some special dance (specifying module name as a parameter in fsproj file) around module declaration. This should be fixed in fsyacc and workaround in bnfc while it is still ongoing.
  4. Casing conventions for F# and OCaml are different, using OCaml conventions causes warnings and/or errors for F# (TODO elaborate)
  5. I investigated the adapter option and decided not to do it (it would bloat the generated F# code). I will go with parametrization of the boilerplate on BNFC side.
  6. I confirmed that this is still the case and decided to add the support instead of removing it from the docs.
  7. I will fix that in fsyacc.
gdziadkiewicz commented 2 years ago

I'm done with the tests and started working on the new version which parametrizes the OCaml backend.

andreasabel commented 2 years ago

Thanks for the PR! I built and tested it. Atm, two tests are not yet working.

The first is 266_define:

[TEST] Parameterized tests:F#:F#:266_define
§ Generate
bnfc -mMyMakefile --fsharp /Users/abel/project/open-source/bnfc/testing/regression-tests/266_define/test.cf
§ Build
dotnet build
error running: dotnet build
exit status: 1
stderr: 
@@@ Error! (3812ms)

This is probably due to missing sanitization, if the user uses let as a definition. I have fixed similar problems in the Ocaml backend.

testing/regression-tests/266_define/fsharp/AbsTest.fs(27,9): error FS0010: Unexpected keyword 'let' or 'use' in pattern [testing/regression-tests/266_define/fsharp/Test.fsproj]

The other test is 278_Keywords:

/regression-tests/278_Keywords/fsharp$ dotnet run -- ../good01.in
Unhandled exception. System.IndexOutOfRangeException: Index was outside the bounds of the array.
   at FSharp.Text.Parsing.Implementation.interpret[tok,a](Tables`1 tables, FSharpFunc`2 lexer, LexBuffer`1 lexbuf, Int32 initialState)
   at FSharp.Text.Parsing.Tables`1.Interpret[char](FSharpFunc`2 lexer, LexBuffer`1 lexbuf, Int32 startState)
   at ParTest.engine[a](FSharpFunc`2 lexer, LexBuffer`1 lexbuf, Int32 startState) in /Users/abel/project/open-source/bnfc/testing/regression-tests/278_Keywords/fsharp/ParTest.fs:line 8749
   at ParTest.pProgram[cty](FSharpFunc`2 lexer, LexBuffer`1 lexbuf) in /Users/abel/project/open-source/bnfc/testing/regression-tests/278_Keywords/fsharp/ParTest.fs:line 8751
   at TestTest.parse(TextReader c) in /Users/abel/project/open-source/bnfc/testing/regression-tests/278_Keywords/fsharp/TestTest.fs:line 15
   at TestTest.main(String[] args) in /Users/abel/project/open-source/bnfc/testing/regression-tests/278_Keywords/fsharp/TestTest.fs:line 30
andreasabel commented 2 years ago

@gdziadkiewicz Currently the generated Makefile only has the clean goal. Could it be extended by a

gdziadkiewicz commented 2 years ago

Yes, I will extend it.


From: Andreas Abel @.> Sent: Thursday, March 3, 2022 3:31:56 PM To: BNFC/bnfc @.> Cc: Grzegorz Dziadkiewicz @.>; Mention @.> Subject: Re: [BNFC/bnfc] Add F# support via fslex and fsyacc (PR #404)

@gdziadkiewiczhttps://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fgdziadkiewicz&data=04%7C01%7C%7C2a4927968da84f234e9308d9fd22922f%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637819147181026242%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=afH1tGoVR%2F9SNdh0fx9pkDH0BmdLZG7IEQhOdIyBCCA%3D&reserved=0 Currently the generated Makefile only has the clean goal. Could it be extended by a

— Reply to this email directly, view it on GitHubhttps://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FBNFC%2Fbnfc%2Fpull%2F404%23issuecomment-1058100296&data=04%7C01%7C%7C2a4927968da84f234e9308d9fd22922f%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637819147181026242%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=8mfJPyQGXO65c11Y3lQ5rcbx3cWgdVdfp9G88tm%2FALQ%3D&reserved=0, or unsubscribehttps://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FACBG4D3XFWIV3F42T3KDJLLU6DENZANCNFSM5I74I6MQ&data=04%7C01%7C%7C2a4927968da84f234e9308d9fd22922f%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637819147181026242%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=5gtQr7GKLEC6yVpVRTuPMqS7Ab38jxCcwFEnpel8sUw%3D&reserved=0. You are receiving this because you were mentioned.Message ID: @.***>

gdziadkiewicz commented 2 years ago

I'm sorry for the long silence. I added the missing goals, but have some questions about them as I rarely use make (I will add attach them to lines).