Discussion: Re-Implement code-generation for test-cases

rainij commented 2 years ago

I feel that the current code generator for the test cases (bin/generate) is really hard to read. I noticed this in a previous PR (https://github.com/exercism/sml/pull/193), where I had to adjust some things to make the code generator work again. I propose re-implementing the code generator.

I would like to use this issue to decide whether this would be a good move (as already said: I think so). I would also like to use it to gather ideas about what could be done. How do other tracks generate their code (if they do)?

In the past I used the Python library jinja2 (https://palletsprojects.com/p/jinja/) for various things, including code generation for C++ (very simple code). So jinja2 might be an option, and I have the feeling it would make the code generation easier to maintain. On the other hand, jinja2 is not part of the Python standard library. Moreover, I have no deep experience in code generation, and much more powerful/nicer/more elegant/more general solutions might exist.

rainij commented 2 years ago

@ErikSchierboom @kotp do you have an opinion on that issue?

ErikSchierboom commented 2 years ago

Sure. There are definitely other tracks that generate their code (I've written two generators: one for C# and one for F#). Here are links to some implementations:

And there are probably many more. I don't think there's a general, elegant solution, as much of it depends on the track that the tests are generated for. Some differences between test generators:

Code-based text generation vs. template-based text generation
Generator project per exercise vs generator project for all exercises
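For illustration, code-based generation in SML could look roughly like this sketch. The test file is built from string fragments in ordinary functions rather than filled into a template file (as jinja2 would do). The testCase record, quote, renderCase and renderSuite are hypothetical names. The emitted describe/test/Expect.equalTo shape is an assumption modelled on the track's existing test.sml files.

```sml
(* Hypothetical shape of one test case, already reduced to SML source fragments. *)
type testCase = {description : string, call : string, expected : string}

(* Wrap a string in double quotes, escaping it with the Basis String.toString. *)
fun quote (s : string) = "\"" ^ String.toString s ^ "\""

(* Render a single test case as an SML expression (assumed test-file shape). *)
fun renderCase ({description, call, expected} : testCase) =
  "    test " ^ quote description ^ "\n" ^
  "      (fn _ => " ^ call ^ " |> Expect.equalTo " ^ expected ^ ")"

(* Render the whole suite for one exercise, separating cases with commas. *)
fun renderSuite (slug : string) (cases : testCase list) =
  "val testsuite =\n" ^
  "  describe " ^ quote slug ^ " [\n" ^
  String.concatWith ",\n" (map renderCase cases) ^
  "\n  ]\n"
```

With a template-based approach, the same layout would instead live in a separate template file, and the code would only supply the values.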

I don't have a strong preference for any of the options, it is mostly what works well for you.

As for using jinja2, 🤷. The thing I would consider is whether the generator would be easy to work with for other SML contributors. E.g., if jinja2 requires Python to be installed, that might be less convenient for people working on Windows. For that reason, most generators are written in the track's language, which is what I would recommend unless that isn't feasible.

I'd also highly recommend writing some accompanying documentation, as that can be invaluable for helping external contributors add exercises later.

kotp commented 2 years ago

The Ruby track uses Ruby to generate its code, and I believe the Python track uses Python. It would be good for a language track, where practical, to use the track's own language, because its maintainers are likely to be familiar with that language. But I know that is sometimes not practical, and it is easier to do in another language. Imagine writing a test generator in assembly, for example.

For now, I would look at the Python track (the Ruby generators are broken; they went out of date some time ago because files were moved).

(and I just refreshed and found that there is a good response to show the various generators)

rainij commented 2 years ago

@ErikSchierboom the thing with Python is that the SML track already uses it (for the current generator). The only minor additional complication is that jinja2 is not in the standard library. So Python is already required today. At least on Linux, jinja2 can be conveniently installed in a virtual environment (created with python -m venv <some-folder-name>). I suppose this would be equally easy on Windows (once Python is installed).

Yes, I like good documentation too.

I will look into whether SML should be generated by SML code. At the moment I am not that familiar with SML (I am still learning it, partly with the help of Exercism :) ). As far as I know, the answer could be anything between "yes of course, super easy" and "no, don't do that!" :D.

@kotp thanks for the tip about the Python track. They actually use jinja2, so that might be a good place for inspiration.

If nobody else opens a PR, I might provide one. But first I will do some research into what the best option might be. Maybe I will come back here with a proposal. In any case, it might take a while until I have time for this. And of course I don't want to own this problem: if anybody reading this wants to do it, he/she can just do it.

kotp commented 2 years ago

I would not consider the requirement to install Python a "feature", though. It is another dependency, and one the track may well want to do away with. It would be the same if I had taken the time to write a generator for the track in Ruby: I would still encourage, and understand, the wish to move to an SML solution.

We have plenty of people with backgrounds in different languages, but everyone here has one thing in common, and that is the SML language, so that is something we can and should count on.

rainij commented 2 years ago

@kotp OK, sounds reasonable to me, for the reason you mention but probably also from a didactic point of view. And it is good to point out that the track's language is preferred everywhere (where it makes sense). I did not know that; I just took the setup with Python (and this strange Makefile) for granted.

I will definitely look into whether the generator could be rewritten in SML (Poly/ML). It might be fun, but it would take me some time (if I did it). Besides code generation, a minimal requirement would be that a command-line parser can be implemented efficiently.
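As a rough sketch, a hand-written command-line parser needs only the Basis Library (CommandLine.arguments, String.isPrefix) and pattern matching. The --force and --help flags and the positional exercise slugs are hypothetical, not the options of the current bin/generate.

```sml
(* Hypothetical options for a generator run. *)
type options = {force : bool, help : bool, exercises : string list}

(* Walk the arguments left to right. Exercise slugs are collected in reverse
   and restored to their original order at the end. *)
fun parseArgs (args : string list) : options =
  let
    fun go ([], {force, help, exercises}) =
          {force = force, help = help, exercises = rev exercises}
      | go ("--force" :: rest, {force = _, help, exercises}) =
          go (rest, {force = true, help = help, exercises = exercises})
      | go ("--help" :: rest, {force, help = _, exercises}) =
          go (rest, {force = force, help = true, exercises = exercises})
      | go (arg :: rest, {force, help, exercises}) =
          if String.isPrefix "-" arg
          then raise Fail ("unknown option: " ^ arg)
          else go (rest, {force = force, help = help, exercises = arg :: exercises})
  in
    go (args, {force = false, help = false, exercises = []})
  end

(* Entry point: parse the arguments actually passed to the program. *)
fun main () = parseArgs (CommandLine.arguments ())
```

Keeping parseArgs pure (a string list in, a record out) makes it easy to test without running the program.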

There is also the fetch-configlet script. I don't know whether Windows has Bash. That script would be a candidate for a rewrite too.

kotp commented 2 years ago

Bash is available for Windows™, in various forms.

rainij commented 2 years ago

@kotp is Bash preferred over Python on Windows?

kotp commented 2 years ago

In my opinion, the preference would be:

the language of the track
the language that many tracks use, if it is not the same language as the track

I always used batch programming on Windows, when I was doing that sort of thing every day. But my knowledge of what is common on Windows is not up-to-date, not even close, by decades.

ErikSchierboom commented 2 years ago

@rainij For Windows, we provide a PowerShell version of the fetch-configlet script: https://github.com/exercism/configlet/blob/main/scripts/fetch-configlet.ps1

rainij commented 2 years ago

@guygastineau we already talked about this issue in https://github.com/exercism/sml/issues/134. It would be nice if you could briefly summarize your proposal here (just to have it in the right place): the idea with the JSON lib, submodules, and thoughts about alternatives to polyml. It sounds good to me, but maybe somebody here has further suggestions.

If you could implement it, I would be pleased to do the first review :slightly_smiling_face:. Would that be OK for you @kotp @ErikSchierboom?

guygastineau commented 2 years ago

Certainly. I used the testlib.sml from this track as a quick test library to help a friend test their SML homework with SMLNJ. I didn't have to change much: I just made a generic test function with dependency injection, so that custom comparison and printing functions can be supplied. This let me drop the PolyML.makestring calls in equalTo. That is the only thing making the test suite non-portable, so I figured, "hey, I should make a PR, so the track will work for other implementations of SML."

Well, then I realized I should update bin/generate as the responsible thing to do. But I don't want to write Python. I want to write SML, so I thought, "alright, I'll rewrite the generator in SML." Then I realized there was no easily accessible library for HTTP requests and JSON parsing. So, being a sensible person, I set out to write an HTTP client library for SML from scratch... Well, life gets in the way of plans like that. I ended up writing a small part of it before I got distracted implementing Sets and Maps for SML, with my own take on the paper Efficient Sets.

So, I got overly ambitious and fairly distracted before more pressing matters took over my spare time for SML. That is as far as I got rewriting bin/generate :upside_down_face:

This time around, what I would do differently:

I have found a decent-looking JSON parser/printer: https://github.com/diku-dk/sml-json. I already have a branch set up where it is added as a git submodule. This should let us easily parse the JSON from problem-specifications, and it could come in handy for a test runner later.
For HTTP requests we might as well shell out to curl. I think curl ships with most OSs these days: most Linux distros, Windows, macOS(?), FreeBSD. I think it is important to consider dependency requirements carefully, but curl seems pretty ubiquitous to me.
Then just rewrite it, and actually get the work done :)

Really, the script isn't doing that much. I expect we can achieve greater clarity with SML too. Branching with proper pattern matching will be much nicer than the way the Python code has to deal with it. IO might give us a little more boilerplate than the Python, but there is probably an opportunity to make a mini FileIO library that various SML scripts/tools for the track can use, so I don't think that is bad.
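To illustrate the pattern-matching point, here is a minimal sketch that renders a JSON value from canonical data as an SML literal. The json datatype is hypothetical (sml-json has its own representation). Mapping null to NONE and objects to records are likewise assumptions, and the record rendering only works when the JSON keys are valid SML labels.

```sml
(* Hypothetical JSON representation, for illustration only. *)
datatype json =
    JNull
  | JBool of bool
  | JInt of int
  | JString of string
  | JArray of json list
  | JObject of (string * json) list

(* One clause per case; the compiler checks that every case is handled. *)
fun toSml JNull = "NONE"                (* assumption: null marks a missing option *)
  | toSml (JBool b) = Bool.toString b
  | toSml (JInt n) = Int.toString n     (* negative numbers use SML's tilde, e.g. ~2 *)
  | toSml (JString s) = "\"" ^ String.toString s ^ "\""
  | toSml (JArray xs) = "[" ^ String.concatWith ", " (map toSml xs) ^ "]"
  | toSml (JObject fields) =
      "{" ^ String.concatWith ", " (map (fn (k, v) => k ^ " = " ^ toSml v) fields) ^ "}"
```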

Anyway, hopefully this lets you know what I was thinking about the test generator before. Really, it would do the same analysis (hopefully better, since I believe MLs are better), and it would output almost the same tests. The difference is that we would use monomorphic versions of equalTo, like equalToString, etc., and equalTo itself would take a comparator and a string builder for a given type. This means we could output tests even when the necessary monomorphic equalTo is missing from our testlib. Whoever ports the exercise can add a comparator and string builder if there are types defined in test.sml, as in a port of zipper or something.
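Here is a minimal sketch of the equalTo design described in this comment. It assumes a simplified Expect structure. The expectation type, the Failure constructor and the helper names are illustrative and do not come from the track's actual testlib.sml.

```sml
structure Expect =
struct
  datatype expectation = Pass | Failure of string

  (* Generic check: the caller supplies the comparator and the printer,
     so no implementation-specific PolyML.makestring is needed. *)
  fun equalTo (eq : 'a * 'a -> bool) (show : 'a -> string) (expected : 'a) (actual : 'a) =
    if eq (expected, actual)
    then Pass
    else Failure ("Expected: " ^ show expected ^ "\nActual: " ^ show actual)

  (* Monomorphic helpers built from the generic one. *)
  val equalToString = equalTo (op =) (fn s => "\"" ^ String.toString s ^ "\"")
  val equalToInt = equalTo (op =) Int.toString
  val equalToBool = equalTo (op =) Bool.toString

  (* Element-wise list equality; lists of different length are unequal. *)
  fun listEq _ ([], []) = true
    | listEq eq (x :: xs, y :: ys) = eq (x, y) andalso listEq eq (xs, ys)
    | listEq _ _ = false

  fun equalToList eq show =
    equalTo (listEq eq) (fn xs => "[" ^ String.concatWith ", " (map show xs) ^ "]")
end
```

Because the comparator is a parameter, the same function also covers types without equality. For reals, for example, you could write Expect.equalTo (fn (a, b) => Real.abs (a - b) < 1E~9) Real.toString. Assuming the infix |> used in the existing tests, a test would read actual |> Expect.equalToInt expected.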

ErikSchierboom commented 2 years ago

I have found a decent looking json parser/printer https://github.com/diku-dk/sml-json. I have a branch set up already where this is added as a git submodule. This

Is a git submodule necessary? Not that I object to that, but git submodules are not always intuitive to work with, especially for any potential contributors. The library's page mentions that the sml package manager can also be used: https://github.com/diku-dk/sml-json#use-of-the-package Is that an option?

For http requests we might as well shell out to curl.

This is likely just fine. I'd recommend a nice error message when curl could not be found and maybe some documentation.
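Here is a minimal sketch of shelling out to curl with a readable error when curl is missing, using OS.Process.system from the Basis Library. It assumes a POSIX shell (command -v, /dev/null) and URLs and paths without single quotes. On Windows the availability check would need a different command. commandExists and download are hypothetical helper names.

```sml
(* True if the shell can find the given program on the PATH (POSIX sh assumed). *)
fun commandExists (cmd : string) : bool =
  OS.Process.isSuccess
    (OS.Process.system ("command -v " ^ cmd ^ " > /dev/null 2>&1"))

(* Download url into file with curl, failing with a readable message. *)
fun download (url : string) (file : string) : unit =
  if not (commandExists "curl")
  then raise Fail "curl was not found on the PATH; please install curl to fetch the canonical data"
  else if OS.Process.isSuccess
            (OS.Process.system
               ("curl --silent --show-error --fail --location -o '" ^ file ^ "' '" ^ url ^ "'"))
  then ()
  else raise Fail ("curl could not download " ^ url)
```

--fail makes curl return a non-zero exit status on HTTP errors, so a 404 is reported instead of being silently saved to the file.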

In general: rewriting it to SML sounds good!

guygastineau commented 2 years ago

@ErikSchierboom I have some reservations about using git submodules for it too. I have to use them frequently, but they can be a pain for people with less experience.

I think I misread part of the smlpkg documentation to mean it requires using .mlb files (used by mlton and MLKit), but on a second reading of their docs I think that is not the case. I will look into using it for dependency management.

guygastineau commented 2 years ago

I have been working on this some, but I got side-tracked with other PRs and work. I also got all excited and implemented streaming MD5 in native SML for #110, but my digests are coming out wrong :/

Anyway, maybe smlpkg doesn't really require using .mlb files. I think a lot of libraries are distributed with .mlb and maybe .cm files. Recently I have been continuing work so that everything can be built with poly, smlnj, and mlton. poly and mlton are great, but smlnj sometimes requires extra type annotations. I don't know why its inference is less powerful than that of the other two implementations.

Anyway, I wrote a small poly script for using .mlb files, but it didn't handle the full syntax, and I found it easier to maintain a separate build.poly file that has all the necessary use statements. use is a function, so there is a lot of potential, but in the end the simple thing is the easiest to manage. Maybe we will just need to maintain some <libname>.poly files for any smlpkg dependencies we use. I will continue looking into it.

exercism / sml

Discussion: Re-Implement code-generation for test-cases #204