Configuration File Format

eayus commented 2 years ago

Currently we are using JSON for config files. However I don't think this is ideal as it is not the nicest format for humans (having to quote keys, no comments, etc.).

There are a few alternative options:

dhall - I like the idea, however in practice it seems to be quite verbose. For example, you need to specify a type annotation if you use an empty list. Furthermore, when I tried the Idris implementation, the error messages were very confusing.
toml - Probably my favourite language design-wise. However, the Idris implemetnation doesn't seem to be compiling atm (https://github.com/cuddlefishie/toml-idr). Also I have no idea how robust the implementation is, as I've been unable to test it.
others - I'm not aware of any alternatives with Idris support, but please let me know if there is some other options.

I'd say the biggest requirement on the format is that it should be easy to use for an inexperienced user. This means high quality error messages, and a simple to write format. The more complex the format, the better error messages needed to compensate.

Interested to hear others' thoughts on this.

alexhumphreys commented 2 years ago

Hi! As the author of I'm guessing the idris dhall implementation you're talking about, needless to say I'd vote dhall. But I do agree with your points about dhall.

It is pretty verbose, especially in cases like:

empty lists
Unions
Optional values Kind if just the way the language has evolved so far, and probably for the foreseeable future short of a Grace style rewrite.

So that's tough to change, but error messages are easier to improve. I've did some work over the summer to get source file positioning for errors, so you should get squiggly ^^ markers under the term that's faulty. But the text of the errors is still pretty shoddy. The dhall-Haskell implementation actually has some pretty good error messages. So it could be a matter of copying them from there to the idris implementation here, though there would be a bit of work aligning the two error types.

Parse errors still ain't great unfortunately, I'm not a pro at using Text.Parser.

So yeah, would need some work before it's ready for prime time, and worth noting it's also not standards compliant. But it's a fun little language, a fun compiler to work on. It also uses many similar techniques to Idris2 itself, so shouldn't be too tough for others to help out on should they so wish, as I've been a bit slow recently on working on it.

i ain't much of a toml fan, but there is another implementation here that should be relatively close to compiling if your looking for one to play with: https://github.com/idris-community/inigo/tree/main/Base/Toml. I hear it also ain't standards compliant, but then the toml standard is a lot less involved than the dhall one.

eayus commented 2 years ago

It's good to know that the error message can be improved for Dhall. I actually really like the idea of Dhall, and I think some of its features could be very useful for package files. For example,

If including multiple dependencies from a similar location (e.g. from Github), then a variable can be defined for that to shorten the URLs.
The common dependencies for multiple package definitions can be factored out (sirdi already supports the definition of multiple packages in the same file).

However, since this package manager is becoming "official", I'm very worried about usability for new users, and I feel like type annotations might confuse people (especially those who are new to Idris). Probably the best way to offset this kind of problem is by making sure that the "new project" command generates a complete config file with every field and it's type annotation.

It'd be interesting to hear others' thoughts on this too. With a decision like this, it's very difficult to go back and change to a non-backwards compatible format, and so it's nice to make the right decision in the first place.

JonathanLorimer commented 2 years ago

I personally like dhall. Can you not generate a type definition for the entire config schema when some runs sirdi new? Also I think its fair to assume some familiarity or interest in learning type systems, even from someone brand new to idris.

eayus commented 2 years ago

Can you not generate a type definition for the entire config schema when some runs sirdi new?

Yep this is definitely possible, and is probably the most robust solution if we were to choose Dhall.

It's looking like lots of people are enthusiastic about Dhall, so I'm swaying in that direction. We would definitely need to put in effort on improving parser error messages, but as long as people are willing to work on fixing that then hopefully that won't be an issue for long.

Kiiyya commented 2 years ago

Simplicity is key, I think TOML (or some other common markup language, YAML?) would be a better choice. But I may be biased, having been spoiled by Rust's cargo.

Either way, that's just the markup language. I think we should also consider the semantics of it. And again we should value simplicity. I would like to see the modules and main fields gone and using a "sensible defaults" and good conventions approach, similar to how Cargo.toml is structured.

eayus commented 2 years ago

cargo is definitely the gold standard for package managers. I keep flip-flopping back and forth on config formats since there are advantages for both approaches - it might just come down to whichever has the better library.

@Kiiyya for the modules and main fields, I'm not quite sure its a good idea to remove them. It'd be reasonable to make the main field optional and give it a default of a module called Main, but it would have to be overridable so existing projects can easily port to sirdi. For the modules field, I don't see a way to avoid it if you are writing a library - Idris doesn't provide any mechanism in the language to specify what modules should be exposed outside that package.

claymager commented 2 years ago

Sensible defaults is a good idea; removing them entirely is a harder sell. There are currently a number of projects that have multiple ipkgs for the same source, such as Idris itself.

I like dhall because it (in theory) allows us to be precise about the interface, but very clean parser code and good documentation probably accomplish the same goal.

I've tried using TOML for describing packages like idris2-jupyter-vega, which requires libraries which require a nonstandard executable and command, and it basically turns into Nix with a different syntax. I don't know if we'd even want to support something like that, but at least we can do some reasoning with dhall.

dagit commented 2 years ago

I think there's some different dimensions to consider in weighing options there. Let me throw out a few that come to mind.

Ubiquity of the format: If you pick json or yaml, then it's quite easy to find or make other tools for processing the packages. This may not seem important now, but as things catch on it can really help to have package descriptions be easily machine processable. It helps support automation for things like build or release scripts. Package repository management. Etc. On the flip side, if we make a custom notation or use dhall, it may be harder for people to write such things.
How well the notation fits the job: The advantage of a bespoke notation (think cabal), is that the syntax can be really concise. No need to quote certain things. The parser knows to expect a string. Parse errors can also be tailor to be very specific. On the other end of the spectrum we have pre-made notations like json/yaml/toml/etc. These tend to have generic linters and generic parse errors and so on. Although, they do have the advantage people may already be familiar with them from other things.
Error-proneness: One of the nice features of cabal and dhall is that you can group common bits and reference them as needed. For packages that define multiple similar things, like multiple executables, this can reduce the maintenance burden quite a bit. On the other hand, the linter won't exist unless you write it, unlike with yaml/json. Simplicity of the notation is another thing to consider. With the power of dhall also comes opportunities for beginners to get really confused and lost.

I'm not personally a huge fan of toml but I do think it sits at a pragmatic happy medium for a lot of these dimensions in terms of trade-offs. I'm a bit concerned about the choice of dhall for the following reasons:

Will choosing it delay the standardization of the config here? I don't think json is appropriate for human edited configs and so I hope it's not used for long.
The "common sections" thing I mentioned above is very nice but can probably be done in other formats as well. So I don't know if the extra complexity of dhall is worth it.
As far as I understand it, it would score pretty low in terms of ubiquity which wouldn't be an issue in the short term but it might be annoying down the road.

I'm partial to a bespoke notation, but when I try to factor out my personal biases, I think toml is probably the right choice for sirdi. It's certainly not a risky choice. I think bespoke and dhall have the highest potential to be interesting choices but also come with the most risk. As for json and yaml, I would say please use toml instead.

eayus commented 2 years ago

The TOML library (https://github.com/cuddlefishie/toml-idr) has been updated to work with the latest Idris2 version, so I'm going to have a play with that

eayus commented 2 years ago

Ok I think I've decided on TOML. I don't like to make "executive decisions" when the agreement is not unanimous, but I think this issue has been open long enough, and not having a concrete configuration file format is the main barrier to more serious adoption of sirdi.

My reasons are:

TOML seems to be the least controversial option. It might not be everyone's favourite, but it at least seems most people are fine with it. It seems to be a happy medium.
Keeping in mind the original goal of the project - simplicity. Options such as dhall and bespoke formats can be exciting, but they are inherently more complex.
TOML is very accessible for new users. It is already used for the same purpose in rust's cargo, and uses a common .ini-like syntax that many people will already be familiar with. It also has better editor support by default (for example, my default vim installation already supports TOML syntax highlighting).

sullyj3 commented 2 years ago

I like dhall, but I agree that it's important to use something ubiquitous and simple both for ease of beginners and machine processing. I agree with the choice of TOML.

It seems to me like people who want to use dhall should be able to use it as an extra layer to generate the TOML?

mattpolzin commented 2 years ago

Is this being actively worked on by anyone? Not sure I'll have time but if no one else is working on it anyway then it won't hurt if I tentatively plan to.

eayus commented 2 years ago

@mattpolzin I've been messing around with re-structuring sirdi, and as part of that I have written a TOML configuration file parser. I'll push the changes soon

eayus commented 2 years ago

I've pushed the WIP changes, including the TOML config parser, to therefactor branch. There is lots of intentional behavioural changes to sirdi, as well as unintentional regressed behaviour, but I felt the project could do with a decent rewrite as 90% of the code base was just hacked together in the developer meeting.

eayus commented 2 years ago

Closing this issue since once the refactor branch is merged we will be using TOML.

eayus / sirdi

Configuration File Format #3