Write a proper specification

robla commented 3 years ago

Folks who have been around the electoral reform community for a while seem pretty interested in this format, but as of 2021-06-06, there isn't really a specification. There are only a few wiki pages, some test cases, and a few online discussions about the format. It's really similar to ad hoc formats that have been around for 25 years or so, but we should actually have a written record of what we're up to.

carlschroedl commented 3 years ago

I echo @simberaj's call for a BNF (or similar) specification.

simberaj commented 3 years ago

I have attempted to write a parser implementation fully compliant with the current ABIF test suite. It can be found at https://github.com/simberaj/votelib/blob/abif/votelib/io/abif.py; its tests are at https://github.com/simberaj/votelib/blob/abif/tests/io/test_abif.py. The loaders produce Python dictionaries used in the rest of the library to represent votes, mapping preferences to counts. While implementing the spec, I made a few assumptions that seem natural to me and IMHO should be included in the spec; I'm listing those below. Wherever I was not so sure, I'm opening a new issue in this tracker.

Disallow uncounted ballot lines (i.e. each ballot line should start with a vote count number, even if that is 1).
Whitespace is insignificant anywhere outside square brackets (i.e. full candidate names).
Scored votes (with slashes) should contain each candidate at most once.
Outside square brackets, only ASCII alphabetic characters ([A-Za-z]) are allowed as candidate tokens, i.e. no spaces.
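
The four assumptions above can be sketched as a small validator. This is an illustrative sketch only, not votelib's actual code: the names `strip_outer_whitespace` and `parse_ballot_line` are hypothetical, and the `<count>:` prefix, the `>`, `=`, `,` separators, and integer scores after the slash are assumptions made for the example.

```python
import re

# One candidate item: a bracketed full name or a bare ASCII token,
# optionally followed by "/<score>" (integer scores are an assumption).
ITEM = re.compile(r"(\[[^\]]+\]|[A-Za-z]+)(?:/(\d+))?")
SEPARATOR = re.compile(r"[>=,]")  # assumed preference separators


def strip_outer_whitespace(text):
    """Drop whitespace everywhere except inside [square brackets]."""
    kept, inside = [], False
    for ch in text:
        if ch == "[":
            inside = True
        elif ch == "]":
            inside = False
        if ch.isspace() and not inside:
            continue
        kept.append(ch)
    return "".join(kept)


def parse_ballot_line(line):
    """Return (count, candidate_names) or raise ValueError."""
    compact = strip_outer_whitespace(line)
    head = re.match(r"(\d+):", compact)  # assumed "<count>:" prefix
    if head is None:
        raise ValueError("ballot line must start with a vote count")
    prefs = compact[head.end():]
    names, scored, pos = [], False, 0
    while True:
        item = ITEM.match(prefs, pos)
        if item is None:
            raise ValueError(f"invalid candidate token at offset {pos}")
        names.append(item.group(1))
        scored = scored or item.group(2) is not None
        pos = item.end()
        if pos == len(prefs):
            break
        if SEPARATOR.match(prefs, pos) is None:
            raise ValueError(f"expected a separator at offset {pos}")
        pos += 1
    # Only scored ballots are checked for repeated candidates.
    if scored and len(set(names)) != len(names):
        raise ValueError("scored ballot lists a candidate more than once")
    return int(head.group(1)), names
```

Under these assumptions, `parse_ballot_line("27: A > B = [Vít Rakušan]")` accepts the line and keeps the space inside the bracketed name, while `"A > B"` (no count) and `"3: A/5, B/2, A/1"` (repeated candidate in a scored vote) are rejected with `ValueError`.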

Any comments on the implementation (as opposed to the spec itself) are more than welcome in votelib's issue tracker at https://github.com/simberaj/votelib/issues/51!

robla commented 3 years ago

I also wrote a parser using Lark. It took me a little bit longer to write the parser than I liked, mostly due to me getting distracted with other matters rather than any problem with Lark. I'm happy with the direction this takes my work in. Lark is a Python library that uses EBNF as its input. Though Lark is specific to Python, the EBNF format is a language-agnostic format which satisfies the request for a BNF specification associated with ABIF.

The first version of the EBNF is available here: https://github.com/electorama/abif/blob/main/abif-v0.01.ebnf

simberaj commented 3 years ago

Hi @robla, I have finally gone through your EBNF. Great job! It seems this could indeed serve as an important part of the specification of ABIF, but I have some questions regarding minor differences between it and my understanding as reflected in the votelib implementation:

The candidate token definition formats differ: yours seems to accept =A: [Vít Rakušan] while votelib expects [Vít Rakušan]: A, which is IMHO simpler while still letting a parser determine the line type from the first byte.
The ballot line seems to accept mixing commas with other preference separators, as in A, B > C = D. Is this desired and, if so, what are the intended semantics?
The JSON spec seems a bit restrictive to me (even though we agree the accepted JSON should be a subset of NDJSON, I would at least allow non-string atomic values), while it also seems to allow unquoted keys, which JSON does not permit. Having said that, this is probably the least essential part of the spec for the time being :-)
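
To make the first and third points concrete, here is a rough sketch of first-character dispatch and of what strict JSON accepts. The function names and the returned labels are hypothetical, and treating a leading `{` as a metadata line is an assumption for illustration; neither the EBNF draft nor votelib is claimed to work this way.

```python
import json


def classify_line(line):
    """Guess the line type from its first non-blank character."""
    stripped = line.lstrip()
    if not stripped:
        return "blank"
    first = stripped[0]
    if first in "0123456789":
        return "ballot"             # a ballot line starts with its count
    if first == "=":
        return "candidate-ebnf"     # e.g. =A: [Vít Rakušan]
    if first == "[":
        return "candidate-votelib"  # e.g. [Vít Rakušan]: A
    if first == "{":
        return "metadata"           # assumed: one JSON object per line
    return "unknown"


def is_strict_json_object(text):
    """True if text parses as a JSON object under Python's json module."""
    try:
        value = json.loads(text)
    except json.JSONDecodeError:
        return False
    return isinstance(value, dict)
```

Either candidate-definition style can be recognized from the first character, so the choice between them is about readability rather than parseability. On the JSON side, `is_strict_json_object('{"count": 3}')` holds (a non-string atomic value is valid JSON), whereas `'{count: 3}'` fails because of the unquoted key.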

Some formal remarks:

I would not use named constants for simple literals (COLON, ASTERISK, etc.) for clarity, but that might be my personal preference.
I would rename the abifline token to something reflective of the fact it might span multiple lines.

robla commented 3 years ago

@simberaj - Thanks for the reminder to work on ABIF! I've let myself get distracted by other matters (e.g. I started doing some serious Perl development for the first time in years). I think I understand what you're suggesting, but I want to take the following steps before making changes:

Create https://github.com/electorama/abif/blob/main/testfiles/test015.abif , which expresses the invalid syntax
Add "test015" to https://github.com/electorama/abif/blob/main/abif_test.py , which will (initially) fail
Fix my test suite (and the abif.ebnf file) so that running pytest from the top-level directory doesn't return any failures. Of course, there will be invalid ABIF files in the "testcases/" directory, but that's kind of the point of a good test suite.
Declare the specification done, because the test suite is done! :rofl:

Seriously, though, I'll take a closer look at things later. I've got a few other things to take care of in my personal life that are interfering with my Python-scripting time, but I'll hopefully have more time very soon. We should also work out how issue #16 is going to work, so that I can start accepting pull requests.
