Ballot grouping data?

simberaj commented 3 years ago

As John Karr pointed out in the EM-list, it is yet to be decided whether ABIF will support any notation for ballot grouping, to delimit precincts, constituencies, etc., and if so, what the format should be. He proposed the following notation:

!division: BRONX_PRECINCT_41 # there's been no discussion of this yet, I just picked ! for this example.
... lines from BRONX_PRECINCT_41
!division: QUEENS_PRECINCT_6
... lines from QUEENS_PRECINCT_6

I feel I don't have enough clarity around the benefits of in-file ballot grouping vs. using many ABIF files (that might be needed to be evaluated together) to have a clear opinion on this, but if ABIF is to support in-file ballot grouping, I'm in favor of using a special line start character to delimit it, per @robla's specs.

robla commented 3 years ago

> I feel I don't have enough clarity around the benefits of in-file ballot grouping vs. using many ABIF files (that might be needed to be evaluated together) to have a clear opinion on this,

It seems to me that there's value in aggregating several voting precincts into a single file in a way that doesn't lose the metadata. I don't know if it's realistic, but I'd love to make it such that several ABIF files could be concatenated into a single file, and still result in a valid ABIF file. That would mean dropping some of the strictness about section ordering, but we would benefit from having something that could be handled by a stream-oriented parser.

It may be useful to have the top line of the format and the section divider have the same format for concatenation purposes. Then we could have something like this:

@ABIF - {"Division" : "BRONX_PRECINCT_41"}
... lines from BRONX_PRECINCT_41
@ABIF - {"Division" : "QUEENS_PRECINCT_6"}
... lines from QUEENS_PRECINCT_6

We can then treat the @ABIF line as special file/section metadata, which applies to all lines following it. Our temptation will be to make the first line of the file more ornate, but I think the prospect of having section breaks that share the same format might temper our enthusiasm for making that line too complicated.
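To illustrate why a shared header/divider format suits a stream-oriented parser, here is a minimal Python sketch. It assumes the `@ABIF - {...}` line shape from the example above and assumes the braces hold a JSON object; neither is settled, and the name `iter_sections` is hypothetical.

```python
import json
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical marker, taken from the example notation in this thread.
HEADER_PREFIX = "@ABIF"


def iter_sections(lines: Iterable[str]) -> Iterator[Tuple[Dict, str]]:
    """Yield (metadata, line) pairs in a single pass.

    The metadata is whatever the most recent @ABIF line declared, so
    concatenated files need no special handling: each new header simply
    replaces the metadata for the lines that follow it.
    """
    metadata: Dict = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(HEADER_PREFIX):
            # Assumption: everything from the first "{" is a JSON object.
            brace = line.find("{")
            metadata = json.loads(line[brace:]) if brace != -1 else {}
            continue
        yield metadata, line


sample = [
    '@ABIF - {"Division" : "BRONX_PRECINCT_41"}',
    "3: A>B",
    '@ABIF - {"Division" : "QUEENS_PRECINCT_6"}',
    "2: B>A",
]
for meta, ballot_line in iter_sections(sample):
    print(meta.get("Division"), ballot_line)
```

Because the parser only remembers the latest header, it never needs to look ahead or know where one original file ended and the next began.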

brainbuz commented 3 years ago

The other option I can think of would be to have a way of indicating it inline. If the : is only used as a divider, then placing division at the end could also work:

11: A>B>C=3>4: PRECINCT2
3: A>B>C=3>4: PRECINCT4

If there end up being more optional items it could be required to have empty fields when needed or a key could be included in the field.

Or the order could be different: count : precinct : ballot. When there is no precinct, the field would be left empty, giving two colons: count :: ballot

brainbuz commented 3 years ago

After chewing on this for a while, my current thought is:

count : ballot : { Extended data }

The extended data might be delimited, such as by { }, or it could just be some key-value format. Embedding JSON might be a good choice.

Reader/Parsers that aren't interested in extended data can simply discard everything after the second colon.
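As a rough sketch of that "discard everything after the second colon" behavior, assuming the `count : ballot : extended` layout proposed above (the function name `parse_core` is hypothetical, not part of any ABIF tooling):

```python
from typing import Tuple


def parse_core(line: str) -> Tuple[int, str]:
    """Return (count, ballot), ignoring any extended data.

    Splitting at most twice means colons inside the extended data
    (for example in embedded JSON) never disturb the first two fields.
    """
    parts = line.split(":", 2)
    if len(parts) < 2:
        raise ValueError(f"expected 'count : ballot', got {line!r}")
    return int(parts[0].strip()), parts[1].strip()


print(parse_core('11: A>B>C=3>4: {"precinct": "PRECINCT2"}'))
print(parse_core("3: A>B>C=3>4"))
```

A reader written this way accepts lines with or without the third field, so files carrying extended data stay usable by simpler tools.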

Someone evaluating different methods against, for example, this year's NYC primary data likely won't be interested in the extended data. Other people looking for deeper trends and patterns need it for slicing and dicing: who won which borough, matching census data to precincts to see how demographics played out, differences between mail-in and at-polls votes, etc.

As a programmer I like parsing the extended data as JSON, but for human interaction that is less appealing.

simberaj commented 3 years ago

I like the notion that optional line-specific extended data would be stored after a second colon. I think we should explore some use cases for the extended data before we agree on their format; but if we want to store extension data not covered by ABIF semantics, I would go for NDJSON since it is flexible and easy to parse.
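One possible use case to weigh is tallying ballots per precinct. The sketch below assumes each line's extended data is a single JSON object after the second colon, in the spirit of NDJSON; the key name `precinct` and the function `tally_by_key` are made up for illustration.

```python
import json
from collections import Counter
from typing import Iterable


def tally_by_key(lines: Iterable[str], key: str) -> Counter:
    """Sum ballot counts grouped by one key of the extended data.

    Lines without extended data, or without the key, are grouped under None.
    """
    totals: Counter = Counter()
    for line in lines:
        parts = line.split(":", 2)
        if len(parts) < 2:
            continue  # blank or malformed line; skipped in this sketch
        count = int(parts[0].strip())
        has_extended = len(parts) == 3 and parts[2].strip()
        extended = json.loads(parts[2]) if has_extended else {}
        totals[extended.get(key)] += count
    return totals


ballots = [
    '11: A>B>C: {"precinct": "PRECINCT2"}',
    '3: A>B>C: {"precinct": "PRECINCT4"}',
    "2: B>A",
]
print(tally_by_key(ballots, "precinct"))
```

Keeping the extended data as one JSON object per line means a standard JSON parser handles it with no ABIF-specific grammar, which is the flexibility argument for this style.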

electorama / abif

Ballot grouping data? #13