FreeAndFair / OpenRLA

Free & Fair's open source RLA support software.

Allow import of Dominion CVR export files #3

Closed ranweiler closed 7 years ago

ranweiler commented 7 years ago

Right now, as a first approximation, we import ballots by selecting the relevant images on the filesystem. Instead, we should permit the upload of a CVR export file/manifest and compute the reported contest outcomes from the CVR data.

ranweiler commented 7 years ago

In the sample data we have (e.g. test/data/dominion/example/CvrExport.json in the backend sub-project), the JSON data has ImageMask keys for each Session array item, with absolute file paths as values.

Questions:

  1. Are "image masks" just ballot images / the exact files we would want to display to a user?
  2. Can we assume that, for a user of the RLA assistant, the absolute paths in the CVR export manifest will always point to valid files on the system on which we're executing? Is there any reason to copy the files to our own data directory?
  3. Should we even allow manual specification of ballot image locations, or is fully depending on the CVR export for these paths a feature, not a bug?
  4. It seems like there is no "sequence ID" or even a unique identifier across ballots. We can instead deterministically compute them from a tuple like (TabulatorId, BatchId, RecordId, CountingGroupId), or just derive it from the index in the Sessions array in the CVR export JSON file. What is preferable?
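
The tuple option in question 4 could be sketched as below. This is only an illustration: it assumes the four fields are top-level keys of each session record, which the sample data would need to confirm, and `ballot_key` and `index_by_key` are hypothetical helper names, not part of any existing code.

```python
from typing import Any, Dict, List, Mapping, Tuple

BallotKey = Tuple[Any, ...]

# Field names come from the discussion; their exact position in the
# CVR export JSON is an assumption made for this sketch.
KEY_FIELDS = ("TabulatorId", "BatchId", "RecordId", "CountingGroupId")


def ballot_key(session: Mapping[str, Any]) -> BallotKey:
    """Build a deterministic identifier from fields of one session record."""
    return tuple(session[field] for field in KEY_FIELDS)


def index_by_key(sessions: List[Mapping[str, Any]]) -> Dict[BallotKey, int]:
    """Map each tuple key to its array index, failing loudly on a collision."""
    seen: Dict[BallotKey, int] = {}
    for position, session in enumerate(sessions):
        key = ballot_key(session)
        if key in seen:
            raise ValueError(
                f"duplicate ballot key {key} at indexes {seen[key]} and {position}"
            )
        seen[key] = position
    return seen
```

Unlike the bare array index, the tuple key stays the same if records are reordered in the export, and the collision check shows whether the tuple really is unique for a given data set.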

ranweiler commented 7 years ago

@kiniry, assuming the answer to (1) is "yes", we've implemented a first pass at Dominion CVR manifest upload in #16, which has been merged into the workflow-assistant branch.

We still need to answer the above questions to consider this closed. Depending on the answers, we may either (a) remove the ballot image upload card altogether (which is possibly superfluous if we have a CVR manifest), or (b) make it optional or use it to upload missing images.

kiniry commented 7 years ago

I also assume the answer to (1) is "yes". Neal McBurnett and Harvey will know more.

The answer to (2) is "yes", as I just stated in email. We can pre-process manifests to adjust for image location for the moment. Later we can make this more auto-magic. This answers your question (3) too. I.e., if we can get assurance that image filenames are unique in a given election data set, then the user could just drag-and-drop or otherwise point to a directory in the filesystem, and we can look up each image by trimming the path prefix and matching on the filename.
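
A minimal sketch of that lookup, assuming filenames really are unique within the data set. The helper names are hypothetical, and the manifest paths are split on both `/` and `\` because the separator used in real exports is not established here.

```python
import re
from typing import Dict, Iterable


def basename(path: str) -> str:
    """Return the final path component, accepting forward or backward slashes."""
    return re.split(r"[\\/]", path)[-1]


def index_local_images(local_paths: Iterable[str]) -> Dict[str, str]:
    """Index local image files by filename, rejecting non-unique names."""
    index: Dict[str, str] = {}
    for path in local_paths:
        name = basename(path)
        if name in index:
            # The whole approach depends on unique filenames, so stop here.
            raise ValueError(f"filename {name!r} is not unique: {index[name]}, {path}")
        index[name] = path
    return index


def resolve_image(manifest_path: str, index: Dict[str, str]) -> str:
    """Trim the manifest's path prefix and look the filename up locally.

    Raises KeyError if the image is missing from the local directory.
    """
    return index[basename(manifest_path)]
```

The `local_paths` argument would come from listing whatever directory the user points to; a `KeyError` from `resolve_image` identifies an image that the manifest names but the directory lacks.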

Knowing the answer to (4) is very much dependent upon getting precise documentation for Dominion's formats (a long shot) or having a much larger sample set of manifests and CVRs. I have explicitly asked the community for the latter.

Wrt (4), it looks to me like the filename of images is a fragment of the concatenated tuple you suggest. I suggest we make the ADT match the current naming convention witnessed in examples, i.e., (TabulatorId, BatchId, BallotId).

ranweiler commented 7 years ago

Thanks @kiniry!

To close for now:

Other features (e.g. in-band specification/update of file locations) can be opened as a separate issue as needed.

nealmcb commented 7 years ago

Arapahoe County reports:

Our paper ballots should be stored more or less in the same sequential order as the Record ID.

I've noticed:

"RecordId", which I had hoped would be just the sequence number we're looking for, is actually not presented in sequential order in the CVR file. Some records don't show up in sequence, and seem to be appended to the end of their batch.

It seems that when ballots are adjudicated (resulting in "Modified" session records, vs the usual "Original" session), they show up at the end of their batch in the JSON file. But since the paper ballots aren't looked at during adjudication, the paper should remain in order by RecordId. Thus the identifier to give auditors for ballot retrieval is (TabulatorId, BatchId, RecordId, CountingGroupId), though so far it seems to me that the BatchIds are unique.
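
If that holds, the file order could be mapped back to the presumed paper order roughly as follows. This is a sketch only: it assumes the fields are top-level keys of each session record, and `retrieval_order` is a hypothetical name.

```python
from collections import defaultdict
from typing import Any, Dict, List, Mapping, Tuple

BatchKey = Tuple[Any, Any]


def retrieval_order(
    sessions: List[Mapping[str, Any]],
) -> Dict[BatchKey, List[Mapping[str, Any]]]:
    """Group records by (TabulatorId, BatchId) and sort each batch by RecordId.

    Adjudicated records appended at the end of a batch in the file are moved
    back to the position their RecordId implies.
    """
    batches: Dict[BatchKey, List[Mapping[str, Any]]] = defaultdict(list)
    for session in sessions:
        batches[(session["TabulatorId"], session["BatchId"])].append(session)
    return {
        batch: sorted(records, key=lambda record: record["RecordId"])
        for batch, records in batches.items()
    }
```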

We should ask Dominion for advice, and perhaps also ask the community for what they want to see.

ranweiler commented 7 years ago

@nealmcb, @kiniry, I am going to move the display of the above Dominion-specific ballot identifier data to a new issue, and close this out after the above items are checked off.

ranweiler commented 7 years ago

I've made issues for some of the non-immediate followup work. Otherwise, this is closed by #37.