Closed (ranweiler closed this issue 7 years ago)
In the sample data we have (e.g. test/data/dominion/example/CvrExport.json in the backend sub-project), each item of the Sessions array in the JSON data has an ImageMask key, whose value is an absolute file path.
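As a rough illustration of the layout described above, the following sketch lists the ImageMask value of each Sessions item. It assumes the export is a JSON object with a top-level Sessions array; the function name `image_masks` and the inline sample are hypothetical and are not taken from the real export file.

```python
import json


def image_masks(cvr_export_text):
    """Return the ImageMask value of every item in the Sessions array."""
    export = json.loads(cvr_export_text)
    # Assumption: Sessions is a top-level key holding a list of objects.
    return [session.get("ImageMask") for session in export.get("Sessions", [])]


# Hypothetical sample, not copied from the real CvrExport.json.
sample = json.dumps({
    "Sessions": [
        {"ImageMask": "C:\\images\\a.tif"},
        {"ImageMask": "C:\\images\\b.tif"},
    ]
})
print(image_masks(sample))
```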
Questions:
(TabulatorId, BatchId, RecordId, CountingGroupId), or just derive it from the index in the Sessions array in the CVR export JSON file. What is preferable?
@kiniry, assuming the answer to (1) is "yes", we've implemented a first pass at Dominion CVR manifest upload in #16, which has been merged into the workflow-assistant branch.
We still need to answer the above questions to consider this closed. Depending on the answers, we may either (a) remove the ballot image upload card altogether (which is possibly superfluous if we have a CVR manifest), or (b) make it optional or use it to upload missing images.
I also assume the answer to (1) is "yes". Neal McBurnett and Harvey will know more.
The answer to (2) is "yes", as I just stated in email. We can pre-process manifests to adjust for image location for the moment. Later we can make this more auto-magic. This answers your question (3) too. I.e., if we can find assurance that image filenames are unique in a given election data set, then one could just have the user drag-and-drop or otherwise point to a directory in the filesystem, and we can look up each image by trimming the path prefix and matching on the filename.
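The prefix-trimming lookup suggested here could look roughly like the sketch below. It is only a sketch under stated assumptions: the helper names (`basename_any`, `index_images`, `resolve`) are hypothetical, manifest paths may use either `\` or `/` as a separator, and the uniqueness of filenames is checked rather than taken for granted.

```python
import os
import re


def basename_any(path):
    """Trim the directory prefix, accepting both '/' and '\\' separators."""
    return re.split(r"[\\/]", path)[-1]


def index_images(directory):
    """Map each file name found under `directory` to its full local path.

    Raises ValueError if two files share a name, because matching by
    name is only sound when names are unique in the data set.
    """
    index = {}
    for root, _dirs, files in os.walk(directory):
        for name in files:
            if name in index:
                raise ValueError(f"duplicate image name: {name}")
            index[name] = os.path.join(root, name)
    return index


def resolve(image_mask, index):
    """Return the local path for a manifest path, or None if no file matches."""
    return index.get(basename_any(image_mask))
```

Building the index once and failing loudly on a duplicate name keeps the uniqueness assumption visible; a silent first-match lookup would hide exactly the case that makes this approach unsafe.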
Knowing the answer to (4) is very much dependent upon getting precise documentation for Dominion's formats (a long shot) or having a much larger sample set of manifests and CVRs. I have made this explicit request to the community for the latter.
Wrt (4), it looks to me like the filename of images is a fragment of the catenated tuple you suggest. I suggest we make the ADT and the current naming convention witnessed in examples match. I.e., (TabulatorId, BatchId, BallotId).
Thanks @kiniry!
To close for now:
Other features (e.g. in-band specification/update of file locations) can be opened as a separate issue as needed.
Arapahoe County reports:
Our paper ballots should be stored more or less in the same sequential order as the Record ID.
I've noticed:
"RecordId", which I had hoped would be just the sequence number we're looking for, is actually not presented in sequential order in the CVR file. Some records don't show up in sequence, and seem to be appended to the end of their batch.
It seems that when ballots are adjudicated (resulting in "Modified" session records, vs the usual "Original" session), they show up at the end of their batch in the JSON file, but since the paper ballots aren't looked at during adjudication, the paper should remain in order by RecordId. Thus the identifier to give auditors for ballot retrieval is (TabulatorId, BatchId, RecordId, CountingGroupId), though so far it seems to me that the BatchIds are unique.
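To make the ordering point concrete, here is a minimal sketch that regroups session records by batch and sorts each batch by RecordId, so that adjudicated records appended at the end of a batch fall back into paper order. It assumes each Sessions item carries TabulatorId, BatchId, RecordId and CountingGroupId keys with comparable (e.g. integer) values; the function names `paper_order` and `retrieval_id` are hypothetical.

```python
from collections import defaultdict


def retrieval_id(session):
    """Build the (TabulatorId, BatchId, RecordId, CountingGroupId) tuple."""
    return (
        session["TabulatorId"],
        session["BatchId"],
        session["RecordId"],
        session["CountingGroupId"],
    )


def paper_order(sessions):
    """Group records by (TabulatorId, BatchId) and sort each group by RecordId.

    Assumption: within a batch, the paper ballots are stored in RecordId
    order, regardless of where a record appears in the export file.
    """
    batches = defaultdict(list)
    for session in sessions:
        batches[(session["TabulatorId"], session["BatchId"])].append(session)
    return {
        key: sorted(group, key=lambda s: s["RecordId"])
        for key, group in batches.items()
    }
```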
We should ask Dominion for advice, and perhaps also ask the community for what they want to see.
@nealmcb, @kiniry, I am going to move the display of the above Dominion-specific ballot identifier data to a new issue, and close this out after the above items are checked off.
I've made issues for some of the non-immediate followup work. Otherwise, this is closed by #37.
Right now, as a first approximation, we import ballots via selecting the relevant images on the filesystem. Instead, we should permit the upload of a CVR export file/manifest, and compute reported contest outcomes from the CVR data.