Election-Tech-Initiative / electionguard-python

A python module implementing the ElectionGuard specification. This implementation can be used to conduct End-to-End Verifiable Elections as well as privacy-enhanced risk-limiting audits.
https://www.electionguard.vote/
MIT License
161 stars 96 forks source link

🗑 Remove delimiters in hashes #536

Open lmarie79 opened 2 years ago

lmarie79 commented 2 years ago

Feature Request

Description @benaloh proposed to not use any delimiters between parameters of hashes. The idea would be simplify the hash itself. It also removes confusion on whether any delimiters are used between parameters of hashes.

Another option would be to be more clear of exactly what the delimiters are and how they are used in documentation.

Proposed Solution

This would involve removing the delimiters shown in these lines: https://github.com/microsoft/electionguard-python/blob/e35b68dc3175243ffaf872e9a0ea56c5b8a05830/src/electionguard/hash.py#L73

https://github.com/microsoft/electionguard-python/blob/e35b68dc3175243ffaf872e9a0ea56c5b8a05830/src/electionguard/hash.py#L100

keithrfung commented 2 years ago

This is marked as question because we are unsure whether to implement this despite it being simple fix.

benaloh commented 2 years ago

We need to be a bit careful here. We can and should remove delimiters wherever possible, but we must make certain that parameters are always of predetermined length. We have to avoid "no table" and "not able" looking the same.

For most of our hashes (especially, generation of challenges), we are hashing a fixed sequence of fixed-length arguments. Here we can just drop the delimiters and concatenate. For less constrained data (e.g., the ballot coding file), we either need delimiters or the file itself should specify the lengths of each argument.

danwallach commented 2 years ago

I'm not sure I like nuking the delimiters. Consider that we're currently converting everything to base16. That means we can easily construct an ElementModP for which a corresponding number of ElementModQ values, concatenated together, have the exact same hex digits. The delimiters prevent these from hashing identically.

If anything, I'm concerned that strings are passed through to the hash function without any escaping. That's only an issue if a string can be externally specified. The solution would be use the kind of string escaper that shows up when encoding for JSON or for CSV.

benaloh commented 2 years ago

The concern above is precisely why the arguments must be of known fixed-length. If the input to a hash is an ElementModP, then one can't swap it with a sequence of 16 ElementModQ values and have it be a legitimate input.

Having identical inputs of different forms is a concern even with delimiters. One could say that "A|B" with A and B as numbers could collide with a hex string when the delimiter "|" is written as its ASCII equivalent "7C". We must always check that the inputs are of proper form regardless, and once we do, we no longer need delimiters for known fixed-length inputs.