Potential refactoring of `TwoToOneCRH`

weikengchen commented 3 years ago

Our current TwoToOneCRH takes left_hash_bytes and right_hash_bytes as input. As you can see, it has a strict requirement that both left and right must be bytes.

This is unsatisfactory for Poseidon, as evidenced by the fact that @drewstone and @filiplazovic have been working hard to bypass this limitation.

One potential solution is to make sure that TwoToOneCRH is merely a compressing CRH, and its inputs are of the same format as its output:

pub trait TwoToOneCRH {
    type Output: ToBytes
        + Clone
        + Eq
        + core::fmt::Debug
        + Hash
        + Default
        + CanonicalSerialize
        + CanonicalDeserialize;
    type Parameters: Clone + Default;

    fn setup<R: Rng>(r: &mut R) -> Result<Self::Parameters, Error>;

    /// Evaluates this CRH on the left and right inputs.
    fn evaluate(
        parameters: &Self::Parameters,
        left_input: &Self::Output,
        right_input: &Self::Output,
    ) -> Result<Self::Output, Error>;
}

This one will work, except for one other limitation that we previously discussed: this prohibits the use of more than one hash function, of different types. We refactor the Merkle tree all for this!

An example use case comes from Fractal, where ideally the bottom level of the Merkle tree is some sort of hash function that can be efficiently computed outside SNARK, e.g., Blake2b/s or another Poseidon.

So, we have two needs that are in conflict:

(1) provides a general framework such that different kinds of CRH can interact, e.g., making inputs [u8] and making outputs ToBytes.

(2) provides an efficient framework for CRH to not go through [u8] all the time.

Proposed solution 1: making `TwoToOneCRH` dual-interface

An alternative interface is to support both:

pub trait TwoToOneCRH {
    /// The bit size of the left input.
    const LEFT_INPUT_SIZE_BITS: usize;
    /// The bit size of the right input.
    const RIGHT_INPUT_SIZE_BITS: usize;

    type Output: ToBytes
        + Clone
        + Eq
        + core::fmt::Debug
        + Hash
        + Default
        + CanonicalSerialize
        + CanonicalDeserialize;
    type Parameters: Clone + Default;

    fn setup<R: Rng>(r: &mut R) -> Result<Self::Parameters, Error>;
    /// Evaluates this CRH on the left and right inputs.
    ///
    /// # Panics
    ///
    /// If `left_input.len() != Self::LEFT_INPUT_SIZE_BITS`, or if
    /// `right_input.len() != Self::RIGHT_INPUT_SIZE_BITS`, then this method panics.
    fn evaluate(
        parameters: &Self::Parameters,
        left_input: &[u8],
        right_input: &[u8],
    ) -> Result<Self::Output, Error>;

    /// Compress two CRH outputs.
    fn compress(
        parameters: &Self::Parameters,
        left_hash: &Self::Output,
        right_hash: &Self::Output,
    ) -> Result<Self::Output, Error>;
}

where compress is preferred except when touching the bottom layer.

This is still suboptimal, as there is still an unnecessary conversion between the leaf and non-leaf layers, even if they are both Poseidon.

Proposed solution 2: wrapping the `LeafHash` to match the `TwoToOneCRH`

We can provide a wrapper, such that when it wraps a LeafHash, it matches the LeafHash's output to the TwoToOneCRH's output.

Then, in the Merkle tree, we can require LeafHash's output = TwoToOneCRH's output, and if LeafHash is of a different format, we use a wrapper, like:

BytesToFieldElementWrapper<LeafHash>

If the LeafHash and TwoToOneCRH are the same, this wrapper is not needed.

Thoughts? @ValarDragon @tsunrise

weikengchen commented 3 years ago

A variant of proposed solution 2 is the wrap in a different way:

Let MerkleTree specify three hashes: LeafHash, TwoLeavesToOneHash, TwoHashesToOneHash.
The middle one is required to take the output of the LeafHash as input and outputs the output of the TwoHashesToOneHash. This can be done via a wrapped version of TwoHashesToOneHash, which does some conversions.

This would add a little bit more complexity, as we will need to specify the type of Input in CRH, which we haven't done before.

However, this would also allow more generality if in the future, we may treat the layer right above the leaves differently.

tsunrise commented 3 years ago

Just to confirm: For field-based CRH like Poseidon as two-to-one CRH:

LeafHash: Input: Vec<u8> (required), Output: any
TwoLeavesToOneHash (like an wrapper): Input: LeafHash::Output (required), Output: TwoToOneHash::Output (required) (we convert LeafHash::Output to F)
TwoHashesToOneHash (Poseidon): Input: F, Output: F

For byte-based CRH as two-to-one CRH:

LeafHash: Input: Vec<u8> (required), Output: any
TwoLeavesToOneHash (like an wrapper): Input: LeafHash::Output (required), Output: TwoToOneHash::Output (required) (we convert LeafHash::Output to bytes)
TwoHashesToOneHash (Poseidon): Input: Vec<u8>, Output: Vec<u8>

Is that correct? This idea sounds good to me

tsunrise commented 3 years ago

Do you want me to implement this? I have some spare time this summer @weikengchen

weikengchen commented 3 years ago

Please go ahead. This week is a little bit special in that I am very swamped.

tsunrise commented 3 years ago

Just realized this PR is also related to my BCS implementation. Will start implementing it this week

Pratyush commented 3 years ago

An alternative design is to have TwoToOneHash to have two kinds of inputs: "hash_bytes" and "hash_field_elements" (similar to what we have in the sponge API), and to constrain the Output type to implement ToBytesGadget and ToConstraintFieldGadget.

Additionally, you can have an an associated constant const SNARK_FRIENDLY: bool (we can decide a better name) which is set to true for arithmetization-friendly hashes like Poseidon. Then, the Merkle tree code can use this flag to decide whether to serialize to bytes or to field elements.

weikengchen commented 3 years ago

Yes, Tom and I have some internal discussion and we are likely going toward that direction (due to difficulty relating to enforcing the bound LeafCRH::Output == NonLeafCRH::Input in Rust)

tsunrise commented 3 years ago

That makes sense. I will try changing the existing design.

Writing wrapper for CRH constraints is a nightmare. I end up getting around 7 generics for one type and the fully associated syntax is a mess. Hopefully, using hash_bytes and hash_field_elements interface will address this problem.

Pratyush commented 3 years ago

Yeah I think we should try to reduce generics where possible (it's painful to write gadgets with them, as I'm sure all of you have experienced)

arkworks-rs / crypto-primitives