arkworks-rs / algebra

Libraries for finite field, elliptic curve, and polynomial arithmetic
https://arkworks.rs
Apache License 2.0
601 stars 235 forks

Decoding `BigUint` causes unbounded allocation #839

Open ggwpez opened 2 months ago

ggwpez commented 2 months ago

Decoding a BigUint internally decodes a Vec<u8> here:

https://github.com/arkworks-rs/algebra/blob/2ad2d84c40c145dcb7a447b4d5f64528f39fc1ad/serialize/src/impls.rs#L176

which then in turn calls Vec::with_capacity with the decoded length:

https://github.com/arkworks-rs/algebra/blob/2ad2d84c40c145dcb7a447b4d5f64528f39fc1ad/serialize/src/impls.rs#L523

This can cause a panic if that value is unreasonably large.
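The shape of the problem can be sketched in isolation (this is an illustrative stand-in, not the actual arkworks code path): the serialized form is a length prefix followed by the payload, and the decoder hands the attacker-controlled length straight to `Vec::with_capacity` before reading a single payload byte.

```rust
use std::io::Read;

// Hypothetical sketch of the length-prefixed decoding pattern: an 8-byte
// little-endian length, then `len` payload bytes.
fn decode_vec_unchecked<R: Read>(mut reader: R) -> std::io::Result<Vec<u8>> {
    let mut len_bytes = [0u8; 8];
    reader.read_exact(&mut len_bytes)?;
    let len = u64::from_le_bytes(len_bytes) as usize;
    // The attacker-controlled `len` reaches the allocator directly here;
    // a value like 702904943871641577 aborts the process before any
    // payload byte is read.
    let mut buf = Vec::with_capacity(len);
    reader.take(len as u64).read_to_end(&mut buf)?;
    Ok(buf)
}

fn main() {
    // Well-formed input: length 3, then three payload bytes.
    let data = [3u8, 0, 0, 0, 0, 0, 0, 0, 0xaa, 0xbb, 0xcc];
    let v = decode_vec_unchecked(&data[..]).unwrap();
    assert_eq!(v, vec![0xaa, 0xbb, 0xcc]);
}
```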

Possible test vectors:

use ark_serialize::CanonicalDeserialize;
use num_bigint::BigUint;

#[test]
fn test_biguint_deserialize_bad_data1() {
    let data = [0xe9, 0xc7, 0x55, 0x82, 0x52, 0x38, 0xc1, 0x09];
    // Panics: memory allocation of 702904943871641577 bytes failed
    let _ = BigUint::deserialize_compressed(&data[..]);
}

#[test]
fn test_biguint_deserialize_bad_data2() {
    let data = [0xe9, 0xc7, 0x55, 0x82, 0x52, 0x38, 0xc1, 0x90];
    // Panics: capacity overflow
    let _ = BigUint::deserialize_compressed(&data[..]);
}

Honestly, I am not sure whether this is even efficiently fixable. I have the same issue in another generic decoding library, and we are currently trying to add a special DecodeWithMemLimit mechanism that caps the total allocations of a single decode call.
The trivial approach of just checking the decoded length does not work, since types can be recursive: nested vectors will inevitably overflow any artificial per-field restriction.
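The DecodeWithMemLimit idea could look roughly like this (the name and API are hypothetical, not an existing library): one shared budget is threaded through every allocation a decode call performs, so nested containers draw from the same pool instead of each getting a fresh per-field limit.

```rust
use std::io::Read;

// Hypothetical shared allocation budget for one decode call.
struct MemLimit {
    remaining: usize,
}

impl MemLimit {
    fn charge(&mut self, bytes: usize) -> Result<(), &'static str> {
        if bytes > self.remaining {
            return Err("decode exceeds memory limit");
        }
        self.remaining -= bytes;
        Ok(())
    }
}

fn decode_vec_limited<R: Read>(
    mut reader: R,
    limit: &mut MemLimit,
) -> Result<Vec<u8>, &'static str> {
    let mut len_bytes = [0u8; 8];
    reader.read_exact(&mut len_bytes).map_err(|_| "short read")?;
    let len = u64::from_le_bytes(len_bytes) as usize;
    // Charge the budget *before* allocating; recursive decoders share
    // the same `MemLimit`, so total allocations stay bounded even for
    // nested vectors.
    limit.charge(len)?;
    let mut buf = vec![0u8; len];
    reader.read_exact(&mut buf).map_err(|_| "short read")?;
    Ok(buf)
}

fn main() {
    let mut limit = MemLimit { remaining: 1024 };
    // The first bad test vector from above: the ~7e17-byte length is
    // rejected instead of reaching the allocator.
    let data = [0xe9, 0xc7, 0x55, 0x82, 0x52, 0x38, 0xc1, 0x09];
    assert!(decode_vec_limited(&data[..], &mut limit).is_err());
}
```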

I am not sure what to do about untrusted input in production software; I would probably isolate the decoding into a separate process.

burdges commented 2 months ago

Is BigUint: CanonicalDeserialize used anywhere? It would mess up serialization lengths if it were. And BigUint looks largely internal: https://github.com/search?q=repo%3Aarkworks-rs%2Falgebra%20BigUInt&type=code

Did you find this by fuzzing code you envision deploying? If it is unused, we could maybe deactivate BigUint: CanonicalDeserialize behind some feature?

Ideally, arkworks could migrate to crypto_bigint, since that is more careful code, but that is likely a big task. And crypto_bigint::UInt: CanonicalDeserialize would change the dependency tree slightly.

burdges commented 2 months ago

I think most code uses BigInt via Fp, including, afaik, all scalars and points. I'd expect the story goes: you'd want const generics to make a BigInt slightly bigger, so BigUint uses a Vec instead. It's odd naming though, so maybe some other history explains BigUint, and maybe BigUint could be converted to work like BigInt.
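The contrast between the two representations can be sketched with simplified stand-ins (not the actual arkworks types): a const-generic, fixed-size BigInt has a statically known serialized length, so its deserializer never makes an attacker-controlled allocation decision, whereas a Vec-backed BigUint must decode a length prefix first.

```rust
// Simplified stand-in for a const-generic, fixed-size big integer:
// N little-endian u64 limbs chosen at compile time.
#[derive(Debug, PartialEq)]
struct BigIntLike<const N: usize>([u64; N]);

impl<const N: usize> BigIntLike<N> {
    // Exactly N * 8 bytes; no length prefix, no dynamic allocation.
    fn deserialize(bytes: &[u8]) -> Option<Self> {
        if bytes.len() != N * 8 {
            return None;
        }
        let mut limbs = [0u64; N];
        for (i, chunk) in bytes.chunks_exact(8).enumerate() {
            limbs[i] = u64::from_le_bytes(chunk.try_into().unwrap());
        }
        Some(Self(limbs))
    }
}

fn main() {
    // A 4-limb (256-bit) value needs exactly 32 bytes.
    let bytes = [1u8; 32];
    let x = BigIntLike::<4>::deserialize(&bytes).unwrap();
    assert_eq!(x.0.len(), 4);
    // A wrong length is rejected outright; no allocation decision involved.
    assert!(BigIntLike::<4>::deserialize(&[0u8; 16]).is_none());
}
```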

Afaik, all those usages should be ephemeral, meaning BigUint should not appear in wire formats, and simply disabling BigUint: CanonicalDeserialize would work fine, but I cannot test right now.

ggwpez commented 2 months ago

> Did you find this by fuzzing code you envision deploying? If it is unused, we could maybe deactivate BigUint: CanonicalDeserialize behind some feature?

I did not see it exposed directly in the types I wanted to use, but I also did not check everything. Vec and VecDeque have the same issue, but are probably also not exposed, I guess?
For now it is not a real issue for me, but something for developers to be aware of.