arkworks-rs / snark

Interfaces for Relations and SNARKs for these relations
https://www.arkworks.rs
Apache License 2.0

program crash when the number of constraints is too large #324

Closed · brucechin closed this issue 3 years ago

brucechin commented 3 years ago

Hello. I am using libzexe to build a large zk project with more than 10 million constraints. However, even on a 256GB-memory machine, the program crashes during create_random_proof(). Empirically, the largest circuit I can create a proof for is around 5 million constraints. I feel like libzexe is not very memory-efficient. Do you have any suggestions for saving memory when building and creating proofs for extremely large circuits? Thanks!

weikengchen commented 3 years ago

The library has not been specially optimized for memory efficiency, and many proof systems are memory-heavy by themselves.

I would suggest checking whether the crash happens during constraint generation or during the computation of the Groth16 polynomials.

Another thing to check is to make sure that tracing is not turned on.

I would also suggest reviewing whether too many data structures are being allocated during generate_constraints. One thing you can try is to allocate more virtual memory; this would be slow, but it may circumvent the memory barrier.
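
A minimal sketch of the first diagnostic, assuming the arkworks-era ark-relations API (the circuit is any type implementing ConstraintSynthesizer): synthesize the constraints without running the Groth16 setup, so the memory use of the two phases can be measured separately.

    // Hedged sketch: synthesize constraints only, without Groth16 setup,
    // to see whether the memory blow-up already occurs at this stage.
    // Assumes the arkworks-era `ark-relations` crate.
    use ark_bls12_381::Fr;
    use ark_relations::r1cs::{ConstraintSynthesizer, ConstraintSystem};

    fn synthesize_only(circuit: impl ConstraintSynthesizer<Fr>) {
        let cs = ConstraintSystem::<Fr>::new_ref();
        circuit.generate_constraints(cs.clone()).unwrap();
        println!("constraints: {}", cs.num_constraints());
    }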

Pratyush commented 3 years ago

Yeah, it would be helpful to know more about where the program is crashing and what kind of constraint system you're using; to debug further, you can compile with --features print-trace
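
For reference, the print-trace feature gates arkworks' timer macros. A hedged sketch of instrumenting a phase by hand, assuming the ark_std macros (the zexe-era crates exposed similar ones):

    use ark_std::{end_timer, start_timer};

    fn timed_setup() {
        // With `print-trace` enabled, these macros print nested timings,
        // which helps localize the phase where memory blows up.
        let timer = start_timer!(|| "Groth16 CRS generation");
        // ... call parameter generation here ...
        end_timer!(timer);
    }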

brucechin commented 3 years ago

It crashes when generating the CRS. For example, for a circuit with 3 million constraints in total: generating constraints consumes 6GB of memory, CRS generation consumes another 80GB, and creating the proof consumes another 22GB.

What confuses me is that for a circuit of 17 million constraints, the CRS generation consumes more than 800GB, which does not grow linearly with the number of constraints. I guess it's N log(N)?

    let param =
        generate_random_parameters::<algebra::Bls12_381, _, _>(full_circuit.clone(), &mut rng)
            .unwrap();

I allocated an extra 500GB of swap memory and am testing again. Will update with the results soon.

Pratyush commented 3 years ago

Sorry, what do you mean by "300w"? Also, what curve are you using?

I presume this is for Groth16?

@kobigurk, you have experience with optimizing the memory consumption of Groth16, right? Does this trigger any painful memories, lol?

Pratyush commented 3 years ago

Hi @brucechin, does https://github.com/arkworks-rs/groth16/pull/9 help with your use case?

brucechin commented 3 years ago

Sorry, it's 3 million and 17 million. I will read that PR and reply back soon. Yes, I use Groth16.

Pratyush commented 3 years ago

Oh hmm, if your code is under zexe, then you'll need to upgrade to arkworks as outlined in this PR: https://github.com/arkworks-rs/snark/pull/320
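
A hedged sketch of what the same setup and proving calls look like after the zexe-to-arkworks migration; the crate and function names reflect the early ark-groth16 releases and should be treated as assumptions:

    // Hedged sketch of the post-migration API (early ark-groth16):
    // zexe's `algebra::Bls12_381` moves to the per-curve crate
    // `ark-bls12-381`, and the Groth16 functions move to `ark-groth16`.
    use ark_bls12_381::{Bls12_381, Fr};
    use ark_groth16::{create_random_proof, generate_random_parameters};
    use ark_relations::r1cs::ConstraintSynthesizer;

    fn setup_and_prove<C: ConstraintSynthesizer<Fr> + Clone>(circuit: C) {
        let mut rng = ark_std::test_rng();
        let params =
            generate_random_parameters::<Bls12_381, _, _>(circuit.clone(), &mut rng)
                .unwrap();
        let _proof = create_random_proof(circuit, &params, &mut rng).unwrap();
    }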

brucechin commented 3 years ago

Seems that I need to modify a lot of code related to CRS generation and proof creation. Still working on it.

kobigurk commented 3 years ago

> Sorry, what do you mean by "300w"? Also, what curve are you using?
>
> I presume this is for Groth16?
>
> @kobigurk, you have experience with optimizing the memory consumption of Groth16, right? Does this trigger any painful memories, lol?

To be fair, I haven't done a lot :) That said, 17 million constraints and 800GB does sound like way too much. I'm currently running tests of circuits with 128 million constraints on BW6 with 400GB RAM + 400GB swap, and it works great for both setup and proving.

brucechin commented 3 years ago

Maybe my implementation is not that optimal :(

brucechin commented 3 years ago

Sadly, @Pratyush, I have tried your new commits, which do not have much effect. I have checked that inline_lcs() (https://github.com/arkworks-rs/groth16/blob/27e99a3c7ef1ce7bbcdc0dc567417ac05cb600a0/src/generator.rs#L57) consumes most of the memory during CRS generation.

The number is the same: ~800GB of memory during CRS generation for 17 million constraints.

weikengchen commented 3 years ago

The current inlining copies the entire LC map again, which is possibly something we could cut.

As a side note, the storage overhead is not just linear in the number of constraints but depends on the "complexity" of the constraint system. In this case, this seems to imply that the constraint system has a very high density (each constraint is complicated).
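
To illustrate the density point with a toy model (this is not arkworks code; the types here are made up for illustration): each R1CS constraint stores three linear combinations, so storage scales with the total number of nonzero terms, not with the constraint count alone.

    // Toy model, not arkworks code: storage is proportional to the
    // total number of (coefficient, variable) terms across all LCs,
    // i.e. to constraint *density*, not just the constraint count.
    type Term = (u64, usize); // (coefficient stand-in, variable index)

    struct Constraint {
        a: Vec<Term>,
        b: Vec<Term>,
        c: Vec<Term>,
    }

    fn total_terms(constraints: &[Constraint]) -> usize {
        constraints
            .iter()
            .map(|con| con.a.len() + con.b.len() + con.c.len())
            .sum()
    }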

Pratyush commented 3 years ago

I’m working on a PR to remove LCs once they have been inlined into the locations that use them, which should help with the memory consumption.


weikengchen commented 3 years ago

I would be concerned about how much this PR would help, since the inlined LC is "more complicated".

Pratyush commented 3 years ago

Hmm, I think it should help: if you have a long chain of LCs, then you can delete everything but the last one, which should halve the memory requirements (and also reduce fragmentation).
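
A toy sketch of that idea (an illustration, not the actual PR): for a chain where each LC extends the previous one, dropping each map entry as soon as it has been folded into its successor leaves only the final flattened LC alive.

    // Toy illustration, not the arkworks implementation: fold a chain
    // lc_0, lc_1, ..., lc_{n-1} where each entry extends its
    // predecessor. Removing each entry from the map as soon as it is
    // inlined means only one flattened LC stays alive at a time.
    use std::collections::BTreeMap;

    type LC = Vec<(i64, usize)>; // (coefficient stand-in, variable index)

    fn inline_chain(mut lcs: BTreeMap<usize, LC>, n: usize) -> LC {
        let mut acc = lcs.remove(&0).expect("chain head");
        for i in 1..n {
            // Removing (not cloning) the entry frees it after inlining.
            let tail = lcs.remove(&i).expect("chain link");
            acc.extend(tail);
        }
        acc
    }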

Pratyush commented 3 years ago

@brucechin, does the latest commit help? (Make sure to run cargo update.)

brucechin commented 3 years ago

Good news: it really works! CRS generation now consumes ~90% less memory. Another suggestion would be to parallelize some parts for further speedup. Thanks for your help!

Pratyush commented 3 years ago

Amazing! Parallelization is already enabled by default; let me know if you aren't seeing good core utilization.