Open elcritch opened 1 month ago
Notes on the circom-compat prover and Nim / Rust data ownership needed to do the threading properly:
nim-circom-compat
nim-circom-compat
uses a C API exposed in a small Rust wrapper library circom-compat-fficircom-compat-ffi
wraps the Rust library circom-compat, which is the core Circom / Ark libraryNotes on CircomBn254Cfg:
TODO: figure out a simple way to make CircomBn254Cfg for each worker thread?
CircomBn254Cfg
which it'd check if nil, and if so use the args to create a CircomBn254Cfg
Next handling the CircomCompatCtx
is relatively simple:
builder.push_input
through the C APIpush_input_xx
copies each bigint into memory owned by the CircomCompatCtx
Rust pointerto_string
which I believe creates a copy to the original stringbuilder.push_input
copies the BigInt's into the circom-compat's builder
which holds the data in a HashMap<String, Vec<BigInt>>
CircomCompatCtx
is manually created and released on the Nim side so it should be easy to create it on the Chronos thread and then free it once the work is done on the taskpool worker threadUpdates: Adding a thread using taskpools results in memory corruption and crashes in either the Rust library code or on the Nim side after calling into the Rust library.
I've tried lots of variations on the problem. Initially my thinking was that there was something fairly simple like using sharing a piece of mutable memory between threads. However, I'm fairly competent I've tracked down all the places on the inputs where this occurs.
Things I've tried which haven't resolved issue:
circom-compat-ffi
to create a new rng
for each threadCircomBn254Cfg
on the main threadCircomBn254Cfg
CircomBn254Cfg
from circuit files (wasm, zkey, etc)circom-compat-ffi
to explicitly clone builder keys using to_owned
CircomBn254Cfg
or CircomCompatCtx
to see if Nim side is calling something wrongThe Nim side appears to be working fine running the builder
and creating the CircomBn254Cfg
or CircomCompatCtx
objects, however, as soon as prove
is called it begins corrupting memory . Printing Nim inputs to the builder shows corruption in the builder keys, despite using to_owned
to ensure the Rust will make it's own copy of the strings.
Also, the proof returns appear to be working fine as are sending the arguments to the taskpool workers. Even running only a single thread worker which would prevent race conditions results in corrupted memory.
My current best guess is that something in the CircomCompat or Wasmer libraries assume they're running on the main thread, or have a thread local resource that is instantiated internally.
I'm going to setup a docker image so I can run valgrind on my mac and try to see what's going on.
Example parameter corruption below:
TASK: task: proof: success((a: (x: [252, 125, 205, 30, 91, 169, 208, 144, 232, 29, 242, 124, 78, 116, 186, 167, 118, 99, 221, 210, 215, 182, 33, 41, 207, 198, 113, 40, 80, 145, 99, 18], y: [173, 31, 56, 59, 77, 216, 231, 124, 23, 188, 79, 117, 200, 10, 199, 13, 116, 69, 134, 172, 56, 116, 225, 203, 14, 136, 127, 175, 192, 153, 96, 35]), b: (x: [[94, 109, 223, 96, 64, 52, 178, 201, 60, 203, 105, 119, 66, 155, 205, 209, 8, 234, 163, 208, 173, 164, 91, 83, 220, 48, 137, 72, 79, 234, 199, 29], [250, 128, 38, 89, 24, 148, 228, 232, 88, 216, 131, 10, 178, 203, 128, 242, 4, 210, 81, 200, 180, 31, 163, 127, 214, 163, 178, 66, 67, 50, 63, 45]], y: [[194, 156, 184, 152, 149, 70, 210, 6, 167, 99, 182, 182, 40, 60, 98, 158, 117, 67, 86, 45, 125, 183, 118, 193, 254, 255, 4, 210, 31, 119, 169, 6], [81, 217, 255, 215, 196, 250, 117, 10, 119, 9, 141, 91, 234, 103, 221, 104, 135, 113, 136, 254, 9, 91, 56, 152, 110, 99, 15, 199, 245, 82, 124, 40]]), c: (x: [55, 23, 205, 200, 58, 113, 165, 31, 166, 57, 161, 90, 103, 18, 239, 162, 36, 18, 25, 165, 218, 114, 234, 182, 137, 191, 139, 72, 65, 74, 51, 34], y: [108, 100, 140, 89, 122, 70, 78, 164, 197, 68, 107, 10, 252, 80, 14, 236, 73, 215, 139, 193, 93, 129, 50, 85, 108, 117, 18, 170, 79, 172, 153, 10])))
TASK: task: params POST: (slotDepth: 32, datasetDepth: 8, blkDepth: 5, cellElms: 67, numSamples: 5, r1csPath: "tests/circuits/fixtures/proof_main.r1cs", wasmPath: "tests/circuits/fixtures/proof_main.wasm", zkeyPath: "")
TASK: task:
TASK: task: params: 0x114f0f380
TASK: task: params: (slotDepth: 32, datasetDepth: 8, blkDepth: 5, cellElms: 67, numSamples: 5, r1csPath: "tests/circuits/fixtures/proof_main.r1cs", wasmPath: "tests/circuits/fixtures/proof_main.wasm", zkeyPath: "")
TASK: task: -2001899808422488719
TASK: task spawn: params: 0x104c0e5b0
PROVE: 17
TASK: task spawn: params: 0x104c0edd0
PROVE: 18
TASK: task spawn: params: 0x104d37790
PROVE: 19
TASK: task spawn: params: 0x104c42ec0
PROVE: 20
TASK: task spawn: params: 0x104dda650
PROVE: 21
TASK: task spawn: params: 0x104c42a60
PROVE: 22
TASK: task spawn: params: 0x104c9bce0
PROVE: 23
TASK: task: proof: success((a: (x: [81, 35, 236, 68, 189, 99, 225, 10, 196, 226, 243, 68, 50, 229, 30, 152, 240, 87, 72, 199, 57, 148, 7, 211, 54, 247, 46, 101, 246, 118, 200, 42], y: [110, 79, 55, 98, 72, 199, 47, 7, 192, 80, 38, 5, 219, 105, 202, 90, 17, 243, 80, 71, 209, 92, 132, 24, 102, 191, 198, 233, 72, 220, 203, 40]), b: (x: [[228, 140, 96, 208, 2, 19, 115, 179, 216, 137, 145, 45, 135, 211, 132, 61, 237, 36, 11, 213, 234, 52, 143, 255, 111, 204, 50, 10, 76, 208, 244, 38], [99, 220, 252, 121, 17, 173, 194, 225, 147, 17, 36, 174, 113, 93, 227, 65, 147, 197, 162, 64, 105, 66, 150, 147, 132, 239, 81, 95, 24, 132, 126, 35]], y: [[82, 132, 204, 152, 211, 196, 92, 195, 40, 252, 126, 184, 31, 50, 41, 169, 69, 240, 230, 19, 95, 96, 114, 95, 41, 207, 78, 137, 21, 202, 152, 3], [153, 197, 8, 14, 251, 110, 63, 127, 55, 161, 111, 251, 170, 19, 163, 79, 113, 103, 209, 110, 35, 245, 203, 20, 84, 47, 247, 117, 184, 102, 251, 25]]), c: (x: [219, 122, 190, 26, 79, 118, 204, 95, 179, 74, 2, 48, 128, 94, 191, 208, 113, 163, 252, 107, 148, 228, 24, 230, 39, 188, 160, 187, 125, 197, 176, 33], y: [237, 41, 43, 42, 44, 61, 81, 206, 218, 6, 61, 247, 33, 229, 6, 105, 12, 216, 125, 145, 128, 144, 243, 200, 103, 100, 13, 66, 19, 154, 157, 27])))
TASK: task: params POST: (slotDepth: 32, datasetDepth: 8, blkDepth: 5, cellElms: 67, numSamples: 5, r1csPath: "tests/circuits/fixtures/proof_main.r1cs", wasmPath: "tests/circuits/fixtures/proof_main.wasm", zkeyPath: "")
TASK: task:
TASK: task: params: 0x114f0f4c0
TASK: task: params: (slotDepth: 32, datasetDepth: 8, blkDepth: 5, cellElms: 67, numSamples: 5, r1csPath: "\x12 ��B��\x1C\x14���șo�$\'�A�d��L���\exR�U", wasmPath: "\x12 R߆;��AKV,:�D�\x13=\v���\e\x11��Q���\x15��", zkeyPath: "")
The prover currently blocks the main Chronos thread. It should be scheduled on thread similar to the erasure coding as done in https://github.com/codex-storage/nim-codex/pull/716.