AdamISZ / aut-ct

Anonymous usage tokens from curve trees
18 stars 3 forks source link

Python bindings #16

Closed 1440000bytes closed 1 month ago

1440000bytes commented 3 months ago

I was thinking about creating python bindings for aut-ct so that it can be easily used in joinstr. Does it make sense or we have better alternatives?

1440000bytes commented 3 months ago

I tried creating python binding based on the instructions shared in https://github.com/PyO3/pyo3 README but getting this error:

error: Python functions cannot have generic const parameters
  --> src/autct.rs:41:5
   |
41 |     const L: usize,
   |     ^^^^^

error[E0425]: cannot find value __pyo3_get_function_get_curve_tree_with_proof in this scope
   --> src/autct.rs:153:20
    |
153 |     m.add_function(wrap_pyfunction!(get_curve_tree_with_proof, m)?)?;
    |                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ not found in this scope
    |
    = note: this error originates in the macro proc_macro_call which comes from the expansion of the macro wrap_pyfunction (in Nightly builds, run with -Z macro-backtrace for more info)

For more information about this error, try rustc --explain E0425.
error: could not compile autct (bin "autct") due to 2 previous errors
💥 maturin failed
  Caused by: Failed to build a native library through cargo
  Caused by: Cargo build finished with "exit status: 101": env -u CARGO PYO3_ENVIRONMENT_SIGNATURE="cpython-3.10-64bit" PYO3_PYTHON="/media/test/extenstion/Downloads/aut-ct/myenv/bin/python" PYTHON_SYS_EXECUTABLE="/media/test/extenstion/Downloads/aut-ct/myenv/bin/python" "cargo" "rustc" "--message-format" "json-render-diagnostics" "--manifest-path" "/media/test/extenstion/Downloads/aut-ct/Cargo.toml" "--bin" "autct"

I just tried it for get_curve_tree_with_proof():

 #[pyfunction]
pub fn get_curve_tree_with_proof<
    const L: usize,
    F: PrimeField,
    P0: SWCurveConfig<BaseField = F> + Copy,
    P1: SWCurveConfig<BaseField = P0::ScalarField, ScalarField = P0::BaseField> + Copy,
>(
    depth: usize,
    generators_length_log_2: usize,
    pubkey_file_path: &str,
    our_pubkey: Affine<P0>,
) -> (R1CSProof<Affine<P0>>, R1CSProof<Affine<P1>>,
    SelectAndRerandomizePath<L, P0, P1>,
    P0::ScalarField,
    Affine<P0>, Affine<P0>, bool) {
    let mut rng = rand::thread_rng();
    let generators_length = 1 << generators_length_log_2;

    let sr_params =
        SelRerandParameters::<P0, P1>::new(generators_length, generators_length, &mut rng);

    let p0_transcript = Transcript::new(b"select_and_rerandomize");
    let mut p0_prover: Prover<_, Affine<P0>> =
        Prover::new(&sr_params.even_parameters.pc_gens, p0_transcript);

    let p1_transcript = Transcript::new(b"select_and_rerandomize");
    let mut p1_prover: Prover<_, Affine<P1>> =
        Prover::new(&sr_params.odd_parameters.pc_gens, p1_transcript);

    // these are called 'leaf commitments' and not 'leaves', but it's just
    // to emphasize that we are not committing to scalars, but using points (i.e. pubkeys)
    // as the commitments (i.e. pedersen commitments with zero randomness) at
    // the leaf level.
    let mut privkey_parity_flip: bool = false;
    let leaf_commitments = get_leaf_commitments::<F, P0>(pubkey_file_path);
    // derive the index where our pubkey is in the list:
    let mut key_index: i32; // we're guaranteed to overwrite or panic but the compiler insists.
    // the reason for 2 rounds of search is that BIP340 can output a different parity
    // compared to ark-ec 's compression algo.
    key_index = match leaf_commitments.iter().position(|&x| x  == our_pubkey) {
        None => -1,
        Some(ks) => ks.try_into().unwrap()
    };
    if key_index == -1 {
        key_index = match leaf_commitments.iter().position(|&x| x == -our_pubkey) {
            None => panic!("provided pubkey not found in the set"),
            Some(ks) => {
                privkey_parity_flip = true;
                ks.try_into().unwrap()
            }
        }
    };

    let (curve_tree, _) = get_curve_tree::<L, F, P0, P1>(
        pubkey_file_path, depth, &sr_params);
    assert_eq!(curve_tree.height(), depth);

    let (path_commitments, rand_scalar) =
    curve_tree.select_and_rerandomize_prover_gadget(
        key_index.try_into().unwrap(),
        &mut p0_prover,
        &mut p1_prover,
        &sr_params,
        &mut rng,
    );
    // as well as the randomness in the blinded commitment, we also need to use the same
    // blinding base:
    let b_blinding = sr_params.even_parameters.pc_gens.B_blinding;
    // The randomness for the PedDLEQ proof will have to be the randomness
    // used in the curve tree randomization, *plus* the randomness that was used
    // to convert P to a permissible point, upon initial insertion into the tree.
    let mut r_offset: P0::ScalarField = P0::ScalarField::zero();
    let lcindex: usize = key_index.try_into().unwrap();
    let mut p_prime: Affine<P0> = leaf_commitments[lcindex];
    // TODO: this is basically repeating what's already done in
    // sr_params creation, but I don't know how else to extract the number
    // of H bumps that were done (and we need to, see previous comment).
    while !sr_params.even_parameters.uh.is_permissible(p_prime) {
        p_prime = (p_prime + b_blinding).into();
        r_offset += P0::ScalarField::one();
    }
    // print the root of the curve tree.
    // TODO: how to allow the return value to be either
    // Affine<P0> or Affine<P1>? And as a consequence,
    // to let the code be correct for any depth.
    // And/or, is there not
    // a simpler way to extract the root of the tree
    // (which should be just .parent_commitment, but all methods
    // to extract this value seem to be private)
    let newpath = curve_tree.select_and_rerandomize_verification_commitments(
    path_commitments.clone());
    let root_is_odd = newpath.even_commitments.len() == newpath.odd_commitments.len();
    println!("Root is odd? {}", root_is_odd);
    let root: Affine<P0>;
    if !root_is_odd {
        root = *newpath.even_commitments.first().unwrap();
    }
    else {
        // derp, see above TODO
        panic!("Wrong root parity, should be even");
    }
    let p0_proof = p0_prover
        .prove(&sr_params.even_parameters.bp_gens)
        .unwrap();
    let p1_proof = p1_prover
        .prove(&sr_params.odd_parameters.bp_gens)
        .unwrap();
    let returned_rand = rand_scalar + r_offset;
    (p0_proof, p1_proof, path_commitments,
     returned_rand, b_blinding, root, privkey_parity_flip)
}

#[pymodule]
fn autct(py: Python, m: &PyModule) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(get_curve_tree_with_proof, m)?)?;
    Ok(())
}
AdamISZ commented 3 months ago

I was thinking about creating python bindings for aut-ct so that it can be easily used in joinstr. Does it make sense or we have better alternatives?

I guess this follows up from earlier conversations e.g. on delvingbitcoin, but the way I see it, it is one possibility for Sybil resistance functionality in coinjoin, but one with limitations, since it mainly prevents bursts of lots of requests, and not enforces significant cost on participation (so it more prevents the snooping attack of lots of requests to gather data from other participants by starting but not completing, it does not help nearly as much (I think) with the "N-1 participants are Sybils" problem which is the one that people first think of; it's more obvious).

If your wallet is taproot and you integrate into the user wallet, the process of creating proofs, then you have in theory removed most of the burden of using a proof. In Joinmarket, we had a similar thing with PoDLE; it's a custom cryptographic proof but the wallet can do it in the background. That still doesn't mean it's trivial. To give a sense of it, with Joinmarket we have the following problems to address:

Each utxo can only be used 3 times and must be age at least 5 blocks old and must be at least 20% of the intended coinjoin amount. What's more, we have to use a utxo in the same BIP32 account (mixdepth) and not one from a different account, because that would leak privacy info to the makers you're sending the proof to (note that aut-ct removes this issue). That logic has to be applied when starting a coinjoin as a taker, and if the wallet can't find a utxo that meets these requirements, there has to be backup logic to give the user other options.

That doesn't all apply here, but some of it might. You could use any (taproot) utxo, but you'd want a usage limit of the same utxo, anything from "each utxo can only be used once" (the simplest but maybe too restrictive?) to "only utxos greater than N blocks confirmed and greater than amount X and each one can be used Y times". Then you have to think if these are global, static rules or if not how are they agreed; and this affects how the utxoset files are set up. Finally an extra detail is making sure that the proving process does not take up too much compute time. The keyset files (the utxoset basically, within the given filter of X amount and N confirms) needn't be too expensive, but that's another difficult call - for the user to generate it themselves is a bunch of process and involving very large files, but if they just grab it from someone else there's obviously a trust issue - this is all discussed in docs in this repo. And the keyset files have to be refreshed at intervals (thinking about an "N confirms" rule).

Finally, python bindings - yes makes a lot of sense. I have briefly done this kind of thing years ago (though not from Rust of course), so I will try to help with the technical issues where I can.

I think there is still some non-trivial work to do on the project itself to make it better, for example I mentioned above "multi-usage of one utxo" but I didn't actually implement that yet.

AdamISZ commented 3 months ago

I tried creating python binding based on the instructions shared in https://github.com/PyO3/pyo3 README but getting this error:

error: Python functions cannot have generic const parameters

Indeed it does have a generic const parameter (I'm trying to remember exactly, but because the library being used (see here note here I'm using my fork of the authors' repo) uses a generic const for the curve tree branching factor, and these can't be specified at runtime, so I have to use a fixed list of options for L. I don't know what could be done to allow Python to use that, probably worth some research.

I'd also say that it may make more sense to bind to the run_prover function. The reason I say that is that get_curve_tree_with_proof is only part of the cryptographic processing here. To make a binding to run_prover you'll want to reproduce the AutCtConfig struct to pass it into the function (although only some of the fields of the struct are being used). That will mean all the internal crypto stuff gets handled inside the call and a proof will get written to file.

AdamISZ commented 3 months ago

I tried creating python binding based on the instructions shared in https://github.com/PyO3/pyo3 README but getting this error:

error: Python functions cannot have generic const parameters

OK another, perhaps(?) more helpful response:

You can replace the generic const parameter in the code (L) with a fixed value. i.e. remove the const L there and replace every L in the function call with 256. Note that 256 is the default, specified here. If you do that, the code will still run correctly in Rust so it might remove the problem, at the cost of sacrificing the possibility of the user to choose a different number, which to be honest is not very useful here - you'll want to fix a specific choice for L anyway.

But again I would probably be wanting to make a binding to run_prover instead.

AdamISZ commented 3 months ago

OK another, perhaps(?) more helpful response:

You can replace the generic const parameter in the code (L) with a fixed value <...>

I should have referenced this for context.

AdamISZ commented 2 months ago

So I've hacked around with this a bit @1440000bytes

Basically done the following:

But, haven't gone further, e.g. actually trying the binding to see if it even worked.

I guess it's perhaps a promising start but no more than that.

1440000bytes commented 2 months ago

Awesome. Thanks for making necessary changes. I will test this in the morning.

1440000bytes commented 2 months ago

I added this below run_prover():

#[pymodule]
fn aut_ct(py: Python, m: &PyModule) -> PyResult<()> {
    m.add_class::<AutctConfig>()?;
    m.add_function(wrap_pyfunction!(run_prover, m)?)?;
    Ok(())
}

Wheel was created and installed successfully:

(myenv) $ maturin build

📦 Built wheel to /media/test/extenstion/Downloads/curve-tree/aut-ct/target/wheels/autct-0.0.1-cp310-cp310-manylinux_2_34_x86_64.whl

(myenv) $ pip install /media/test/extenstion/Downloads/curve-tree/aut-ct/target/wheels/autct-0.0.1-cp310-cp310-manylinux_2_34_x86_64.whl

Processing ./target/wheels/autct-0.0.1-cp310-cp310-manylinux_2_34_x86_64.whl
Installing collected packages: autct
Successfully installed autct-0.0.1

(myenv) $ pip list
Package    Version
---------- --------
autct      0.0.1
patchelf   0.17.2.1
pip        22.0.2
setuptools 59.6.0

However, I am unable to use it in python:

(myenv) $ python
Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import aut_ct
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'aut_ct'
>>> import autct
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'autct'
AdamISZ commented 2 months ago

Yes, I've been investigating this. I'll add some detail later and probably update the branch, but there are in fact several things you need to add to make this work:

Basically adding this in lib.rs :

use pyo3::prelude::*;

#[pyfunction]
fn run_prover_wrap() -> Result<(), CustomError> {
    let autctcfg = config::AutctConfig::build().unwrap();
    run_prover(autctcfg)
}
/// A Python module implemented in Rust. The name of this function must match
/// the `lib.name` setting in the `Cargo.toml`, else Python will not be able to
/// import the module.
#[pymodule]
fn autctbind(_py: Python<'_>, m: &PyModule) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(run_prover_wrap, m)?)?;
    Ok(())
}

and noting this is needed in Cargo.toml:

[lib]
# The name of the native library. This is the name which will be used in Python to import the
# library (i.e. `import string_sum`). If you change this, you must also change the name of the
# `#[pymodule]` in `src/lib.rs`.
name = "autctbind"

# "cdylib" is necessary to produce a shared library for Python to import from.
crate-type = ["cdylib"]

[dependencies]
pyo3 = "0.21.1"

a lot of this is explained here.

This is in itself a pretty imperfect solution since you'll still need to use autct config files and edit private key, proof file location etc. exactly as you would if you were running the autct rust binary. (otherwise I'll need more code to make a custom struct wrapping AutCtConfig that conforms to the strict requirements of pyo3 (again see above docs).

It's honestly turning out to be more intrusive than I hoped. I originally intended to focus on an RPC-API to the binary (that's what's already done for the verification call). I don't like that writing code for bindings specifically for Python, seems to require a lot of extra or even changed code. I'm still doing it though because it's interesting; it would just be far better if there was something more generic (is wasm possible?).

AdamISZ commented 2 months ago

As of 8bb7e2fac13c34af58b0a651c3c5b1252803e72a the proving function can be run successfully via the binding. See the note to that commit for the syntax.

For details on what should be in the config file, see the other notes, I recommend doing the worked example (which I fixed up last week), plus a few other considerations:

The proving is slow. Do not even think about using debug build (which will be default with maturin develop), but also the binding will doubtless slow it down a bit more, and there is work to do that I guess will shave off maybe 70% of the time cost, see #10 for some detail on that.

As a result of the above you can use the small (100 keys) test file in testdata/; there is also a much more realistic 330K key example but it'll be a minute or two to run that until #10 is resolved.

As before, this is all a bit intrusive, not sure how I can improve it. But I should probably go back to #10 and also improving the RPC interface, amongst other things, if we want this tool to actually be useful at any point :)

1440000bytes commented 2 months ago

It's honestly turning out to be more intrusive than I hoped. I originally intended to focus on an RPC-API to the binary (that's what's already done for the verification call). I don't like that writing code for bindings specifically for Python, seems to require a lot of extra or even changed code. I'm still doing it though because it's interesting; it would just be far better if there was something more generic (is wasm possible?).

RPC-API sounds good. Wasm wont make a lot of difference compared to bindings for python. It could be useful for web apps.

As of https://github.com/AdamISZ/aut-ct/commit/8bb7e2fac13c34af58b0a651c3c5b1252803e72a the proving function can be run successfully via the binding. See the note to that commit for the syntax.

Thank you. I am really excited to be involved in the first bitcoin project that will experiment with aut-ct for an important use case.

AdamISZ commented 2 months ago

RPC-API sounds good.

How did you envisage it working? The reason I went to an RPC call for the verify straight away is for the obvious reason that most use cases would involve the verifier being remote; for the proving, since it's something by nature that the user must do "themselves", I wasn't thinking too much about an RPC call; you'd have to run the Rust code yourself anyway, in some manner. Do you think it's useful to distribute a binary for autct and a proving RPC call? It seems unlikely to be worth it, since even in e.g. Python you can just call the autct binary under the hood, no?

1440000bytes commented 2 months ago

Do you think it's useful to distribute a binary for autct and a proving RPC call?

Yes.

User is expected to run a daemon and make POST requests using python or other languages for different RPC calls. This works better on different environments compared to subprocess.run()

AdamISZ commented 2 months ago

Made a new branch python-binding.

Since I addressed #10 the process of proving has changed: there is a pre-processing step which I added to the binding, called convertkeys, so there is a new method run_convertkeys_wrap() in addition to what was before run_prover_wrap() (terrible naming, sorry). The latter (run_prover) is now down to about 17s for 330k keys and the dominant cost is linear (discussed in #11 , we can probably improve it), so if you use time and value filters for keysets that bring the total number of pubkeys down to say 50K, I would expect you to only need about 5-7s for running the prover (assuming sort of normal commodity laptop hardware).

(As a reminder the main purpose of this project is that verification is very quick: typically 50-100ms. We care less about the prover, but it does matter that it's not crazy slow...).

The preliminary step convertkeys was how I addressed #10 . Since this is fairly slow it's better done as a preprocessing step; it's better thought of as part of the convoluted process explained in this doc; specifically, it's an extra step at the end. So it goes:

dumptxoutset -> utxo_to_sqlite -> python script to filter on size and time -> raw pubkey set -> convert-to-permissible

The penultimate step results in a .aks file. The last step makes a .aks.p file (p for "permissible") and the prover uses the .aks.p file as input.

How to use: as per commit comment:

maturin develop --release
python
>>> import autctbind
followed by running two possible methods:
>>> autctbind.run_convertkeys_wrap()
>>> autctbind.run_prover_wrap()

with the first conversion method as in the README, and neither having arguments (the config read from the ./config/autct/default-config.toml file, so for now, to choose an input keyset file or private key file, or output proof file, change it there. Again, details are in README worked example etc.).

This retains the fixed L property (L=1024) from the fixed-L branch.

The --release argument for maturin is very important, since the code is way to slow in debug build. The overhead of the python binding is not really noticeable (as is typical with bindings, of course, that's the point of them :) ).

AdamISZ commented 2 months ago

It occurs to me I should give you a heads up on what I'm doing at the moment, since it's relevant:

after thinking about it a while I decided the most rational thing to do would be to have everything be accessible via RPC calls, and those methods would be; prove, verify and newkeys, with the binary itself runnable as a daemon with using the method serve.

I think this rationalization works the best, since all of the upfront cost is borne when you start the daemon, and you can leave it running in the background. This means that proofs, private keys and public keys are all accessed via files (with the work to encrypt the private key file not done yet but I guess that'll be fine), and this means clients can just send RPC requests and not have to worry about doing cryptographic operations etc., and the curve tree building upfront means that proof requests will always be reasonably fast (in line with above conversation, I removed the idea of "convertkeys" as a separate operation, as that's just annoying for the client/user of the API).

Development in this branch (I will merge into master at some point): https://github.com/AdamISZ/aut-ct/tree/no-fixed-L-daemon

AdamISZ commented 2 months ago

Let's see if a nostr link works:

https://njump.me/nevent1qvzqqqqqqypzqe6msnl8tcsk4w28capcaegeefmh2dmdmuza4ha6vfut6qfwr4egqywhwumn8ghj7mn0wd68ytnzd96xxmmfdejhytnnda3kjctv9uq32amnwvaz7tmjv4kxz7fwv3sk6atn9e5k7tcqyql8llcz2v6hmzddufq9ml48hugnv8xndrsexktmwxay0wsds97qkmmq093

.. so, I think this setup is the most likely to be effective. It makes programming against "prove and verify a taproot utxo" in Python way easier, as there are basically no dependencies, just pip install autctapi is enough. And the thing about a server or daemon running autct in the background is, it removes all the delays from actually using it (so proving takes seconds and not like 2 minutes).

Of course a binding is still entirely a viable concept as we discussed upthread.

1440000bytes commented 2 months ago

Thanks. I will use this in the next version of electrum plugin for joinstr and share code, feedback etc. after testing.

AdamISZ commented 1 month ago

Since the project took the other direction (RPC to a server for all the operations, and a python lib, as an example, using that API), I'm closing this, not as "not planned" so much as "there's nothing stopping someone (including me) doing this in future, using the research in this thread as a starting point", but I'm not planning it.

Feel free to reopen if needed.