bcmyers / argonautica

Idiomatic argon2 password hashing for several languages
Apache License 2.0

`Error: C code reported a "Threading failure" error` #11

Open dpc opened 5 years ago

dpc commented 5 years ago

I'm trying to hash something on a laptop that worked on the desktop. One obvious difference is that the desktop generated it and has more cores, so lanes etc. are 8, while the laptop has just 4?

dpc commented 5 years ago

Yeap. When I change 8 to 2 it suddenly works again. And I think it used to work in the past here as well.

dpc commented 5 years ago

Unfortunately the number of lanes does change the final hash, so I can't just use a lower value.

dpc commented 5 years ago

I've added .configure_threads(1) and now it works. But it's really not great that it crashes when the number of lanes used is higher than the number of detected cores(?).

bcmyers commented 5 years ago

Thanks dpc, I'll look into this. Could you paste the full configuration (and/or code) that was causing the problem, so it's easier for me to see what the issue is?

dpc commented 5 years ago

The code in question: https://github.com/dpc/crev/blob/5a956deeddba6ac682c13e20533a0cbb2a7ce97a/crev-lib/src/id.rs#L139

> cat ~/.config/crev/ids/FYlr8YoYGVvDwHQxqEIs89reKKDy-oWisoO0qXXEfHE.yaml 
---
version: -1
url: "https://github.com/dpc/crev-proofs"
public-key: FYlr8YoYGVvDwHQxqEIs89reKKDy-oWisoO0qXXEfHE
sealed-secret-key: <redacted>
seal-nonce: <redacted>
pass:
  version: 19
  variant: argon2id
  iterations: 192
  memory-size: 4096
  lanes: 8
  salt: <redacted>

Basically, I was setting lanes on the hasher to 8, but my machine only has 2 real CPUs, so something was crashing.
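
For reference, the failing call boils down to roughly the sketch below (self-contained; the salt and password are placeholders for the redacted values, and the iteration/memory/lane numbers are the ones from the YAML above):

extern crate argonautica;
extern crate failure;

use argonautica::Hasher;
use failure::Error;

// Sketch of the failing shape: lanes pinned to 8 (the value stored in the id file)
// on a machine with only 2 logical cores, with no explicit .configure_threads(...).
// The salt and password below are placeholders, not the real (redacted) values.
fn hash_with_stored_params() -> Result<Vec<u8>, Error> {
    let mut hasher = Hasher::default();
    hasher
        .configure_variant(std::str::FromStr::from_str("argon2id")?)
        .configure_iterations(192)
        .configure_memory_size(4096)
        .configure_lanes(8)
        .with_salt("somesalt")
        .configure_hash_len(64)
        .opt_out_of_secret_key(true);
    Ok(hasher
        .with_password("P@ssw0rd")
        .hash_raw()?
        .raw_hash_bytes()
        .to_vec())
}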

The workaround that works for me and still gives correct results is:

diff --git a/crev-lib/src/id.rs b/crev-lib/src/id.rs
index be9d2d8..bf75094 100644
--- a/crev-lib/src/id.rs
+++ b/crev-lib/src/id.rs
@@ -142,6 +142,7 @@ impl LockedId {
                 .configure_version(argonautica::config::Version::from_u32(pass.version)?)
                 .configure_iterations(pass.iterations)
                 .configure_variant(std::str::FromStr::from_str(&pass.variant)?)
+                .configure_threads(1)
                 .with_salt(&pass.salt)
                 .configure_hash_len(64)
                 .opt_out_of_secret_key(true);

which basically means I have to manually limit the number of threads.

What I would expect: if lanes > cpu_cores, just use cpu_cores threads.
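
Roughly, something like the sketch below on the caller side (a hypothetical helper; it assumes the num_cpus crate, while configure_lanes/configure_threads are the same argonautica calls used above):

extern crate argonautica;
extern crate num_cpus;

use argonautica::Hasher;

// Hypothetical caller-side clamp (assumes the num_cpus crate): keep `lanes` at the
// stored value, since changing it changes the hash, but never ask for more threads
// than there are logical cores.
fn configure_lanes_and_threads(hasher: &mut Hasher, lanes: u32) {
    let threads = std::cmp::min(lanes, num_cpus::get() as u32);
    hasher.configure_lanes(lanes).configure_threads(threads);
}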

bcmyers commented 5 years ago

Cool. Will look into it in the coming days. Again, thanks for the issue.

bcmyers commented 5 years ago

So I spent a few minutes investigating this today, but I can't see the issue yet. Perhaps you could look over what I've written below and let me know what I'm not understanding.

You're right that the number of lanes should (and does) affect the resulting hash. The number of threads, however, should not (and does not) affect it. Moreover, hashing works for me with every combination of lanes and threads from 1 to 32, and none of those combinations provokes an error or panic.

So I'm not sure what's crashing for you. Here's the test case I was playing around with to prove to myself that all of the above is correct.

It basically says: "Let's set the number of threads to 1 and then figure out what the resulting hash is when we vary the number of lanes from 1 to 32. Now let's repeat this, but also varying the number of threads from 2 to 32. No matter how many threads we use, we should always get the same result as we did when using just one thread (which we do)."

extern crate argonautica;
extern crate failure;

use std::collections::HashMap;

use argonautica::Hasher;
use failure::Error;

#[test]
fn test_issue_11() -> Result<(), Error> {
    const SALT: &str = "somesalt";
    const HASH_LEN: u32 = 64;
    const MEMORY_SIZE: u32 = 256;
    const PASSWORD: &str = "P@ssw0rd";

    let expected = {
        let mut map = HashMap::new();
        for lanes in 1..=32 {
            let mut hasher = Hasher::default();
            hasher
                .with_salt(SALT)
                .configure_hash_len(HASH_LEN)
                .configure_lanes(lanes)
                .configure_memory_size(MEMORY_SIZE)
                .configure_threads(1)
                .opt_out_of_secret_key(true);
            let hash_bytes = hasher
                .with_password(PASSWORD)
                .hash_raw()?
                .raw_hash_bytes()
                .to_vec();
            map.insert(lanes, hash_bytes);
        }
        println!("Finished getting 1 thread baseline.");
        map
    };

    for threads in 2..=32 {
        let mut actual = HashMap::new();
        for lanes in 1..=32 {
            let mut hasher = Hasher::default();
            hasher
                .with_salt(SALT)
                .configure_hash_len(HASH_LEN)
                .configure_lanes(lanes)
                .configure_memory_size(MEMORY_SIZE)
                .configure_threads(threads)
                .opt_out_of_secret_key(true);

            let hash_bytes = hasher
                .with_password(PASSWORD)
                .hash_raw()?
                .raw_hash_bytes()
                .to_vec();
            actual.insert(lanes, hash_bytes);
        }
        assert_eq!(expected, actual);
        println!(
            "Finished testing {} threads. Results are the same as 1 thread baseline",
            threads
        );
    }

    Ok(())
}

dpc commented 5 years ago

The problem isn't about the lanes-to-threads relationship. It seems to be about threads vs. the actual number of CPU cores. Setting lanes just makes the number of threads go up too.

I've modified your test to:

#[test]
fn test_issue_11() -> Result<()> {
    const SALT: &str = "somesalt";
    const HASH_LEN: u32 = 64;
    const MEMORY_SIZE: u32 = 256;
    const PASSWORD: &str = "P@ssw0rd";

    let expected = {
        let mut map = HashMap::new();
        for lanes in 1..=32 {
            let mut hasher = Hasher::default();
            hasher
                .with_salt(SALT)
                .configure_hash_len(HASH_LEN)
                .configure_lanes(lanes)
                .configure_memory_size(MEMORY_SIZE)
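                // The change from the original test above: this baseline no longer
                // calls .configure_threads(1), so it runs with the Hasher's default
                // thread count.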
                .opt_out_of_secret_key(true);
            let hash_bytes = hasher
                .with_password(PASSWORD)
                .hash_raw()?
                .raw_hash_bytes()
                .to_vec();
            map.insert(lanes, hash_bytes);
        }
        println!("Finished getting 1 thread baseline.");
        map
    };

    for threads in 2..=32 {
        let mut actual = HashMap::new();
        for lanes in 1..=32 {
            let mut hasher = Hasher::default();
            hasher
                .with_salt(SALT)
                .configure_hash_len(HASH_LEN)
                .configure_lanes(lanes)
                .configure_memory_size(MEMORY_SIZE)
                .configure_threads(threads)
                .opt_out_of_secret_key(true);

            let hash_bytes = hasher
                .with_password(PASSWORD)
                .hash_raw()?
                .raw_hash_bytes()
                .to_vec();
            actual.insert(lanes, hash_bytes);
        }
        assert_eq!(expected, actual);
        println!(
            "Finished testing {} threads. Results are the same as 1 thread baseline",
            threads
        );
    }

    Ok(())
}

and it is failing with:

---- tests::test_issue_11 stdout ----
Error: argonautica::Error { kind: argonautica::ErrorKind::ThreadError, display: "C code reported a \"Threading failure\" error" }
thread 'tests::test_issue_11' panicked at 'assertion failed: `(left == right)`
  left: `1`,
 right: `0`: the test returned a termination value with a non-zero status code (1) which indicates a failure', src/libtest/lib.rs:335:5

And, what's surprising, I'm now getting:

error: process didn't exit successfully: `/home/dpc/lab/crev/target/debug/deps/crev_lib-db8030f3f9fd0e08` (signal: 11, SIGSEGV: invalid memory reference)

sometimes.

Let me get a gdb trace on that.

dpc commented 5 years ago

Job 3, 'gdb /home/dpc/lab/crev/target/d…' has stopped :/

dpc commented 5 years ago

SIGSEGV never happens without this new test.

dpc commented 5 years ago

Even when I only execute this one new test, it sometimes segfaults.

    Finished release [optimized] target(s) in 0.26s
     Running target/release/deps/cargo_crev-211cbdc418c06591

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running target/release/deps/crev_common-e7a2848ce3560c3d

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running target/release/deps/crev_data-f540c8f164a93f4a

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 5 filtered out

     Running target/release/deps/crev_lib-f5638e2bc69101de

running 1 test
error: process didn't exit successfully: `/home/dpc/lab/crev/target/release/deps/crev_lib-f5638e2bc69101de issue` (signal: 11, SIGSEGV: invalid memory reference)

I annotated it with println! to see exactly when it fails.

#[test]
fn test_issue_11() -> Result<()> {
    const SALT: &str = "somesalt";
    const HASH_LEN: u32 = 64;
    const MEMORY_SIZE: u32 = 256;
    const PASSWORD: &str = "P@ssw0rd";

    let expected = {
        let mut map = HashMap::new();
        for lanes in 1..=32 {
            let mut hasher = Hasher::default();
            println!("L{}", lanes);
            hasher
                .with_salt(SALT)
                .configure_hash_len(HASH_LEN)
                .configure_lanes(lanes)
                .configure_memory_size(MEMORY_SIZE)
                .opt_out_of_secret_key(true);
            let hash_bytes = hasher
                .with_password(PASSWORD)
                .hash_raw()?
                .raw_hash_bytes()
                .to_vec();
            map.insert(lanes, hash_bytes);
        }
        println!("Finished getting 1 thread baseline.");
        map
    };

    for threads in 2..=32 {
        let mut actual = HashMap::new();
        for lanes in 1..=32 {
            println!("T{} L{}", threads, lanes);
            let mut hasher = Hasher::default();
            hasher
                .with_salt(SALT)
                .configure_hash_len(HASH_LEN)
                .configure_lanes(lanes)
                .configure_memory_size(MEMORY_SIZE)
                .configure_threads(threads)
                .opt_out_of_secret_key(true);

            let hash_bytes = hasher
                .with_password(PASSWORD)
                .hash_raw()?
                .raw_hash_bytes()
                .to_vec();
            actual.insert(lanes, hash_bytes);
        }
        assert_eq!(expected, actual);
        println!(
            "Finished testing {} threads. Results are the same as 1 thread baseline",
            threads
        );
    }

    Ok(())
}

When it doesn't segfault, it prints L1 and L2 every time. When it does, there's no output at all, but that's probably due to cargo test output buffering.

I also switched to --release, but it doesn't seem to change anything, except that it's faster.

dpc commented 5 years ago

Oh. Actually, in --release, when I run it under gdb, it sometimes prints L3 and even L4.

dpc commented 5 years ago

Damn, rust made my segfault debugging skills so rusty...

warning: core file may not match specified executable file.
[New LWP 19258]
[New LWP 18988]
[New LWP 18987]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/home/dpc/lab/crev/target/release/deps/crev_lib-6c3235cfbd55d7d4 issue'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000056125b630f36 in fill_segment (instance=0x7f326a7b11c0, position=...) at phc-winner-argon2/src/ref.c:159
159                 pseudo_rand = instance->memory[prev_offset].v[0];
[Current thread is 1 (Thread 0x7f326a5b2700 (LWP 19258))]
(gdb) bt
#0  0x000056125b630f36 in fill_segment (instance=0x7f326a7b11c0, position=...) at phc-winner-argon2/src/ref.c:159
#1  0x000056125b62ea65 in fill_segment_thr (thread_data=<optimized out>) at phc-winner-argon2/src/core.c:275
#2  0x00007f326ab53164 in start_thread (arg=<optimized out>) at pthread_create.c:486
#3  0x00007f326aa5fdef in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) 

dpc commented 5 years ago

(gdb) list
154                 if (i % ARGON2_ADDRESSES_IN_BLOCK == 0) {
155                     next_addresses(&address_block, &input_block, &zero_block);
156                 }
157                 pseudo_rand = address_block.v[i % ARGON2_ADDRESSES_IN_BLOCK];
158             } else {
159                 pseudo_rand = instance->memory[prev_offset].v[0];
160             }
161
162             /* 1.2.2 Computing the lane of the reference block */
163             ref_lane = ((pseudo_rand >> 32)) % instance->lanes;
(gdb) 
(gdb) p instance
$1 = (const argon2_instance_t *) 0x7f326a7b11c0
(gdb) p instance->memory
$2 = (block *) 0x7f326a7b12e0
(gdb) p prev_offset
$3 = <optimized out>
(gdb) p instance->memory[prev_offset]
value has been optimized out
(gdb) p instance->memory[prev_offset].v
value has been optimized out
(gdb) p instance->memory[prev_offset].v[0]
value has been optimized out

dpc commented 5 years ago

(gdb) p instance->lanes
$4 = 0
(gdb) p *instance
$5 = {memory = 0x7f326a7b12e0, version = 1786451528, passes = 32562, memory_blocks = 1786451360, segment_length = 32562, lane_length = 2, 
  lanes = 0, threads = 0, type = Argon2_d, print_internals = 21, context_ptr = 0x4}
(gdb) 

bcmyers commented 5 years ago

Thanks for all of that! My apologies for not having found the time these past couple of days to dig back into this, but I just wanted to let you know that it's on my radar and I hope to tackle it in the coming days. Again, thanks for your help!