bkerler / opencl_brute

MD5,SHA1,SHA256,SHA512,HMAC,PBKDF2,SCrypt Bruteforcing tools using OpenCL (GPU, yay!) and Python
MIT License

SCrypt Nrp values #12

Closed avsync closed 3 years ago

avsync commented 4 years ago

I'm a bit stuck and hope you wouldn't mind assisting. I struggled for quite a while trying to figure out why I was getting different scrypt hashes for CPU vs. GPU. I finally realized that it's related to the << operators.

These two give me matching output:
CPU: scrypt.hash(passphrase_bytes, salt, 1 << 15, 1 << 3, 1 << 1, 32)
OpenCL: scrypt_test(opencl_algos, passwordlist, 15, 3, 1, 0x20, salt)

How could I specify the following CPU N, r, p values for the GPU code?
CPU: scrypt.hash(passphrase_bytes, salt, 1 << 15, 3, 1, 32)
OpenCL: scrypt_test(opencl_algos, passwordlist, ??, ?, ?, 0x20, salt)
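Judging only from the matching pair above, scrypt_test seems to treat its N, r and p arguments as shift exponents (15 standing for 1 << 15, and so on). That is an inference from this thread, not something confirmed against the repository source. Under that assumption, a small sketch shows which CPU values can be expressed at all. The helper name shift_exponent is hypothetical and not part of opencl_brute.

```python
def shift_exponent(value: int) -> int:
    """Return e such that 1 << e == value; raise if value is not a power of two."""
    if value < 1 or value & (value - 1):
        raise ValueError(f"{value} is not a power of two, so no exponent reproduces it")
    return value.bit_length() - 1


# Values from the matching CPU call: N = 1 << 15, r = 1 << 3, p = 1 << 1
print(shift_exponent(1 << 15))  # 15
print(shift_exponent(1 << 3))   # 3
print(shift_exponent(1 << 1))   # 1

# Values from the second CPU call: r = 3, p = 1
print(shift_exponent(1))        # 0
try:
    shift_exponent(3)
except ValueError as err:
    print(err)                  # r = 3 has no integer exponent
```

If the exponent reading is right, p = 1 would map to 0, while r = 3 could not be passed as an exponent at all and would need a change on the kernel side.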

avsync commented 4 years ago

I've managed to figure something out...

This is the actual CPU-based code I'm trying to replicate:
scrypt.hash(passphrase_bytes, salt, 1 << 18, 8, 1, 32)

In order for the OpenCL code to output the same hash...

I had to change "#define N 15" in the sCrypt.cl file to "#define N 18". Not sure why, but passing 18 as the N parameter from Python doesn't have the same effect.

Then, by changing the p value to 0, I was able to output the same hash as the CPU code:
scrypt_test(opencl_algos, passwordlist, 18, 8, 0, 32, salt)

The performance is terrible though, about 10s per hash on a GTX1070.
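When comparing against the GPU output, a second CPU reference can help rule out a parameter mix-up. The sketch below uses hashlib.scrypt from the Python standard library rather than the scrypt package used in this thread, and it assumes a Python build linked against an OpenSSL version that provides scrypt. The wrapper name cpu_scrypt, the salt, and the small demo N are placeholders chosen for illustration, not values from the thread.

```python
import hashlib


def cpu_scrypt(password: bytes, salt: bytes, n: int, r: int, p: int, dklen: int = 32) -> bytes:
    """CPU scrypt taking the actual N, r, p values (not exponents)."""
    # OpenSSL defaults to a 32 MiB memory limit, which N = 1 << 18, r = 8 exceeds,
    # so set maxmem to the approximate requirement plus 1 MiB of headroom.
    maxmem = 128 * r * (n + p + 2) + (1 << 20)
    return hashlib.scrypt(password, salt=salt, n=n, r=r, p=p, maxmem=maxmem, dklen=dklen)


salt = b"example-salt"  # placeholder salt, not the one used in the thread
small_n = 1 << 4        # small N so the demo runs instantly

with_r8 = cpu_scrypt(b"hello", salt, small_n, 8, 1)
with_r3 = cpu_scrypt(b"hello", salt, small_n, 3, 1)

print(with_r8.hex())
print(with_r8 != with_r3)  # r = 8 and r = 3 give different hashes
```

Calling cpu_scrypt(passphrase_bytes, salt, 1 << 18, 8, 1) should correspond to the scrypt.hash call being replicated, since both take the plain N, r, p values.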

avsync commented 4 years ago

Running in an Ubuntu VM on the Intel OpenCL platform is significantly better: down to 0.4s per hash, though still far too slow to be usable for brute forcing. The fact that the NVIDIA platform on Windows is so much slower makes me think it might be a driver issue.

avsync commented 4 years ago

Still not having much luck making use of the GTX1070. Code to bench: https://pastebin.com/s8PXmBRP

Platform 0 - Name NVIDIA CUDA, Vendor NVIDIA Corporation
Platform 1 - Name Intel(R) OpenCL, Vendor Intel(R) Corporation
Using Platform 0:

Device - Name: GeForce GTX 1070
Device - Type: ALL | GPU
Device - Compute Units: 15
Device - Max Work Group Size: 1024
Device - Global memory size: 8589934592
Device - Local memory size: 49152
Device - Max clock frequency: 1746 MHz

Using work group size of 1024

[b'hello', b'hello', b'hello', b'hello', b'hello', b'hello', b'hello', b'hello']

GPU:
Test Vector: b'hello'
b'9;\xc5\xcdcq\xfe\xf0\x01\x0e\xad\xc6\xdb\x99"yC\xed%\x86ClX\xc5\x80\xef\x8d\x19\xae\x0c\xe7\xea'
Time(s): 79.04460406303406

CPU:
Test Vector: b'hello'
b'9;\xc5\xcdcq\xfe\xf0\x01\x0e\xad\xc6\xdb\x99"yC\xed%\x86ClX\xc5\x80\xef\x8d\x19\xae\x0c\xe7\xea'
Time(s): 5.638913869857788
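One possible factor in the GPU timing is memory. scrypt's main working array takes about 128 * N * r bytes per hash, so the parameters used here (N = 1 << 18, r = 8) need roughly 256 MiB for each candidate. The arithmetic below uses the global memory size from the device report above. Whether the kernel actually allocates memory this way has not been checked, and the per-hash time assumes the 79 s covers all eight candidates in the printed list.

```python
N = 1 << 18                      # from scrypt.hash(..., 1 << 18, 8, 1, 32)
r = 8
global_mem_bytes = 8589934592    # "Device - Global memory size" reported above
gpu_seconds = 79.04460406303406  # GPU "Time(s)" reported above
candidates = 8                   # eight b'hello' entries in the printed list

per_hash_bytes = 128 * N * r
max_concurrent = global_mem_bytes // per_hash_bytes
seconds_per_hash = gpu_seconds / candidates

print(per_hash_bytes // (1 << 20), "MiB per hash")  # 256 MiB per hash
print(max_concurrent, "hashes fit in global memory")  # 32 hashes fit in global memory
print(round(seconds_per_hash, 2), "s per hash")       # 9.88 s per hash
```

So at most 32 hashes could be resident at once on this card, before any other allocation, which is far below the work group size of 1024 and would leave most of the GPU idle.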

avsync commented 4 years ago

Ran the above bench on an Ubuntu PC with a GTX1060 and got a similar result: 11.25s per hash. So that rules out Windows driver issues.