bakd247 / ecdsaKeyFinder

A Python based ECDSA secp256k1 private key recovery tool
https://hashadder.com/
27 stars 7 forks source link

Hey you need some help with the project? #1

Closed MaltoonYezi closed 1 year ago

MaltoonYezi commented 1 year ago

Hello, Do You need some assistance? I wanna reach out to you. I can code in Python and familiar with cryptography behind Bitcoin. I can help with the project. Like, it would be possible to chat through Discord, Twitter, Telegram, Signal........ you name it Also messaged you on Youtube

bakd247 commented 1 year ago

I could use some help for sure... I am currently working on a newer version that uses all possible factors of 2 and 3 as well as divisors of 2 and 3 for a key and then uses randomness to search for a match of that multiple then divides back to 1 the same way the current version does but in a much much faster way by storing the keys by their first 4-5 digits of the x-coordinate to disk. There by increasing search time per try drastically... I am also using pypy and trying to get it working in Cython as well....so it runs in C...as well as using asyncio and multiprocessing to create the key files and speedup search... the way that I have it working (current post on github )right now is not efficient at all and would be considered the "naive approach"....so this newer version will be much much better. I am working on having the cpu generate the multiples while the user inputs their keys they want to recover... by putting all these values into a tuple and indexing them it is way way faster... I think I am good on these things at the moment.... but if there is anyway you could figure out how to get the tinyec public key generation function working as a dot product with cupy on a cuda gpu that would be very very helpful. because the function pubKey = curve.g*1 can also be a numpy dot procduct by saying numpy.dot(g,1) you can then multiply a key using the dot product in cupy(fyi...cupy takes a while to install I thought it was freezing or something but it wasnt) on a gpu....just need a way to change(wrap) the 256 bit values to double precision floats of any other data type that the gpu can handle it...like float64.

i plan on having the newest version posted with in a week or so as I have been working hard on it every day for the past few weeks to get it as fast as possible and running in C.... let me know if you can find out anything about cupy as it seems like it would be the best way to implement the functions onto a gpu.... opencl would be best but I would prefer to be able to support older gpus so maybe just an older version of pyopencl? I dont know....

MaltoonYezi commented 1 year ago

I am not really familliar with tinyec, since I haven't worked with it, except going through your videos. So, I don't know on this . So, far, I've manly been working with my own custom solutions and theoretical ideas.

I tried to speed up my Python code with PyCuda, CUDF, Numba, CuPy. After trying to implement Python code for GPU use, I came to the conclusion that to use GPU processing, it is just better to use C++ instead for the whole code.

For PyCuda and other NVIDI GPU Python implementaion, there still have to be a "kernel" coded inside the python code. That kernel is coded in C++ anyway. Vanilla Python can recognize and store in variables intergers of 256 and more bits. However the vanilla C++'s biggest int size is 64 bits. When implementing cryptographic code with big integers in python GPU, there might be a conflict between the python code and the GPU kernel, because python can handle a 256 bit number, but its C++ GPU kernel cannot.

May be it is possible to connect some libraries/dependencies like "boost" to the kernel in order to handle all of that, but I have no idea how.

Apart from that, "boost" will be essential for most of the cryptographic stuff in C++, as it allows variables and operations of such big integers.

I am not sure, why you'd want to use C or Cython for your purposes. C++ is more extent and doesn't lose in anything of performance to C.

In your case, I could use python (on cpu) first to build a prototype solution for your idea and then switch to CUDA C++ for full scale implementation.

MaltoonYezi commented 1 year ago

Ouch, accidentally closed the discussion

bakd247 commented 1 year ago

I found this out recently as well...the issue seems to keep being that the max I can get out of a gpu is a 64 bit number but considering you can do sha256 with a gpu and get an output that way...I would think that it should be possible at least... it would be perfect if we could get a gpu doing sha256 and then passing the integer off to the cpu to get the x coordinate of that key and do the search...that would be a massive speedup considering I am sorting the keys in the new version by their first 4 digits into a file...so that once a pubKey is created the x.coordinate is matched up against a file that only has other keys that have the same first 4 digits allowing for near constant lookup time so the only bottle neck should be the cpus ecdsa operation

bakd247 commented 1 year ago

check this out: https://en.wikipedia.org/wiki/256-bit_computing "256-bit vector registers are used to store several smaller numbers, such as eight 32-bit floating-point numbers, and a single instruction can operate on all these values in parallel" apparently if you can just change the 256 bit number into 8 32 bit numbers and broadcast them as vectors to the gpu...then can do operations with them... one main step I am trying to do with the new version is pre-calculate the powers of 2 and 3 in a grid for i in powers of 2; for j in powers of 3 up to the iterations specified.... if we can get am operation that can calculate a 256 bit number via mod n to make the multiple precalculations really fast...the cpu can do ecdsa once it has the "lowerbound" (less than ((n-1)//2)) version of the 256 bit number and we can always figure out ecdsa via gpu later...

bakd247 commented 1 year ago

so basically if we can get an operation that takes in a number...changes it to bytes, splits it up into 8 32 bit integers instead of floats and can then do the "power" operation on it modulo "n" and then check if the number is greater than ((n-1)//2)...if it is change its return value to its modular inverse and if less than...return itself....then passes it to the cpu to do ecdsa//this would drastically speed up search time when combined with the near constant time search method of the new version

MaltoonYezi commented 1 year ago

Yeah, I've been aware that a 256 bit number could be split into 8, 32 bit numbers. Although for me It's been just easier to prototype in cpu python first, measure the speed, and then reimplement it in CUDA C++.

I am not getting your idea though. Just picture public key as an cryptographically obfuscated private key. From your first message:

all possible factors of 2 and 3 as well as divisors of 2 and 3

Did you mean powers of 2 and 3? Becase there are (n-1)//2 and (n-1)//3, multiples of 2 and 3 within the range of 1 to n-1, which are impossible to store in a precomputed array. As opposed to the powers of 2 and 3, which there are just 255 and 161 respectively.

We could think that if a number is not getting devided either by 2 or 3, then the number is a prime or carmichael (pseudo-prime that still isn't divisible (some them are divisible by 3, but they are not included in the computations). So there could be about (1/6) or apr. 16.7% of such numbers within 1 and n-1.

So if we were to try to devide a public key by the public keys of 2 and 3 consequetly in order to get to 1 then it'd take most likely (255 + 161)/2 = 208 divisions. Still on each step of such division we have a 16.7% chance of meeting a prime or pseudo prime number, If we continue to devide that number, then we would get some public key represented by a fractional number that we don't know, but it's not ganna be 1.

We could search for for a right divisible number by adding or subtracting 1 to the division results on each step in a random order, and check every instance of the division, if we could get the generator point at the end. Still, such favorable scenario will only pop up in 1 out of (161! to 255!) computations. Something that isn't computable right now,

MaltoonYezi commented 1 year ago

Also, not sure of how you would derive an x public key coordinate out of its first 4 digits.

MaltoonYezi commented 1 year ago

Hey may be we could talk about it over zoom or something?

bakd247 commented 1 year ago

wouldn't be using the gpu to make a public key...just to calculate a random scaler to test for a match.... you would split it up using a byte array and then concatenate them together after doing the function.... like if you have 100....instead of using 100 you would use 10 10s in an array...just a small example

you would have to take the 256 bit number and turn it into an array of 32 octects....do the function and then convert it back

would have to figure out how to do ecdsa later and just have a gpu doing the initial multiple calculations and passing them off to the cpu... can run multiple cores to intercept lots of 256 bit numbers... then use the cpu to make the public keys....

bakd247 commented 1 year ago

I am basically using ecdsa as a "black box" for this key finder at the moment until I can find a faster method of ecdsa.... so any implementation of ecdsa would work...dont have to use tinyec...I am just using its curve function to create the keys...

MaltoonYezi commented 1 year ago

Hey, maybe you could do another video on what you wanna do?