cr-marcstevens / hashclash

Project HashClash - MD5 & SHA-1 cryptanalysis

possible to have one byte difference for prefix in textcoll.sh? #43

Closed D-VR closed 2 months ago

D-VR commented 2 months ago

First of all, thank you for your great tool!

Apologies if this is a redundant question.

I just wanted to know if it's possible to have a 1-byte difference in a prefix using textcoll.sh.

Example:

prefix: "Hello_world"

Then I want to force byte 5 to be "+", resulting in the following MD5 collision:

Hello+world<random text>
Hello_world<random text>

Is this doable in the current tool?

cr-marcstevens commented 2 months ago

No, for textcoll the prefix cannot have a difference, nor the suffix. The only difference is in byte 21 (a +4 difference) of the first of the two textcoll generated blocks. But textcoll allows you to 'program' certain bytes within the generated blocks, see the textcoll.sh script for more details.
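To check where two generated files actually differ, a small helper along these lines may be useful. It is only a sketch and not part of hashclash: the function names are hypothetical, offsets are reported 0-based (the thread does not say whether "byte 21" is counted from 0 or 1), and the 64-byte block size is the standard MD5 block size.

```python
import hashlib

BLOCK_SIZE = 64  # MD5 processes the message in 64-byte blocks


def byte_differences(a: bytes, b: bytes):
    """List every position where two equal-length messages differ.

    Each entry is (offset, block_index, offset_in_block, delta), where
    offsets are 0-based and delta is (b - a) mod 256 for that byte.
    """
    if len(a) != len(b):
        raise ValueError("messages must have equal length")
    diffs = []
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            diffs.append(
                (offset, offset // BLOCK_SIZE, offset % BLOCK_SIZE, (y - x) % 256)
            )
    return diffs


def same_md5(a: bytes, b: bytes) -> bool:
    """Return True if both messages have the same MD5 digest."""
    return hashlib.md5(a).digest() == hashlib.md5(b).digest()


if __name__ == "__main__":
    # Synthetic data only: this pair is NOT a real collision, it just
    # illustrates a +4 difference at offset 21 of the second block.
    first = bytes(64) + b"A" * 64
    second = bytearray(first)
    second[64 + 21] += 4
    print(byte_differences(first, bytes(second)))
    print(same_md5(first, bytes(second)))
```

On a real textcoll result you would read both output files as bytes and expect a single entry in the list, with `same_md5` returning `True`.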

cr-marcstevens commented 2 months ago

What you want is possible if you're able to 'shift' your desired change exactly into that spot. You'll still need to work around the fact that some bytes can be programmed without too many issues (i.e. they can have a very limited allowed alphabet), while other bytes really must be given as much freedom as possible (i.e. they need a big allowed alphabet).

D-VR commented 2 months ago

Thank you for the quick answer!

If I understand correctly, setting the block bytes and hoping for the best could work?

FIRSTBLOCKBYTES="--byte0 H --byte1 e --byte2 l --byte3 l --byte4 o --byte5 +_ --byte6 W --byte7 o --byte8 r --byte9 l --byte10 d"

cr-marcstevens commented 2 months ago

The runtime is very sensitive to how far you're forcing things. If you're just going to set every byte, then it might never finish. Your best bet is to first try running it with as large an alphabet as possible for every byte, and then try forcing byte by byte. If you look at the output, one of the first stages gives a frequency table of how often a certain value occurs for a certain byte across a large number of solutions it has found. This frequency table also tells you which specific values are still possible once you start to program specific bytes.
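The byte-by-byte approach can be sketched as a tiny generator for the option string. This is an illustration only: it assumes the `--byteN <allowed characters>` syntax exactly as it appears in the FIRSTBLOCKBYTES line quoted above, the helper name `byte_options` is hypothetical, and it does not run hashclash or read its frequency table.

```python
def byte_options(allowed, upto=None):
    """Build a '--byteN <chars>' option string.

    allowed: dict mapping a byte index to the string of characters
             allowed at that position.
    upto:    if given, only indices below this value are constrained,
             so the remaining bytes keep their full alphabet.
    """
    parts = []
    for index in sorted(allowed):
        if upto is not None and index >= upto:
            continue
        parts.append(f"--byte{index} {allowed[index]}")
    return " ".join(parts)


if __name__ == "__main__":
    # Desired constraints: "Hello" followed by either '+' or '_'.
    wanted = {i: ch for i, ch in enumerate("Hello")}
    wanted[5] = "+_"
    # Force one more byte per attempt, starting with no forced bytes.
    for count in range(len(wanted) + 1):
        print(f'FIRSTBLOCKBYTES="{byte_options(wanted, upto=count)}"')
```

Each printed line forces one more byte than the previous one, so you can stop at the first setting where the search no longer finishes in reasonable time. Characters that are special to the shell would need extra quoting, which this sketch does not handle.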

cr-marcstevens commented 2 months ago

But programming fewer bytes is always better! (i.e. keep it at a large alphabet).

D-VR commented 2 months ago

Thank you for your explanation, I think I understand it now.