CommunityDragon / CDTB

A library containing everything needed to extract files from the client files.
GNU Lesser General Public License v3.0
119 stars, 33 forks

Add more bin hashes #22

Closed: moonshadow565 closed this 5 years ago.

benoitryder commented 5 years ago

May I ask how you retrieved those hashes? A lot of them are unused in current bin files, so I wondered.

moonshadow565 commented 5 years ago

May I ask how you retrieved those hashes? A lot of them are unused in current bin files, so I wondered.

Retrieved with an IDA script from the last patch in which they were present.

benoitryder commented 5 years ago

I checked the hashes, removed duplicates, and kept only the ones that are actually used. For instance, the "interface" and "interface instance" types are never used in hashes (which makes sense). I also added some that are used both as hash values and as field names. If you don't mind, I'll push the commit I have locally and close this merge request (I don't think GitHub allows me to update the merge request with my commit).
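The clean-up described above drops duplicates and keeps only the names whose hash actually occurs in current bin files. It can be sketched as follows. The sketch assumes bin hashes are 32-bit FNV-1a over the lowercased name, which is believed to match CDTB's own scheme; treat that as an assumption. `binhash`, `dedupe_and_filter` and the example names are hypothetical. `used_hashes` stands for whatever set of hashes was collected from the bin files.

```python
def binhash(name: str) -> int:
    # Assumed scheme: 32-bit FNV-1a over the lowercased ASCII name.
    h = 0x811C9DC5
    for b in name.lower().encode("ascii"):
        h = ((h ^ b) * 0x01000193) & 0xFFFFFFFF
    return h


def dedupe_and_filter(candidates, used_hashes):
    """Return {hash: name} for candidates whose hash is in used_hashes.

    Duplicates (including names that differ only by case, which hash
    identically under the assumed scheme) collapse to the first spelling seen.
    """
    kept = {}
    for name in candidates:
        h = binhash(name)
        if h in used_hashes and h not in kept:
            kept[h] = name
    return kept


# Hypothetical example: only two of these names occur in the bin files.
used = {binhash("SpellObject"), binhash("mSpell")}
kept = dedupe_and_filter(["SpellObject", "spellobject", "InterfaceInstance", "mSpell"], used)
print(sorted(kept.values()))  # ['SpellObject', 'mSpell']
```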

Even after removing the duplicates, there are still nearly 500 new hashes, so that's a really nice addition. And even the unused strings may help us guess new hashes. Thanks for the help!

moonshadow565 commented 5 years ago

I'm fine with removing duplicates, but is there any issue with having "unused" hashes? They are useful for older files, as well as for dumping in-game structures that get serialized from bins.

benoitryder commented 5 years ago

There is no hard issue (this won't break anything). I just tend to keep the hash files "clean" with respect to what is used by CDTB/CDragon. When searching for new hashes, I usually build word lists based on existing hashes and try combinations of them to find new ones. Having the list cluttered with obsolete words (e.g. removed spells) would slow down the process (but I guess I could filter them first). I hadn't thought about older files since I don't need or use them, but that makes sense.
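Building a word list from existing hashes can be sketched by splitting known CamelCase names into lowercase words. The regex tokenizer and `build_word_list` are hypothetical illustrations, not CDTB code. Lowercasing the words is harmless only under the assumption that the bin hash ignores case.

```python
import re
from collections import Counter

# Runs of capitals not followed by lowercase (acronyms), capitalized or
# lowercase words, and digit runs; underscores and other symbols are skipped.
_WORD_RE = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def split_words(name: str) -> list[str]:
    """Split e.g. 'mSpellDataResource' into ['m', 'spell', 'data', 'resource']."""
    return [w.lower() for w in _WORD_RE.findall(name)]


def build_word_list(names, min_count: int = 1) -> list[str]:
    """Sorted unique words occurring at least min_count times across names."""
    counts = Counter(w for name in names for w in split_words(name))
    return sorted(w for w, c in counts.items() if c >= min_count)


# Hypothetical known names.
print(build_word_list(["mSpellName", "SpellObject", "mObjectName", "mTarget"]))
# ['m', 'name', 'object', 'spell', 'target']
```

Raising `min_count` is one simple way to drop rare (possibly obsolete) words before combining them.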

The only real constraint I have is to add only "verified" words (collisions are easy, and wrongly added strings would be hard to detect), and to sort them correctly into the different categories. I also take care of the casing: hash values usually start with an uppercase letter, and field names with a lowercase one.
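If the hash is computed over the lowercased name (an assumption here), two spellings that differ only in case hash identically. The hash alone therefore cannot confirm the casing; it has to come from the source of the string. The sketch below shows this and then applies a split by first letter that follows the convention above. `categorize` is a hypothetical helper, and the rule is only a heuristic, since it holds "usually".

```python
def binhash(name: str) -> int:
    # Assumed scheme: 32-bit FNV-1a over the lowercased ASCII name.
    h = 0x811C9DC5
    for b in name.lower().encode("ascii"):
        h = ((h ^ b) * 0x01000193) & 0xFFFFFFFF
    return h


def categorize(names):
    """Heuristic split: lowercase first letter -> field name,
    anything else -> hash value."""
    fields, values = [], []
    for name in names:
        (fields if name[:1].islower() else values).append(name)
    return fields, values


# Casing is invisible to the assumed hash, so it cannot be verified this way.
print(binhash("mSpellName") == binhash("MSPELLNAME"))  # True

fields, values = categorize(["mSpellName", "SpellObject", "mTarget"])
print(fields)  # ['mSpellName', 'mTarget']
print(values)  # ['SpellObject']
```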

moonshadow565 commented 5 years ago

I doubt there is a big performance difference between 1k and 10k entries; non-cryptographic hashes are fast to compute. The hashes I added are as clean as it gets: they are extracted exactly as they appear in the game (casing included). Nothing here is bruteforced, and with such a small number of hashes, all of which were used at some point, I doubt there will be any collisions. I PR-ed them here because this is usually the first place people go to get their hashes, but I guess I could put them somewhere else.
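Both sides of this exchange can be quantified with the birthday approximation, assuming the 32-bit hash behaves like a uniformly random function. Accidental collisions inside a list of about 1k real names are very unlikely. A bruteforce search that tries many guesses against many unknown hashes is a different quantity, though: it will produce false matches, which is why unverified guesses are risky. The function names are hypothetical.

```python
import math


def collision_probability(n: int, bits: int = 32) -> float:
    """Birthday approximation: probability that at least two of n distinct
    strings share a hash, treating the hash as uniformly random."""
    pairs = n * (n - 1) / 2
    return -math.expm1(-pairs / 2**bits)  # 1 - exp(-pairs / 2^bits)


def expected_false_matches(guesses: int, targets: int, bits: int = 32) -> float:
    """Expected number of wrong guesses that happen to hit one of `targets`
    unknown hashes, under the same uniformity assumption."""
    return guesses * targets / 2**bits


print(f"{collision_probability(1_000):.2e}")          # 1.16e-04
print(f"{collision_probability(10_000):.2e}")         # 1.16e-02
print(f"{expected_false_matches(10**8, 1_000):.1f}")  # 23.3
```

So a list of 1k real names has roughly a 0.01% chance of containing any collision. In contrast, 10^8 guesses against 1,000 unknown hashes yield about 23 spurious matches on average.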

benoitryder commented 5 years ago

When combining words, the count blows up quickly: combining 4 words among 100 gives 100M hashes (100^4), while combining 4 words among 1000 gives 1000G hashes (1000^4). Even if hashes are fast to compute, the difference is huge. But as I said, I could filter them beforehand, so that should be fine.
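The figures above count ordered selections with repetition, n^k: 100^4 = 10^8 and 1000^4 = 10^12. A minimal bruteforce sketch under the same FNV-1a assumption follows. `candidate_count` and `bruteforce` are hypothetical. Joining capitalized words is just one guessing strategy, and casing does not affect the assumed hash.

```python
from itertools import product


def binhash(name: str) -> int:
    # Assumed scheme: 32-bit FNV-1a over the lowercased ASCII name.
    h = 0x811C9DC5
    for b in name.lower().encode("ascii"):
        h = ((h ^ b) * 0x01000193) & 0xFFFFFFFF
    return h


def candidate_count(num_words: int, length: int) -> int:
    # Ordered picks with repetition: each of the `length` slots can hold any word.
    return num_words ** length


def bruteforce(words, max_len, targets):
    """Yield (name, hash) for every concatenation of 1..max_len words
    whose hash is in `targets` (a set of unknown hashes)."""
    for k in range(1, max_len + 1):
        for combo in product(words, repeat=k):  # lazy, so memory stays small
            # Capitalizing is cosmetic: the assumed hash ignores case.
            name = "".join(w.capitalize() for w in combo)
            h = binhash(name)
            if h in targets:
                yield name, h


print(candidate_count(100, 4))   # 100000000
print(candidate_count(1000, 4))  # 1000000000000

targets = {binhash("SpellObject")}  # hypothetical unknown hash
print([name for name, _ in bruteforce(["spell", "object", "name"], 2, targets)])
# ['SpellObject']
```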

The hashes I added are as clean as it gets: they are extracted exactly as they appear in the game (casing included).

Regarding the extraction, do you extract the field names, the hash values or both? Are you able to extract any information about entry paths? (I don't know what can be extracted with the IDA scripts.)

I PR-ed them here because this is usually the first place people go to get their hashes, but I guess I could put them somewhere else.

True, it makes sense to have them all in one place. I'm OK with adding them all, as you suggest. Regarding your commit, you'll just have to:

I'll add the new entries to hashes.binhash.txt if needed (some are identical to field names).

moonshadow565 commented 5 years ago

The extraction process was pretty straightforward (after digging through the IDA documentation, that is). Riot had an issue with their constexpr function: it wasn't really evaluated at compile time, so the string used to be preserved and passed along with its hash.
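The IDA script itself is not shown in the thread. Once (string, hash) pairs have been pulled out of the binary, one cheap sanity check is to recompute each hash and set aside any pair that does not match. Below is a sketch under the FNV-1a assumption; `verify_pairs` and the pair format are hypothetical.

```python
def binhash(name: str) -> int:
    # Assumed scheme: 32-bit FNV-1a over the lowercased ASCII name.
    h = 0x811C9DC5
    for b in name.lower().encode("ascii"):
        h = ((h ^ b) * 0x01000193) & 0xFFFFFFFF
    return h


def verify_pairs(pairs):
    """Split (name, hash) pairs into those whose recomputed hash matches
    and those that do not (to be checked by hand)."""
    good, bad = [], []
    for name, h in pairs:
        (good if binhash(name) == h else bad).append((name, h))
    return good, bad


good, bad = verify_pairs([("abc", 0x1A47E90B), ("abc", 0xDEADBEEF)])
print(len(good), len(bad))  # 1 1
```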

I'll remove the duplicate type hashes. I didn't sort them originally because I wanted a nice git diff, but sure, I'll do that. I also don't mind if you do it instead and just close this PR.
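Sorting and deduplicating before committing can be sketched as below. The one-entry-per-line `<8 hex digits> <name>` layout and the ordering by hash are assumptions about the hash files, not confirmed by the thread. `merge_hash_lines` is a hypothetical helper; when a new name hashes to an existing entry, it keeps the spelling already in the file.

```python
def binhash(name: str) -> int:
    # Assumed scheme: 32-bit FNV-1a over the lowercased ASCII name.
    h = 0x811C9DC5
    for b in name.lower().encode("ascii"):
        h = ((h ^ b) * 0x01000193) & 0xFFFFFFFF
    return h


def merge_hash_lines(existing_lines, new_names):
    """Merge new names into '<hash> <name>' lines, dropping duplicates,
    and return the lines sorted by hash."""
    table = {}
    for line in existing_lines:
        line = line.strip()
        if not line:
            continue
        h, name = line.split(" ", 1)
        table[int(h, 16)] = name
    for name in new_names:
        table.setdefault(binhash(name), name)  # keep existing spelling on conflict
    return [f"{h:08x} {name}" for h, name in sorted(table.items())]


print(merge_hash_lines(["e40c292c a"], ["A", "abc"]))
# ['1a47e90b abc', 'e40c292c a']
```

Sorting by hash keeps the file order stable no matter where a name came from, so later additions produce small, readable diffs.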

moonshadow565 commented 5 years ago

Btw, maybe it would be a good idea to have separate files for bruteforce generation.

benoitryder commented 5 years ago

Riot had an issue with their constexpr function: it wasn't really evaluated at compile time, so the string used to be preserved and passed along with its hash.

That's an interesting bug. People tend to forget that constexpr only means "may be evaluated at compile time", not "will be evaluated at compile time".

I also don't mind if you do it instead and just close this PR.

As you wish. :) I'll push it myself tonight if you haven't by then.

Btw, maybe it would be a good idea to have separate files for bruteforce generation.

Probably. The "bruteforce generation" files can be computed from the full files, and I only need them locally (no automatic guessing is running for now), so it's fine as it is. It would not be hard to generate them later if needed.

Also, don't hesitate to join us on Discord if you have more stuff to share or just want to discuss.