Cyan4973 / xxHash

Extremely fast non-cryptographic hash algorithm
http://www.xxhash.com/
Other
8.72k stars 762 forks source link

xxhash.h refactoring, splitting different xxh algorithm to different files #947

Open nu-illasera2 opened 1 month ago

nu-illasera2 commented 1 month ago

request for splitting the massive xxhash.h file into multiple smaller files, each for a different xxh function.

The rationale : while its safe to assume that most compilers will strip un-needed code that isn't used, not all of them will (like TCC and friends), which results in both big object file and a big executable. as well as some IDEs like eclipse cdt takes a while to parse it.

While i can make this change on my own in my side-projects, the issue is , once an update for this project will come up, it will be a nightmare to do it again, and at this point, any patch to split the files for me is impossible.

1.)Any chance to split the xxhash.h single big file into multiple files that holds different hash functions? 2.)Any chance of moving the huge comment at the start of the page with the benchmarking results and description to a different text file? (The license comment ofcourse should stay).

Thanks in advance.

t-mat commented 1 month ago

As for 1.) see this post: https://github.com/Cyan4973/xxHash/issues/550#issuecomment-1611004526

nu-illasera2 commented 1 month ago

As for 1.) see this post: #550 (comment)

Alright so, we wait? well i hope i have added 2 more reasons (While not big nor critical reasons) as to why we should make said changes (That is parsing time by the editor and some compilers won't strip un-used code).

Cyan4973 commented 1 month ago

while its safe to assume that most compilers will strip un-needed code that isn't used, not all of them will (like TCC and friends), which results in both big object file and a big executable.

If produced binary size is your main objective, there are some solutions available immediately, using macro compilation directives:

nu-illasera2 commented 1 month ago

If produced binary size is your main objective, there are some solutions available immediately, using macro compilation directives:

On a personal note, i am using GCC , my suggestion was more of a cover-all for everyone.

* `XXH_NO_LONG_LONG` will also remove `XXH64`, leaving only `XXH32` in the binary.

A little bit of nit-picking, let's stick with naming conventions , if you have a definition for XXH_NO_XXH3, i think other options should follow the same naming convention, while i understand why you went with XXH_NO_LONG_LONG ; i would recommend XXH_NO_XXH64 (Even if it disables multiple things, use a definition with a long name, clarity is more important than extra 10 or even 100 characters, it does not have performance impact anyway).

Cyan4973 commented 1 month ago

XXH_NO_LONG_LONG was created first, long before XXH3 existed, and the purpose was to support old C90 compilers without support for long long (which is a C99 type).

Sure, names can change, and we could create aliases focusing more on the side effect (XXH64 no longer supported) since it's less about compiler support and more about binary size management.

Of course, as soon as we get there, we may face the question "why can't I have XXH64() only without XXH32()" and other esoteric combinations. But solving that part would be a much more involved objective, closer to the initially mentioned topic (segment source code to create selective amalgamation on demand).

nu-illasera2 commented 1 month ago

XXH_NO_LONG_LONG was created first, long before XXH3 existed, and the purpose was to support old C90 compilers without support for long long (which is a C99 type).

Sure, names can change, and we could create aliases focusing more on the side effect (XXH64 no longer supported) since it's less about compiler support and more about binary size management.

Yes sir, an alias would be the way to go about it , so we won't have to break any compatibility with the existing API (I thought about it as well when i made the previous post but didn't mention it , no idea why).