alexlnkp opened this issue 3 weeks ago
Well, if you just use C to calculate the SHA-256, it may be even slower than Python, since Python's `hashlib` uses OpenSSL with SIMD optimizations.
The thought of `bulk` hashing is a good starting point. Maybe you can let the py-caller register all the files to be hashed first, then verify them all with a single `.validate(expected_total_hash)` call.
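The register-then-validate idea could be sketched in pure Python like this (the `BulkHasher` class, `register` method, and the chunk size are all hypothetical; the real bulk-hasher API may differ):

```python
import hashlib
from pathlib import Path

class BulkHasher:
    """Hypothetical sketch of the register/validate idea: collect file
    paths first, then verify them all against one combined digest."""

    def __init__(self):
        self._paths = []

    def register(self, path):
        """Remember a file to be hashed later."""
        self._paths.append(Path(path))

    def validate(self, expected_total_hash):
        """Hash every registered file, fold the per-file digests into one
        running SHA-256 in a stable (sorted) order, and compare the result
        against a single expected hex digest."""
        total = hashlib.sha256()
        for p in sorted(self._paths):
            h = hashlib.sha256()
            with open(p, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            total.update(h.digest())
        return total.hexdigest() == expected_total_hash
```

Folding per-file digests in a sorted order makes the combined hash independent of registration order, which keeps `expected_total_hash` reproducible across runs.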
> Well, if you just use C to calculate the SHA-256, it may be even slower than Python, since Python's `hashlib` uses OpenSSL with SIMD optimizations. The thought of `bulk` hashing is a good starting point. Maybe you can let the py-caller register all the files to be hashed first, then verify them all with a single `.validate(expected_total_hash)` call.
I could work on the C bulk-hasher project more, but I'm having a lot of trouble building wheels for PyPI... It wants manylinux wheels, but I'm not sure how to compile for different Python versions...
```
$ auditwheel-symbols -m 2_27 dist/bulkhasher-0.0.2-cp312-cp312-linux_x86_64.whl
bulkhasher/bulkhasher.so is not manylinux_2_27 compliant because it links the following forbidden libraries:
libpython3.12.so.1.0
libgomp.so.1
```
Not sure what to do about this, to be honest...
I'm linking against Python 3.12 since otherwise the linker won't find the symbols from `Python.h`, and I'm linking against `libgomp.so` since I'm using it for parallel hash computation and checks...
I guess I could embed libgomp in the wheel? Not sure how that would work, though...
I would like to invite the wheel and Cython expert @synodriver to help you with that.
Are you using OpenMP for that? GitHub Actions redirects `gcc` to `clang` on macOS, which leads to a compiler error: "omp.h not found". The Windows and Linux builds should be fine, and I have a well-written example for that.
Besides, have you ever done a memory check with Valgrind? It will be your best friend when debugging Python extension modules. And we'd better ensure that every `malloc` has a corresponding `free`.
Third, your code seems to forget to release the GIL. No wonder it's not as fast as you expected. Try using the `Py_BEGIN_ALLOW_THREADS` and `Py_END_ALLOW_THREADS` pair when doing C-level parallel calculation.
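Those macros wrap the parallel region in the C extension itself, but the effect they enable can be illustrated from pure Python: CPython's `hashlib` releases the GIL while hashing buffers larger than 2047 bytes, so several threads can hash concurrently. A small sketch (blob sizes and thread count are arbitrary choices, not anything from the project):

```python
import hashlib
import time
from concurrent.futures import ThreadPoolExecutor

def hash_blob(blob):
    # hashlib drops the GIL for large buffers, so this runs in parallel
    # when called from multiple threads.
    return hashlib.sha256(blob).hexdigest()

# Eight 4 MiB blobs of repeated bytes -- stand-ins for file contents.
blobs = [bytes([i]) * (4 * 1024 * 1024) for i in range(8)]

t0 = time.perf_counter()
serial = [hash_blob(b) for b in blobs]
serial_s = time.perf_counter() - t0

t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as ex:
    threaded = list(ex.map(hash_blob, blobs))
threaded_s = time.perf_counter() - t0

print(f"serial: {serial_s:.3f}s  threaded: {threaded_s:.3f}s")
```

A C extension that holds the GIL for its whole parallel section gets none of this overlap, which is why wrapping the OpenMP region in `Py_BEGIN_ALLOW_THREADS` / `Py_END_ALLOW_THREADS` matters so much for throughput.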
> Besides, have you ever done a memory check with Valgrind? It will be your best friend when debugging Python extension modules. And we'd better ensure that every `malloc` has a corresponding `free`.
I love Valgrind! I use it constantly to detect memory leaks, but sometimes it can raise false alarms. OpenMP parallelizes in a way that doesn't free each thread's allocated memory right after the thread finishes executing, so after the parallel execution some memory is left "hanging". For OpenMP specifically this is just a theory, as I have no actual idea why it seemingly leaks, especially when there are no allocations inside the parallelized code.
Also, sometimes Valgrind is upset at I/O operations more than anything else, which is a little silly: some internal libraries may appear not to free some memory, when in reality the kernel just hasn't reassigned that memory to wherever it considers it should go.
Despite all that, I'd say my Valgrind tests yield pretty good results. I've looked through my code multiple times and couldn't find any leaks directly caused by it, but I accept that I might, and quite possibly am, wrong about that :)
> Third, your code seems to forget to release the GIL. No wonder it's not as fast as you expected. Try using the `Py_BEGIN_ALLOW_THREADS` and `Py_END_ALLOW_THREADS` pair when doing C-level parallel calculation.
That actually worked! I never would have expected Python to need to be told explicitly that it's okay to run our code the way we wrote it, lol. Thank you for your wisdom :pray:
> Are you using OpenMP for that? GitHub Actions redirects `gcc` to `clang` on macOS, which leads to a compiler error: "omp.h not found". The Windows and Linux builds should be fine, and I have a well-written example for that.
I'll deal with that later, but thanks for mentioning this issue! I first need to at least build the manylinux wheels, then worry about all of the other platforms; I think it'll be easier that way...
The main problem is that I'm unable to make the manylinux wheels properly, nor could I find any information on how to do it "properly". All I found were Docker images made by auditwheel, but those didn't appear to do anything? (I'm not smart when it comes to anything, especially Docker.)
Actually there is cibuildwheel, which can run in GitHub Actions to build manylinux wheels; all you need is to write your setup.py properly. BTW, I think the CMake file is not necessary here; handing all the C sources to `Extension` should be fine.
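A rough sketch of what that CMake-free setup.py could look like (the source path, module name, and compile flags are assumptions based on the thread; the real layout of alexlnkp/bulk-hasher may differ):

```python
# setup.py -- build the C extension without CMake by handing the
# C sources straight to setuptools' Extension.
from setuptools import setup, Extension

ext = Extension(
    "bulkhasher",                     # import name of the compiled module
    sources=["src/bulkhasher.c"],     # assumed source layout
    extra_compile_args=["-fopenmp"],  # OpenMP for the parallel hashing
    extra_link_args=["-fopenmp"],     # (needs per-platform handling on macOS)
)

setup(
    name="bulkhasher",
    version="0.0.2",
    ext_modules=[ext],
)
```

With a setup.py like this, cibuildwheel can invoke the build inside the manylinux containers for each Python version, and auditwheel's repair step can then graft `libgomp` into the wheel.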
> Actually there is cibuildwheel, which can run in GitHub Actions to build manylinux wheels
You can refer to a repo as an example for a good understanding.
> Actually there is cibuildwheel, which can run in GitHub Actions to build manylinux wheels; all you need is to write your setup.py properly. BTW, I think the CMake file is not necessary here; handing all the C sources to `Extension` should be fine.
I've never worked with setup.py, so it's really hard for me to use it as a build system; I'm just more familiar with CMake... It doesn't seem too unusual for Python modules written in C and/or C++ to use CMake for building, but I do understand the appeal of having a single setup file for building instead of two.
UPDATE: Fixed the build with heavy support from auditwheel's maintainer, @mayeut. Deployment works now, assuming the module wasn't broken in the process of fixing the CI.
So, I made the bulk-hashing Python module: alexlnkp/bulk-hasher.
It implements the idea mentioned in #42.
The code for web.py was modified to exit immediately after checking assets, like so:
In the version without bulk-hasher, this line in web.py was commented out, since it was no longer needed.
The infer/lib/rvcmd.py file was also modified; here's the patch containing all of the changes:
Here's some testing data on the startup timing:
- With bulk-hasher
- Without bulk-hasher
- Average with bulk-hasher
- Average without bulk-hasher
- Average difference (without - with)
The results are a bit unexpected, since I didn't even use the main idea of hashing the files in bulk with parallelism; however, the results are the results.