fumiama / Retrieval-based-Voice-Conversion-WebUI

Easily train a good VC model with voice data <= 10 mins!
GNU Affero General Public License v3.0

Testing with custom C hasher #47

Open alexlnkp opened 3 weeks ago

alexlnkp commented 3 weeks ago

So, I made the bulk-hashing Python module: alexlnkp/bulk-hasher

This implements the idea mentioned in #42.

The code for web.py was modified to exit immediately after checking assets (note the unconditional exit(1) at the end, which quits right after the check so only startup plus hashing is timed), like so:

if not config.nocheck:
    from infer.lib.rvcmd import check_all_assets, download_all_assets

    if not check_all_assets(update=config.update):
        if config.update:
            download_all_assets(tmpdir=tmp)
            if not check_all_assets(update=config.update):
                logging.error("counld not satisfy all assets needed.")
                exit(1)
    exit(1)

In the bulk-hasher version, this line in web.py was commented out, since it was no longer needed:

load_dotenv("sha256.env")
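
For context, the pre-patch flow presumably loaded the expected digests from sha256.env into the environment and read them back per file; roughly like this (the environment variable names are taken from the diff below):

# Pre-patch flow: expected digests come from sha256.env via python-dotenv.
from dotenv import load_dotenv
import os

load_dotenv("sha256.env")  # populates e.g. sha256_hubert_base_pt, sha256_rmvpe_pt, ...
expected = os.environ["sha256_hubert_base_pt"]  # compared against the file's actual hash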

The infer/lib/rvcmd.py file was also modified; here's the patch containing all of the changes:

diff --git a/infer/lib/rvcmd.py b/infer/lib/rvcmd.py
index d5f6bbd..f79555a 100644
--- a/infer/lib/rvcmd.py
+++ b/infer/lib/rvcmd.py
@@ -1,6 +1,7 @@
 import os
 from pathlib import Path
 import hashlib
+import bulkhasher
 import requests
 from io import BytesIO
 import logging
@@ -53,21 +54,21 @@ def check_all_assets(update=False) -> bool:
     if not check_model(
         BASE_DIR / "assets" / "hubert",
         "hubert_base.pt",
-        os.environ["sha256_hubert_base_pt"],
+        bulkhasher.get_hash_from_file("assets/hubert/hubert_base.pt", "SHA256"),
         update,
     ):
         return False
     if not check_model(
         BASE_DIR / "assets" / "rmvpe",
         "rmvpe.pt",
-        os.environ["sha256_rmvpe_pt"],
+        bulkhasher.get_hash_from_file("assets/rmvpe/rmvpe.pt", "SHA256"),
         update,
     ):
         return False
     if not check_model(
         BASE_DIR / "assets" / "rmvpe",
         "rmvpe.onnx",
-        os.environ["sha256_rmvpe_onnx"],
+        bulkhasher.get_hash_from_file("assets/rmvpe/rmvpe.onnx", "SHA256"),
         update,
     ):
         return False
@@ -89,18 +90,16 @@ def check_all_assets(update=False) -> bool:
         "f0G48k.pth",
     ]
     for model in model_names:
-        menv = model.replace(".", "_")
         if not check_model(
-            rvc_models_dir, model, os.environ[f"sha256_v1_{menv}"], update
+            rvc_models_dir, model, bulkhasher.get_hash_from_file(f"assets/pretrained/{model}", "SHA256"), update
         ):
             return False

     rvc_models_dir = BASE_DIR / "assets" / "pretrained_v2"
     logger.info("checking pretrained models v2...")
     for model in model_names:
-        menv = model.replace(".", "_")
         if not check_model(
-            rvc_models_dir, model, os.environ[f"sha256_v2_{menv}"], update
+            rvc_models_dir, model, bulkhasher.get_hash_from_file(f"assets/pretrained_v2/{model}", "SHA256"), update
         ):
             return False

@@ -117,15 +116,14 @@ def check_all_assets(update=False) -> bool:
         "VR-DeEchoNormal.pth",
     ]
     for model in model_names:
-        menv = model.replace(".", "_")
         if not check_model(
-            rvc_models_dir, model, os.environ[f"sha256_uvr5_{menv}"], update
+            rvc_models_dir, model, bulkhasher.get_hash_from_file(f"assets/uvr5_weights/{model}", "SHA256"), update
         ):
             return False
     if not check_model(
         BASE_DIR / "assets" / "uvr5_weights" / "onnx_dereverb_By_FoxJoy",
         "vocals.onnx",
-        os.environ[f"sha256_uvr5_vocals_onnx"],
+        bulkhasher.get_hash_from_file(f"assets/uvr5_weights/onnx_dereverb_By_FoxJoy/vocals.onnx", "SHA256"),
         update,
     ):
         return False
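
For reference, bulkhasher.get_hash_from_file is assumed here to stream the file and return a hex digest string; a pure-Python stand-in with the same assumed signature would be:

import hashlib

def get_hash_from_file(path: str, algo: str = "SHA256") -> str:
    # Stream in 1 MiB chunks so large model files are never fully in memory.
    h = hashlib.new(algo.lower())
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()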

Here's some testing data on the startup timing:

With bulk-hasher

real user sys
5.478s 4.677s 0.787s
5.570s 4.831s 0.723s
5.398s 4.619s 0.764s
5.419s 4.652s 0.744s
5.407s 4.647s 0.748s
5.395s 4.712s 0.665s
5.414s 4.662s 0.737s
5.427s 4.670s 0.744s

Without bulk-hasher

real user sys
5.541s 4.721s 0.806s
5.514s 4.756s 0.746s
5.405s 4.701s 0.689s
5.412s 4.701s 0.697s
5.531s 4.710s 0.809s
5.423s 4.670s 0.739s
5.450s 4.705s 0.726s
5.542s 4.788s 0.734s

Average With bulk-hasher

real user sys
5.438s 4.684s 0.739s
timing formula
real ( 5.478 + 5.570 + 5.398 + 5.419 + 5.407 + 5.395 + 5.414 + 5.427 ) / 8
user ( 4.677 + 4.831 + 4.619 + 4.652 + 4.647 + 4.712 + 4.662 + 4.670 ) / 8
sys ( 0.787 + 0.723 + 0.764 + 0.744 + 0.748 + 0.665 + 0.737 + 0.744 ) / 8

Average Without bulk-hasher

real user sys
5.477s 4.719s 0.743s
timing formula
real ( 5.541 + 5.514 + 5.405 + 5.412 + 5.531 + 5.423 + 5.450 + 5.542 ) / 8
user ( 4.721 + 4.756 + 4.701 + 4.701 + 4.710 + 4.670 + 4.705 + 4.788 ) / 8
sys ( 0.806 + 0.746 + 0.689 + 0.697 + 0.809 + 0.739 + 0.726 + 0.734 ) / 8

Average difference (without - with)

real user sys
0.039s 0.035s 0.004s

The results are a bit unexpected, since I didn't even use the main idea of hashing files in bulk with parallelism; however, the results are the results.
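
A quick way to re-derive the averages above (values copied from the "real" columns of the tables):

# Sanity check for the averaged "real" timings quoted above.
with_bh = [5.478, 5.570, 5.398, 5.419, 5.407, 5.395, 5.414, 5.427]
without_bh = [5.541, 5.514, 5.405, 5.412, 5.531, 5.423, 5.450, 5.542]
print(sum(with_bh) / len(with_bh))        # ~5.438 s with bulk-hasher
print(sum(without_bh) / len(without_bh))  # ~5.477 s without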

fumiama commented 3 weeks ago

Well, if you just use C to calculate the SHA256, the speed may even be slower than Python, since Python's hashlib uses OpenSSL with some SIMD optimizations.

The bulk idea is a good starting point, though. Maybe you can let the Python caller register all files that are to be hashed first; then, by calling a single .validate(expected_total_hash), all the files can be verified at once.
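
A sketch of that batched interface from the Python side (the class and method names are hypothetical, not part of the current bulk-hasher API):

import bulkhasher  # assumes a future version exposing a BulkHasher class

model_paths = ["assets/hubert/hubert_base.pt", "assets/rmvpe/rmvpe.pt"]
expected_total_hash = "..."  # combined digest over all registered files, computed ahead of time

hasher = bulkhasher.BulkHasher("SHA256")
for path in model_paths:
    hasher.register(path)                  # queue the file; no hashing happens yet
ok = hasher.validate(expected_total_hash)  # one C call hashes everything in parallel and compares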

alexlnkp commented 3 weeks ago

Well, if you just use C to calculate the SHA256, the speed may even be slower than Python, since Python's hashlib uses OpenSSL with some SIMD optimizations.

The bulk idea is a good starting point, though. Maybe you can let the Python caller register all files that are to be hashed first; then, by calling a single .validate(expected_total_hash), all the files can be verified at once.

I could work on the C bulk-hasher project more, but I'm having a lot of trouble building wheels for PyPI... It wants manylinux wheels, and I'm not sure how to compile for different Python versions...

$ auditwheel-symbols -m 2_27 dist/bulkhasher-0.0.2-cp312-cp312-linux_x86_64.whl 
bulkhasher/bulkhasher.so is not manylinux_2_27 compliant because it links the following forbidden libraries:
libpython3.12.so.1.0
libgomp.so.1

Not sure what to do about this, to be honest... I'm linking against libpython3.12 because otherwise the linker won't find the symbols from Python.h, and I'm linking against libgomp.so because I use it for the parallel hash computation and checks... I guess I could embed libgomp in the wheel? Not sure how that would work, though...

fumiama commented 3 weeks ago

I would like to invite the whl and Cython expert @synodriver to help you figure that out.

synodriver commented 3 weeks ago

Are you using OpenMP for that? GitHub Actions redirects gcc to clang on macOS, which leads to the compiler error "omp.h not found". The Windows and Linux builds should be fine, and I have a well-written example for that.

synodriver commented 3 weeks ago

Besides, have you ever done a memory check using Valgrind? It will be your best friend when debugging Python extension modules. And we'd better ensure that every malloc has a corresponding free.

synodriver commented 3 weeks ago

Third, your code seems to forget to release the GIL. No wonder it's not as fast as you expected. Try using the Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS pair when doing C-level parallel calculation.
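
For intuition from the Python side: CPython's own hashlib releases the GIL while digesting buffers larger than a few KiB, which is why even a plain thread pool can hash files on multiple cores; a C extension only gets the same concurrency by bracketing its compute loop with that macro pair. A minimal pure-Python illustration:

import hashlib
from concurrent.futures import ThreadPoolExecutor

def sha256_of(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # hashlib drops the GIL for each large update(), so threads overlap.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def hash_many(paths):
    with ThreadPoolExecutor() as pool:
        return dict(zip(paths, pool.map(sha256_of, paths)))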

alexlnkp commented 3 weeks ago

Besides, have you ever done a memory check using Valgrind? It will be your best friend when debugging Python extension modules. And we'd better ensure that every malloc has a corresponding free.

I love Valgrind! I use it constantly to detect memory leaks, but sometimes it raises false alarms. OpenMP runs its parallel regions in a way that doesn't free each thread's allocated memory right after the thread finishes executing, so some memory is left "hanging" after the parallel section is done. For OpenMP specifically, this is just a theory, as I have no actual idea why it seemingly leaks, especially when the parallelized code does no allocations of its own.

Also, sometimes Valgrind gets upset at the I/O operations more than anything else, which is a little silly: some internal libraries may appear not to free some memory, when in reality the kernel just hasn't reassigned that memory to wherever it considers it should go.

Despite all that, I'd say my Valgrind tests yield pretty good results.

I've looked through my code multiple times and couldn't find any leaks directly caused by it, but I accept that I might be wrong about that, and quite possibly am :)

alexlnkp commented 3 weeks ago

Third, your code seems to forget to release the GIL. No wonder it's not as fast as you expected. Try using the Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS pair when doing C-level parallel calculation.

That actually worked! Never would I have expected Python to need to be told explicitly that it's okay to run our code the way we wrote it, lol. Thank you for your wisdom :pray:

alexlnkp commented 3 weeks ago

Are you using OpenMP for that? GitHub Actions redirects gcc to clang on macOS, which leads to the compiler error "omp.h not found". The Windows and Linux builds should be fine, and I have a well-written example for that.

I'll deal with that later, but thanks for mentioning this issue! I first need to at least build the manylinux wheels, then worry about all of the other platforms; I think it'll be easier that way...

The main problem is that I'm unable to build the manylinux wheels properly, nor could I find any information on how to do it "properly". All I found were the Docker images made by auditwheel, but those didn't appear to do anything? (I'm not smart when it comes to anything, especially Docker.)

synodriver commented 3 weeks ago

Actually, there is cibuildwheel, which can run in GitHub Actions to build manylinux wheels; all you need is to write your setup.py properly. BTW, I think the CMake file is not necessary here; grabbing all the C sources into an Extension should be fine.
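
A minimal sketch of such a setup.py (the source path and flags are assumptions about the bulk-hasher layout); cibuildwheel then builds it per Python version inside the manylinux containers and runs auditwheel repair, which also copies non-whitelisted libraries like libgomp into the wheel:

# setup.py -- hypothetical minimal build script for the C extension.
from setuptools import setup, Extension

setup(
    name="bulkhasher",
    version="0.0.2",
    ext_modules=[
        Extension(
            "bulkhasher",
            sources=["src/bulkhasher.c"],     # assumed location of the C sources
            extra_compile_args=["-fopenmp"],  # OpenMP for the parallel hashing
            extra_link_args=["-fopenmp"],
        )
    ],
)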

fumiama commented 3 weeks ago

Actually, there is cibuildwheel, which can run in GitHub Actions to build manylinux wheels

You can refer to an example repo for a good understanding.

alexlnkp commented 3 weeks ago

Actually, there is cibuildwheel, which can run in GitHub Actions to build manylinux wheels; all you need is to write your setup.py properly. BTW, I think the CMake file is not necessary here; grabbing all the C sources into an Extension should be fine.

I never worked with setup.py, so it's really hard for me to use it as a build system; I'm just more familiar with CMake... It doesn't seem to be too unusual for Python modules written in C and/or C++ to use CMake for building, but I do understand the appeal of having a single setup file for building instead of two.

alexlnkp commented 2 weeks ago

UPDATE: Fixed the build with heavy support from auditwheel's maintainer, @mayeut. Deployment works now, assuming the module wasn't broken in the process of fixing the CI.