eloimoliner / denoising-historical-recordings

A two-stage U-Net for high-fidelity denoising of historical recordings
MIT License

Will it work in Windows without CUDA? #4

Closed vitacon closed 2 years ago

vitacon commented 2 years ago

Hello, The readme says: "You will need at least python 3.7 and CUDA 10.1 if you want to use GPU."

Unfortunately, my first attempt to run it on Windows without a CUDA-capable graphics card failed. Is there really no separate environment file for CPU-only use? Is it possible to make it work without massive changes to the code?

eloimoliner commented 2 years ago

Hello,

Yes, it is possible to run it on a CPU without the CUDA dependencies: just install the dependencies in environment.yml (except cudnn and cudatoolkit). You could simply remove them from the environment.yml file. I'll consider adding an environment.yml for CPU-only use, thanks for the suggestion. The code should work as it is, so there is no need to change anything. However, it will be much slower, so I don't recommend training without a GPU, but for inference alone it is OK.

If you just want to try it with your recordings, you can also check the colab notebook: https://colab.research.google.com/github/eloimoliner/denoising-historical-recordings/blob/master/colab/demo.ipynb This will be much easier, as the installation is automated.
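The manual edit described above (dropping cudnn and cudatoolkit from environment.yml) can also be scripted. The following is only a minimal sketch: it assumes each dependency sits on its own "- name=version" line, and the function name strip_gpu_packages is hypothetical, not part of the repository.

```python
from pathlib import Path

# The two packages named in the reply above as unnecessary for CPU-only use.
GPU_ONLY = {"cudnn", "cudatoolkit"}


def strip_gpu_packages(yaml_text: str) -> str:
    """Return the environment file text without the GPU-only dependency lines."""
    kept = []
    for line in yaml_text.splitlines():
        entry = line.strip()
        if entry.startswith("- "):
            # "- cudatoolkit=10.1" -> "cudatoolkit"
            name = entry[2:].split("=")[0].strip()
            if name in GPU_ONLY:
                continue
        kept.append(line)
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    source = Path("environment.yml")
    if source.exists():
        # Write the filtered copy next to the original file.
        Path("environment-cpu.yml").write_text(strip_gpu_packages(source.read_text()))
```

The filtered file can then be passed to conda in place of the original.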

vitacon commented 2 years ago

Thanks for the reply. Deleting two lines does not sound that complicated so I'll give it a try. =)

I don't plan to do any training, so hopefully the low speed won't be a major problem. I already tried the colab notebook and it worked fine, but I've got a small collection of records from the 1930s and 1940s, so I thought it would be a bit more convenient to process them locally because uploading and downloading is a bit annoying. =}

eloimoliner commented 2 years ago

Great! Good luck with the installation! Let me know if you need some help with it.

vitacon commented 2 years ago

I'm sorry, but it does not go quite smoothly... =(

Python 3.8.12 (default, Oct 12 2021, 03:01:40) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32

>conda env update -f environment-cpu.yml
Collecting package metadata (repodata.json): done
Solving environment: failed

ResolvePackageNotFound:
  - libstdcxx-ng==9.3.0=hd4cf53a_17
  - openssl==1.1.1k=h27cfd23_0
  - readline==8.1=h27cfd23_0
  - tk==8.6.10=hbc83047_0
  - ld_impl_linux-64==2.35.1=h7274673_9
  - zlib==1.2.11=h7b6447c_3
  - python==3.7.11=h12debd9_0
  - libgcc-ng==9.3.0=h5101ec6_17
  - libffi==3.3=he6710b0_2
  - pip==21.0.1=py37h06a4308_0
  - sqlite==3.36.0=hc218d9a_0
  - setuptools==52.0.0=py37h06a4308_0
  - libgomp==9.3.0=h5101ec6_17
  - ncurses==6.2=he6710b0_1
  - _openmp_mutex==4.5=1_gnu
  - xz==5.2.5=h7b6447c_0

I googled "ResolvePackageNotFound:" and I found this:

ResolvePackageNotFound error describes all packages not installed yet, but required. To solve the problem, move them under pip section

I tried that (plus changing the first "=" to "==") and I got this:

Installing pip dependencies: \ Ran pip subprocess with arguments:
['C:\\Users\\Vita\\Anaconda3\\envs\\historical_denoiser\\python.exe', '-m', 'pip', 'install', '-U', '-r', 'C:\\Users\\Vita\\audio-separation\\denoising\\condaenv.rq0x4jtj.requirements.txt']
Pip subprocess output:

Pip subprocess error:
ERROR: Could not find a version that satisfies the requirement libstdcxx-ng==9.3.0 (from versions: none)
ERROR: No matching distribution found for libstdcxx-ng==9.3.0

failed

Do you have any idea what's wrong?

eloimoliner commented 2 years ago

Hey,

Installing dependencies is always a pain. I was using python 3.7 when I wrote this, so the cause may be that your python version is different, or maybe it is some Windows-related issue. Anyway, I would suggest installing the dependencies individually with "pip" or "conda install", starting with a new version of tensorflow (≥2.3.0) and then numpy, pandas, tqdm, hydra and soundfile. I think these should be enough; otherwise try to execute it and see if some other package is missing. For hydra in particular, you should try to install the same version as in the environment file, otherwise you will have to make some minor changes to the code.
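To see which of the packages listed above are still missing before running anything, a small check like the one below may help. It is a sketch using only the standard library; the function name missing_modules is hypothetical, and the list holds import names (hydra is distributed on PyPI as hydra-core).

```python
import importlib.util

# Import names for the packages mentioned in the reply above.
REQUIRED = ["tensorflow", "numpy", "pandas", "tqdm", "hydra", "soundfile"]


def missing_modules(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    print("missing:", ", ".join(missing) if missing else "none")
```

This only tells whether a module can be located, not whether its version matches the one in the environment file.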

vitacon commented 2 years ago

Well, it does not look very good. =(

Tensorflow, numpy, etc. seem to be fine.

The problematic packages are these:

  - libstdcxx-ng==9.3.0=hd4cf53a_17
  - openssl==1.1.1k=h27cfd23_0
  - readline==8.1=h27cfd23_0
  - (...)

This sent me to https://anaconda.org/conda-forge/libstdcxx-ng And it says:

linux-ppc64le  v11.2.0
linux-64  v11.2.0
linux-aarch64  v11.2.0
linux-s390x  v11.2.0 

It does not list Windows at all and I can't find it even here: https://conda.anaconda.org/conda-forge/win-64/

It seems there is no way to install this package to Windows...?

eloimoliner commented 2 years ago

Hey,

Have you tried executing the script even if these packages are missing? What is the error then? Note that it is not required that you have the exact versions of the packages.

I'm sure it should be possible to make this run on a windows computer, but I haven't tried it personally.
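Since exact versions are not required, one way to make the pinned entries usable on another platform is to drop the build string (the third field, which is platform-specific) and to skip packages that only exist for Linux. The sketch below illustrates the idea; the LINUX_ONLY set is an assumption based on the error message earlier in the thread and is not exhaustive, and loosen_pin is a hypothetical helper.

```python
# Assumption: these entries from the error message are not packaged for win-64.
LINUX_ONLY = {
    "libstdcxx-ng", "libgcc-ng", "libgomp", "ld_impl_linux-64",
    "_openmp_mutex", "ncurses", "readline",
}


def loosen_pin(spec: str):
    """Turn 'name==version=build' into 'name=version'; None for Linux-only packages."""
    parts = [part for part in spec.split("=") if part]
    name = parts[0]
    if name in LINUX_ONLY:
        return None
    if len(parts) == 1:
        return name
    # Keep the version, discard the build string.
    return f"{name}={parts[1]}"


for spec in ["zlib==1.2.11=h7b6447c_3", "libstdcxx-ng==9.3.0=hd4cf53a_17"]:
    print(spec, "->", loosen_pin(spec))
```

Whether the remaining versions are actually available for Windows still has to be checked by the solver.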

vitacon commented 2 years ago

I made several experiments today and it helped a bit.

  1. "Env update" crashed on several missing packages so I deleted all these lines.
  2. "Env update" finally somehow passed.
  3. "bash inference.sh" does not work in Windows so I replaced it with "python inference.py inference.audio=%1".

(historical_denoiser) >python inference.py inference.audio=eva-cut.wav

2022-02-26 11:32:28.058955: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found
2022-02-26 11:32:28.059048: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2022-02-26 11:32:29.839687: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library nvcuda.dll
2022-02-26 11:32:29.855993: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:07:00.0 name: GeForce GT 1030 computeCapability: 6.1
coreClock: 1.5185GHz coreCount: 3 deviceMemorySize: 2.00GiB deviceMemoryBandwidth: 44.76GiB/s
2022-02-26 11:32:29.856961: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found
2022-02-26 11:32:29.857758: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'cublas64_10.dll'; dlerror: cublas64_10.dll not found
2022-02-26 11:32:29.860069: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'cufft64_10.dll'; dlerror: cufft64_10.dll not found
2022-02-26 11:32:29.860867: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'curand64_10.dll'; dlerror: curand64_10.dll not found
2022-02-26 11:32:29.861841: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'cusolver64_10.dll'; dlerror: cusolver64_10.dll not found
2022-02-26 11:32:29.862613: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'cusparse64_10.dll'; dlerror: cusparse64_10.dll not found
2022-02-26 11:32:29.863382: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'cudnn64_7.dll'; dlerror: cudnn64_7.dll not found
2022-02-26 11:32:29.863430: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1753] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2022-02-26 11:32:29.864145: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-02-26 11:32:29.870959: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1e7d8f40950 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2022-02-26 11:32:29.871016: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2022-02-26 11:32:29.871578: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-02-26 11:32:29.871958: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]
[2022-02-26 11:32:33,314][__main__][ERROR] - Some error happened
Traceback (most recent call last):
  File "C:\Users\Vita\Anaconda3\envs\historical_denoiser\lib\site-packages\tensorflow\python\training\py_checkpoint_reader.py", line 95, in NewCheckpointReader
    return CheckpointReader(compat.as_bytes(filepattern))
RuntimeError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for C:\Users\Vita\audio-separation\denoising\experiments/trained_model\checkpoint

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "inference.py", line 150, in main
    _main(args)
  File "inference.py", line 144, in _main
    run(args)
  File "inference.py", line 20, in run
    unet_model.load_weights(ckpt)
  File "C:\Users\Vita\Anaconda3\envs\historical_denoiser\lib\site-packages\tensorflow\python\keras\engine\training.py", line 2176, in load_weights
    py_checkpoint_reader.NewCheckpointReader(filepath)
  File "C:\Users\Vita\Anaconda3\envs\historical_denoiser\lib\site-packages\tensorflow\python\training\py_checkpoint_reader.py", line 99, in NewCheckpointReader
    error_translator(e)
  File "C:\Users\Vita\Anaconda3\envs\historical_denoiser\lib\site-packages\tensorflow\python\training\py_checkpoint_reader.py", line 35, in error_translator
    raise errors_impl.NotFoundError(None, None, error_message)
tensorflow.python.framework.errors_impl.NotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for C:\Users\Vita\audio-separation\denoising\experiments/trained_model\checkpoint

I suppose it tries to use CUDA with my GeForce GT 1030, but as far as I know the 1030 is almost unusable with CUDA and I never made it work. How can I force the app to use just the CPU?

eloimoliner commented 2 years ago

Hello,

This error is not related to cuda. The cuda related messages are just warnings. It crashes because it can't find the checkpoint file that contains the weights of the model. Make sure you download it and put it in the right path.
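Although the warnings are harmless, the GPU can be hidden explicitly if desired. A common approach is to set the CUDA_VISIBLE_DEVICES environment variable to -1 before TensorFlow is imported. The sketch below assumes it would be placed at the very top of the script; hide_gpus is a hypothetical helper name and is not part of the repository.

```python
import os


def hide_gpus():
    """Ask CUDA-based libraries to see no GPU; call before importing tensorflow."""
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"


hide_gpus()
# import tensorflow as tf  # imported after this point, it should register no GPU
# Alternative inside TensorFlow 2.x: tf.config.set_visible_devices([], "GPU")
```

The same variable can also be set in the shell before launching the script, which avoids touching the code.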

You may have to make a minor change: it is looking for experiments/trained_model\checkpoint, but it should be experiments\trained_model\checkpoint on a Windows machine. You can correct that in the conf\conf.yaml file.
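One thing to watch when editing the path: inside a double-quoted YAML string a backslash starts an escape sequence (for example \t is a tab), so forward slashes are usually the safer choice. Windows path handling in Python accepts both separators, as the sketch below shows. The helper checkpoint_path is hypothetical; its parameter mirrors the path_experiment key in the configuration file.

```python
from pathlib import Path, PureWindowsPath


def checkpoint_path(path_experiment: str) -> Path:
    """Join the configured experiment folder and the checkpoint name portably."""
    return Path(path_experiment) / "checkpoint"


# Windows path parsing accepts "/" and "\\" interchangeably.
forward = PureWindowsPath("experiments/trained_model") / "checkpoint"
backward = PureWindowsPath("experiments\\trained_model") / "checkpoint"
print(forward == backward)  # True

# Python analogue of the escape pitfall: "\t" inside a normal string is a tab.
print("\t" in "experiments\trained_model")  # True
```

Building the path with pathlib instead of string concatenation avoids mixed separators such as experiments/trained_model\checkpoint.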

vitacon commented 2 years ago

Oh, you're right about the warnings. I ignored them and unzipped the model to "experiments\trained_model". I also edited the path to: path_experiment: "experiments\trained_model" #there should be a better way to do this

The result was this:

[2022-02-26 13:30:53,171][__main__][ERROR] - Some error happened
Traceback (most recent call last):
  File "inference.py", line 150, in main
    _main(args)
  File "inference.py", line 144, in _main
    run(args)
  File "inference.py", line 20, in run
    unet_model.load_weights(ckpt)
  File "C:\Users\Vita\Anaconda3\envs\historical_denoiser\lib\site-packages\tensorflow\python\keras\engine\training.py", line 2176, in load_weights
    py_checkpoint_reader.NewCheckpointReader(filepath)
  File "C:\Users\Vita\Anaconda3\envs\historical_denoiser\lib\site-packages\tensorflow\python\training\py_checkpoint_reader.py", line 95, in NewCheckpointReader
    return CheckpointReader(compat.as_bytes(filepattern))
ValueError

Then I changed the backslash to the original slash and then to two backslashes ("experiments\trained_model"). Both gave the same result, so it seems it can handle both "/" and "\":

[2022-02-26 13:44:32,984][__main__][ERROR] - Some error happened
Traceback (most recent call last):
  File "inference.py", line 150, in main
    _main(args)
  File "inference.py", line 144, in _main
    run(args)
  File "inference.py", line 49, in run
    data, samplerate = sf.read(audio)
  File "C:\Users\Vita\Anaconda3\envs\historical_denoiser\lib\site-packages\soundfile.py", line 257, in read
    subtype, endian, format, closefd) as f:
  File "C:\Users\Vita\Anaconda3\envs\historical_denoiser\lib\site-packages\soundfile.py", line 629, in __init__
    self._file = self._open(file, mode_int, closefd)
  File "C:\Users\Vita\Anaconda3\envs\historical_denoiser\lib\site-packages\soundfile.py", line 1184, in _open
    "Error opening {0!r}: ".format(self.name))
  File "C:\Users\Vita\Anaconda3\envs\historical_denoiser\lib\site-packages\soundfile.py", line 1357, in _error_check
    raise RuntimeError(prefix + _ffi.string(err_str).decode('utf-8', 'replace'))
RuntimeError: Error opening 'other.wav': System error.

It seemed there was something wrong with the input file but I was not sure what: location or format? So I tried adding the full path: python inference.py inference.audio=c:\Users\Vita\audio-separation\denoising\other.wav

This finally worked and I got (almost) the same result as from the colab notebook. =) (There are some minor differences between the files. I suppose it is CUDA/CPU related.)
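For a whole folder of recordings, the working invocation with a full path could be generated per file. The following is a rough sketch under stated assumptions: build_command and commands_for_folder are hypothetical helpers, the commands are meant to be run from the repository folder, and file names containing spaces or other special characters may need extra quoting that is not handled here.

```python
import os
import sys
from pathlib import Path


def build_command(audio_file) -> list:
    """Build the inference command with an absolute path to the recording."""
    absolute = os.path.abspath(str(audio_file))
    return [sys.executable, "inference.py", f"inference.audio={absolute}"]


def commands_for_folder(folder) -> list:
    """One command per .wav file in the folder, in sorted order."""
    return [build_command(path) for path in sorted(Path(folder).glob("*.wav"))]
```

Each command could then be executed with subprocess.run(cmd, check=True), one recording at a time.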

Two questions remain:

  1. Why does the bare file name not work? What is the default working directory?
  2. What changes and packages were really necessary? I might try it again on another clean computer next week.

vitacon commented 2 years ago

Damn, I see the editor replaced all double-backslashes with a single backslash so my text is a bit confusing now. =(

eloimoliner commented 2 years ago

I'm glad you managed to make it work. What do you mean by minor differences with the colab results? Is the quality slightly worse or just different in other terms? When I developed this code I was using the full path all the time; I just did not try using relative paths. For some reason it fails, but it should be easy to solve this issue. I'll try to correct that in the next commit.
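A possible explanation for the relative-path failure, which the thread does not confirm: Hydra releases 1.0 and 1.1 change the working directory to a per-run output folder by default, so a bare file name would be looked up there rather than in the folder the script was launched from. Hydra provides hydra.utils.to_absolute_path and hydra.utils.get_original_cwd for this case. The standard-library sketch below shows the same idea; to_absolute and launch_dir are hypothetical names.

```python
import os


def to_absolute(path: str, launch_dir: str) -> str:
    """Resolve a relative path against the directory the script was launched from."""
    if os.path.isabs(path):
        return path
    return os.path.normpath(os.path.join(launch_dir, path))


# Example: a bare file name is anchored to the launch folder.
print(to_absolute("other.wav", os.getcwd()))
```

Absolute paths pass through unchanged, which matches the observation that the full path worked.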

I think the necessary installation steps are just to have the most important packages installed. Note that you don't even need conda to do so. I would start with tensorflow and then install the rest. Look at the imports in inference.py to see which packages you should install.

vitacon commented 2 years ago

I think the quality is not affected. I just made file compare and there are some different bytes here and there (e.g. 80 turns to 81) so it looks like a slightly different rounding.
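A byte-level file compare can be made a bit more informative by comparing decoded samples. The sketch below uses only the standard library and assumes both files are 16-bit PCM WAV files of equal length, which the thread does not state; read_pcm16 and compare_wavs are hypothetical helpers.

```python
import sys
import wave
from array import array


def read_pcm16(path):
    """Read a 16-bit PCM WAV file and return its samples as signed integers."""
    with wave.open(str(path), "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        frames = wav.readframes(wav.getnframes())
    samples = array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    return samples


def compare_wavs(path_a, path_b):
    """Return (number of differing samples, largest absolute difference)."""
    a, b = read_pcm16(path_a), read_pcm16(path_b)
    if len(a) != len(b):
        raise ValueError("files have different lengths")
    diffs = [abs(x - y) for x, y in zip(a, b) if x != y]
    return len(diffs), max(diffs, default=0)
```

A largest difference of 1 would be consistent with the rounding explanation, while larger values would suggest something else is going on.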

I just tried to run it on my other (testing) computer (only onboard VGA and 4 GB of RAM) but I failed. I think I deleted the same lines from "environment.yml" as on my primary computer and created the environment without any major issue, but when I ran it, it crashed saying: "ImportError: DLL load failed: Module not found" and "Failed to load the native TensorFlow runtime". That's strange, because I can see tensorflow 2.3.0 was successfully installed through "environment.yml". I think I'll give it a rest for a while and stick with my primary computer... =P

eloimoliner commented 2 years ago

Yes, I think this difference may be caused by some rounding errors.

I'm afraid I can't help you with the installation error you describe. Probably tensorflow's installation didn't work or some other package is missing.

I'll close this issue for the moment.

vitacon commented 2 years ago

OK. Thanks again for a handy tool! =)