JuanFMontesinos / VoViT

VoViT: Low Latency Graph-based Audio-Visual Voice Separation Transformer
https://ipcv.github.io/VoViT/

Could you provide real requirements for this project? #7

Closed: crankyz closed this issue 1 year ago

crankyz commented 1 year ago

It doesn't work with today's versions, and I can't find any configuration that does. The main problem is probably the torch and torchaudio version dependencies. Thank you.

Vadim2S commented 1 year ago

You're in luck! I happened to be trying this project myself and just noticed the Issues section. Feel free to use my requirements.txt. A few clarifications:
1) I use Python 3.8 and CUDA 11.1 because of my hardware, but you can try newer versions.
2) Change plt.tight_layout(True) to plt.tight_layout() in the inference .py code.
3) You also need to install some system libraries, such as ffmpeg.

System library installation:

apt -y install libsndfile1
apt -y install libopenblas-dev
apt -y install ffmpeg
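To confirm the apt packages above are actually visible from Python before running inference, a small check can help. This is a hypothetical helper, not part of VoViT: it assumes a Linux system, looks for the ffmpeg executable on PATH, and asks ctypes whether the libsndfile and OpenBLAS shared libraries can be located.

```python
"""Hypothetical helper (not part of VoViT): check the apt-installed system deps."""
import ctypes.util
import shutil


def check_system_deps() -> dict:
    """Return a mapping of dependency name -> True if it was found."""
    return {
        # ffmpeg is an executable, so look for it on PATH
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # libsndfile1 ships libsndfile.so.1; find_library takes the bare name
        "libsndfile": ctypes.util.find_library("sndfile") is not None,
        # libopenblas-dev provides libopenblas.so
        "openblas": ctypes.util.find_library("openblas") is not None,
    }


if __name__ == "__main__":
    for name, found in check_system_deps().items():
        print(f"{name:12s} {'OK' if found else 'MISSING'}")
```

Any dependency reported as MISSING points back to the corresponding apt command.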

requirements.txt

--find-links https://download.pytorch.org/whl/torch_stable.html
numpy<1.24
torch==1.9.1+cu111
torchvision==0.10.1
torchaudio==0.9.1
cython
einops
scipy
matplotlib
imageio
imageio[ffmpeg]
librosa
opencv-python
onnxruntime
PyYAML
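Because the whole problem in this thread is version drift, it can be useful to verify that the environment really matches the pins above. The sketch below is a hypothetical helper, not part of VoViT. It checks only the three exact torch pins (numpy<1.24 and the unpinned packages are ignored) and follows the PEP 440 rule that a pin without a local label, such as torchvision==0.10.1, also matches a local build like 0.10.1+cu111.

```python
"""Hypothetical helper (not part of VoViT): compare installed versions to the pins."""
import sys
from importlib.metadata import PackageNotFoundError, version

# Exact pins copied from the requirements.txt in this thread
PINS = {
    "torch": "1.9.1+cu111",
    "torchvision": "0.10.1",
    "torchaudio": "0.9.1",
}


def satisfies(wanted: str, found) -> bool:
    """True if the installed version `found` satisfies the `==wanted` pin."""
    if found is None:
        return False
    if "+" in wanted:
        return found == wanted
    # PEP 440: a pin without a local label also matches e.g. 0.10.1+cu111
    return found.split("+", 1)[0] == wanted


def mismatches(pins: dict, installed: dict) -> dict:
    """Return {package: (wanted, found)} for every pin that is not satisfied."""
    return {
        name: (wanted, installed.get(name))
        for name, wanted in pins.items()
        if not satisfies(wanted, installed.get(name))
    }


def installed_versions(names) -> dict:
    """Look up installed versions from package metadata (None if absent)."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    if sys.version_info[:2] != (3, 8):
        print(f"Note: the working setup used Python 3.8, this is {sys.version.split()[0]}")
    bad = mismatches(PINS, installed_versions(PINS))
    for name, (wanted, found) in bad.items():
        print(f"{name}: wanted {wanted}, found {found}")
    if not bad:
        print("All pinned versions match.")
```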

crankyz commented 1 year ago

Thank you, Smirnoff.

JuanFMontesinos commented 1 year ago

I'm sorry. I really expected backward compatibility from PyTorch, so I never tracked the versions. It seems the people in charge of audio and the complex data type are deprecating the two-channel operators and certain data types, which makes the code incompatible with newer versions. For example, from one version to the next, filter banks stopped being saved in the state dictionary. In short, many things are going on. I'll try to run it with a fresh version of PyTorch eventually. Let me know if the configuration given works. Also, some torchaudio versions come with CUDA and others don't.
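The "two-channel operators" mentioned above presumably refer to the older convention of storing complex spectrograms as real tensors whose last dimension has size 2 (real, imaginary), which newer PyTorch replaces with native complex dtypes (converted with torch.view_as_complex and torch.view_as_real). As an illustration only, the sketch below shows the same two layouts in NumPy so it runs without PyTorch; the function names are hypothetical and not taken from VoViT.

```python
"""Illustration: two-channel (real, imag) layout vs. a native complex array."""
import numpy as np


def two_channel_to_complex(x: np.ndarray) -> np.ndarray:
    """[..., 2] float array holding (real, imag) -> complex array without that axis."""
    if x.shape[-1] != 2:
        raise ValueError("last dimension must have size 2 (real, imag)")
    return x[..., 0] + 1j * x[..., 1]


def complex_to_two_channel(z: np.ndarray) -> np.ndarray:
    """Complex array -> [..., 2] float array holding (real, imag)."""
    return np.stack([z.real, z.imag], axis=-1)


if __name__ == "__main__":
    # Toy "spectrogram": 3 frequency bins x 4 frames, stored two-channel
    spec_2ch = np.arange(24, dtype=np.float32).reshape(3, 4, 2)
    spec_c = two_channel_to_complex(spec_2ch)
    # Magnitude is computed directly on the complex form
    print(spec_c.shape, np.abs(spec_c).max())
```

Code written against one layout breaks when a library switches to the other, which is one way such deprecations surface as incompatibilities.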

JuanFMontesinos commented 1 year ago

Thanks to @Vadim2S, I quickly put together a working Colab notebook. Hope that helps.

Juan