-
Right now I can only run a single worker on my machine, because I quickly exhaust all my CUDA memory. Loading and running the network takes around 1.3 GB per process, and I only have 4 GB of GPU mem…
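A back-of-the-envelope sketch of how I'd size the worker count from these numbers (the reserve is an assumed overhead for the CUDA context, and actual per-process usage may grow beyond the load-time figure):

```python
def max_workers(total_mem_gb, per_worker_gb, reserve_gb=0.5):
    """How many workers fit on one GPU, keeping a small safety reserve
    for the CUDA context and memory fragmentation (assumed overhead)."""
    usable = total_mem_gb - reserve_gb
    return max(0, int(usable // per_worker_gb))

print(max_workers(4.0, 1.3))  # 2
```

In theory two processes should fit at 1.3 GB each, so the single-worker limit suggests per-process usage climbs well above the load-time footprint once training starts.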
-
Hello,
In the voxceleb model, the x-vector is extracted not from the final log-softmax output layer but from a layer before it (the embedding layer). How can I convert the model so that the converted model outputs the activations of that embedding layer?
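To make the question concrete, here is a toy NumPy sketch of what I mean by "stop at the embedding layer" (the network, dimensions, and names are all made up for illustration, not the actual voxceleb model):

```python
import numpy as np

def log_softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

class TinyXVectorNet:
    """Toy stand-in: features -> embedding -> log-softmax classifier."""
    def __init__(self, in_dim=24, emb_dim=8, n_classes=5, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.standard_normal((in_dim, emb_dim))
        self.W2 = rng.standard_normal((emb_dim, n_classes))

    def forward(self, x, return_embedding=False):
        emb = np.tanh(x @ self.W1)   # the "x-vector" layer
        if return_embedding:
            return emb               # converted model would stop here
        return log_softmax(emb @ self.W2)

net = TinyXVectorNet()
frames = np.zeros((1, 24))
print(net.forward(frames, return_embedding=True).shape)  # (1, 8)
```

So the conversion amounts to cutting the graph after the embedding layer and discarding the classifier head.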
-
Can I use this code to train on face-video data? Basically, I want to build a real-time application that generates smiling and blinking videos from driving videos in real time. @Dov-Gertz @or-…
-
We need to decide on datasets to use in the library. The primary purposes of the datasets will be:
1. Benchmarking results to show efficacy of the library
2. Benchmarking results to see which tran…
-
Thank you for the elegant implementation. It helps a lot!
I am wondering why you need to detect faces in the VoxCeleb dataset, since we already have the face bounding-box metadata in this dat…
-
Hi! Why do you use such a low number of frames by default (32, as I see)? The VoxCeleb dataset was not preprocessed to drop silence segments, so much of the training data is pure silence. Acc is g…
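For context, by "dropping silence segments" I mean something like this crude energy-based trimming (frame sizes and the threshold are arbitrary illustrative values, not what a production VAD would use):

```python
import numpy as np

def drop_silence(signal, frame_len=160, hop=160, rel_threshold=0.05):
    """Keep only (non-overlapping) frames whose RMS energy exceeds a
    fraction of the loudest frame's RMS. Crude stand-in for a real VAD;
    assumes the signal is not entirely silent."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    rms = np.array([np.sqrt(np.mean(f ** 2)) for f in frames])
    keep = rms > rel_threshold * rms.max()
    return np.concatenate([f for f, k in zip(frames, keep) if k])

# 0.1 s of silence followed by 0.1 s of a tone (16 kHz)
sig = np.concatenate([np.zeros(1600), np.sin(2 * np.pi * np.arange(1600) / 16)])
print(len(drop_silence(sig)))  # 1600: the silent half is removed
```

Even something this simple would keep the 32 sampled frames from landing mostly on silence.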
-
I tried running the MGif checkpoints in RunwayML on a driving video of a 3D horse walking and a painting of a horse, both on a gray background, but the output is just a single frame; the length matches …
-
Hello, thank you for your work, but I found that you haven't provided the code for training with the DINO framework. Thank you very much.
-
I am not sure if I am missing something. I followed the documentation on how to load a pipeline for speaker diarization offline.
I followed this description:
https://github.com/pyannote/pyannote-a…
-
Hello!
ASVTorch generates 24 MFCCs, so the MFCC matrices have shape (n, 24). Your input is (200, 30); where does the 30 come from? Can you please provide some test samples?
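In case it helps debugging on your side, a quick shape check like this (a purely illustrative helper, not part of ASVTorch) would catch the mismatch before the features reach the model:

```python
import numpy as np

def check_features(feats, expected_dim=24):
    """Fail fast when a feature matrix does not match the model's input dim."""
    feats = np.asarray(feats)
    if feats.ndim != 2 or feats.shape[1] != expected_dim:
        raise ValueError(f"expected (n, {expected_dim}), got {feats.shape}")
    return feats

check_features(np.zeros((200, 24)))    # passes
# check_features(np.zeros((200, 30)))  # raises ValueError
```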