Find and implement model which produces speaker embeddings

SebChw / Actually-Robust-Training

Actually Robust Training - a tool inspired by Andrej Karpathy's "A Recipe for Training Neural Networks". It lets you decompose your deep learning pipeline into modular and insightful "Steps". It also has many features for testing and debugging neural nets.

MIT License

44 stars 0 forks

Find and implement model which produces speaker embeddings #34

Closed dkoterwa closed 1 year ago

dkoterwa commented 1 year ago

I think this is the first step toward building a good speaker recognition model. We need a Siamese network that produces quality embeddings: high similarity for utterances from the same speaker, and low similarity for utterances from two different speakers.
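To make the goal above concrete, here is a minimal sketch of how a pair of embeddings could be scored and penalised. The issue does not say which similarity measure or loss is intended, so cosine similarity and a cosine-based contrastive loss are assumptions, as are the function names, the `margin` value, and the toy 3-dimensional vectors. The formulation mirrors the one used by PyTorch's `CosineEmbeddingLoss`, but is written in plain Python so it runs without any dependencies.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def contrastive_loss(a, b, same_speaker, margin=0.5):
    """Cosine-based contrastive loss for a single pair of embeddings.

    Same-speaker pairs are penalised for low similarity.
    Different-speaker pairs are penalised only when their similarity
    exceeds the margin (an assumed hyperparameter).
    """
    sim = cosine_similarity(a, b)
    if same_speaker:
        return 1.0 - sim
    return max(0.0, sim - margin)


# Hypothetical embeddings; a real model would output much larger vectors.
anchor = [1.0, 0.0, 0.0]
same = [0.9, 0.1, 0.0]    # utterance from the same speaker
other = [0.0, 1.0, 0.0]   # utterance from a different speaker

print(contrastive_loss(anchor, same, same_speaker=True))    # small: pair is already close
print(contrastive_loss(anchor, other, same_speaker=False))  # zero: pair is already far apart
```

In a Siamese setup both utterances pass through the same network with shared weights, and a loss of this kind is averaged over many pairs. Which backbone produces the embeddings is a separate question.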

dkoterwa commented 1 year ago

https://github.com/pytorch/vision/blob/main/torchvision/models/squeezenet.py

dkoterwa commented 1 year ago

Current approach: studying and testing the SqueezeNext CNN as the embedding model: https://arxiv.org/pdf/1803.10615.pdf

SebChw commented 1 year ago

Find an existing solution and test it.

dkoterwa commented 1 year ago

Colab with an existing solution: https://colab.research.google.com/drive/1WxR48YDRyyXs9RMfkQxGeSlJF-YFleyX?usp=sharing

SebChw commented 1 year ago

Now try to train it on the dataset created by Michal Zawieja.

dkoterwa commented 1 year ago

https://github.com/d-li14/mobilenetv3.pytorch

Found MobileNetV3 code; will try to adapt it to our data.