Closed by dkoterwa 1 year ago
Current approach: studying and testing the SqueezeNext CNN to produce embeddings: https://arxiv.org/pdf/1803.10615.pdf
Find an existing solution and test it
Colab with existing solution: https://colab.research.google.com/drive/1WxR48YDRyyXs9RMfkQxGeSlJF-YFleyX?usp=sharing
Next: try to train it on the dataset created by Michal Zawieja
https://github.com/d-li14/mobilenetv3.pytorch
Found MobileNetV3 code; will try to adapt it to our data
I think this is the first step toward building a good speaker recognition model. We have to produce quality embeddings inside a Siamese network, which should output high similarity for utterances from the same user and low similarity for utterances from two different users.
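To make the similarity goal concrete, here is a minimal sketch of how a pair of embeddings could be scored and penalized. It assumes cosine similarity as the score and a margin-based pair loss of the same form as PyTorch's `CosineEmbeddingLoss`. Neither choice has been decided in this issue, and the names `cosine_similarity`, `cosine_pair_loss`, and the `margin` default are hypothetical. The embeddings are plain lists of floats standing in for the outputs of the two Siamese branches.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def cosine_pair_loss(
    emb_a: Sequence[float],
    emb_b: Sequence[float],
    same_speaker: bool,
    margin: float = 0.5,  # assumed value, would need tuning on our data
) -> float:
    """Pair loss: pull same-speaker pairs together, push others apart.

    Same speaker:      loss = 1 - cos(a, b)
    Different speaker: loss = max(0, cos(a, b) - margin)
    """
    sim = cosine_similarity(emb_a, emb_b)
    if same_speaker:
        return 1.0 - sim
    return max(0.0, sim - margin)


# Toy 2-D embeddings standing in for the network outputs.
anchor = [1.0, 0.0]
same_user = [0.9, 0.1]
other_user = [0.0, 1.0]

print(cosine_pair_loss(anchor, same_user, same_speaker=True))
print(cosine_pair_loss(anchor, other_user, same_speaker=False))
```

With this kind of objective, the backbone (SqueezeNext or MobileNetV3) only has to map an utterance to a fixed-size vector. The pair loss is what pushes same-user utterances together and different-user utterances apart.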