Birdcall embeddings are currently trained from scratch using the methodology described in Tile2Vec, i.e. a modified ResNet with an MFCC input layer and a triplet objective function. Because time is limited, it is worth considering how to reuse existing CNN parameters within this training procedure.
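
To make the triplet objective concrete, here is a minimal PyTorch sketch of a margin-based triplet loss in the style of Tile2Vec. It is an illustration, not the project's actual code: the function name `triplet_loss`, the margin of 1.0, and the use of Euclidean distance are all assumptions.

```python
import torch
import torch.nn.functional as F


def triplet_loss(anchor, neighbor, distant, margin=1.0):
    """Hinge-style triplet loss over a batch of embeddings (hypothetical helper).

    Each argument has shape (batch, embed_dim). The loss pulls the anchor
    toward its neighbor and pushes it away from the distant sample until
    the gap between the two distances is at least `margin`.
    """
    d_pos = torch.linalg.norm(anchor - neighbor, dim=1)  # anchor-to-neighbor distance
    d_neg = torch.linalg.norm(anchor - distant, dim=1)   # anchor-to-distant distance
    # Zero loss once the distant sample is farther than the neighbor by `margin`.
    return F.relu(d_pos - d_neg + margin).mean()
```

When the distant sample is already much farther from the anchor than the neighbor, the loss is zero and the triplet contributes no gradient.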
We can use pre-trained models from pytorch-image-models, adapted for triplet data loading and the new objective function, which would ideally allow us to train deeper models than is practical from scratch.
Evaluation of the embedding will likely take place on the Kaggle leaderboard, but we can also measure performance on tasks such as call/no-call classification and ranking with soft labels, as described in BirdNET.