earthspecies / audio-embeddings


quick update - just pushed our first training notebook to the repository #2

radekosmulski closed this 3 years ago

radekosmulski commented 3 years ago

I just pushed 02_basic_model_with_teacher_forcing.ipynb to the repository 🙂. It contains an end-to-end training pipeline, from reading in the data (as processed in 01_a_first_look_at_librispeech.ipynb) to extracting embeddings from the trained model.

There is only one thing about the model that is slightly disturbing: it doesn't learn anything useful (please check the bottom of the notebook for an example)! But at this point I consider that a minor inconvenience; the exciting part is that we have the entire pipeline ready!

So what are the next steps? We can now experiment with new training methods by modifying Model().

For starters, I am planning to drop teacher forcing and switch to a manual implementation of the RNN loop. I also realize that we pad with zeros quite extensively, which wastes computation. But I am trying to stay as faithful to the implementation from the paper as I can, and the idea is to keep things simple for now.
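As a rough illustration of what dropping teacher forcing means in a manually unrolled RNN loop, here is a minimal sketch. It is not the notebook's Model(): the scalar cell `rnn_cell`, its weights, and the `decode` helper are hypothetical stand-ins (a real version would step something like `torch.nn.GRUCell` over audio frames). The only difference between the two modes is what gets fed back in as the next input.

```python
import math
from typing import List, Tuple


def rnn_cell(x: float, h: float, w_x: float = 0.5, w_h: float = 0.8,
             w_o: float = 1.0) -> Tuple[float, float]:
    """Hypothetical scalar RNN cell: h_t = tanh(w_x*x_t + w_h*h_{t-1}), y_t = w_o*h_t."""
    h_new = math.tanh(w_x * x + w_h * h)
    return w_o * h_new, h_new


def decode(targets: List[float], teacher_forcing: bool,
           start: float = 1.0) -> List[float]:
    """Unroll the cell one step at a time for len(targets) steps."""
    outputs = []
    h = 0.0        # initial hidden state
    x = start      # first input is a (hypothetical) start frame
    for t in range(len(targets)):
        y, h = rnn_cell(x, h)
        outputs.append(y)
        # Teacher forcing feeds the ground-truth frame as the next input;
        # free-running feeds back the model's own prediction instead.
        x = targets[t] if teacher_forcing else y
    return outputs


targets = [1.0, 2.0, 3.0]
forced = decode(targets, teacher_forcing=True)
free_running = decode(targets, teacher_forcing=False)
```

With teacher forcing the model never sees its own mistakes during training; writing the loop by hand makes it possible to feed predictions back in, at the cost of a Python-level loop over time steps.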

What would make sense to do, though, and I hope to get around to it, is checking whether the longer utterances are malformed data. I will include this investigation in the next iteration of work on this.
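A cheap first pass at that investigation might look like the sketch below, assuming a hypothetical mapping from utterance id to length in frames. It flags utterances far longer than the median (the 3× threshold is an arbitrary choice, not something from the paper) and measures what fraction of a zero-padded batch is padding, which ties back to the wasted computation mentioned earlier.

```python
from statistics import median
from typing import Dict, List


def padding_fraction(lengths: List[int]) -> float:
    """Fraction of a batch zero-padded to its longest sequence that is padding."""
    total = max(lengths) * len(lengths)
    return (total - sum(lengths)) / total


def flag_long_utterances(lengths: Dict[str, int], factor: float = 3.0) -> List[str]:
    """Return ids whose length exceeds `factor` times the median (hypothetical heuristic)."""
    med = median(lengths.values())
    return [uid for uid, n in lengths.items() if n > factor * med]


# Toy, made-up lengths in frames; "d" is a suspiciously long outlier.
lengths = {"a": 100, "b": 120, "c": 110, "d": 900}
print(flag_long_utterances(lengths))                      # ['d']
print(round(padding_fraction(list(lengths.values())), 3))  # 0.658
print(round(padding_fraction([100, 120, 110]), 3))         # 0.083
```

In this toy example a single 900-frame outlier turns a batch that would be about 8% padding into one that is about 66% padding, so flagged utterances are worth listening to before deciding whether to drop them.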

bs commented 3 years ago

Any next steps before closing this?

radekosmulski commented 3 years ago

We're good to go here, I believe. Closing.