Open benearnthof opened 1 year ago
Bulk retrieval of podcast episodes now works. For now, users have to obtain the episode IDs manually; the podcast search feature I'll push with the next commit will take care of that.
For bulk transcription, we should investigate whether quantization saves time and money: https://pytorch.org/tutorials/intermediate/dynamic_quantization_bert_tutorial.html#evaluate-the-inference-accuracy-and-time
https://www.reddit.com/r/MachineLearning/comments/yeyxlo/p_openai_whisper_3x_cpu_inference_speedup/
https://huggingface.co/guillaumekln/faster-whisper-large-v2
https://github.com/MiscellaneousStuff/openai-whisper-cpu/issues/1#issuecomment-1293653424
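For anyone unfamiliar with the idea behind the links above, here is a minimal sketch of what int8 quantization does to a weight vector. It uses plain Python and is only an illustration of the arithmetic, not what PyTorch or faster-whisper do internally; `quantize_int8` and `dequantize` are made-up names.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: floats -> int8 values plus one float scale.

    Illustrative only; real libraries use optimized kernels and may pick
    different schemes (per-channel scales, zero points, and so on).
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Map the largest magnitude to 127; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.0, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```

Each weight is stored in one byte instead of four (float32), at the cost of a rounding error of at most half the scale per weight. Whether that trade-off holds up for transcription accuracy is exactly what we would need to measure.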
Currently, URLs have to be passed one at a time. Users should be able to pass in a list of URLs or a full playlist and have its contents processed in bulk.
Current scope:
YouTube playlists should be trivial; Spotify playlists should work as long as they are public.
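A rough sketch of how the bulk entry point could look, using only the standard library. Everything here is an assumption for discussion: `classify_url`, `process_bulk`, and the injected `process_one` / `expand_playlist` callables are hypothetical names, and the URL patterns (a `/playlist` path with a `list` parameter for YouTube, `/playlist/` under `open.spotify.com` for Spotify) are a guess at what we would want to recognize. Actual playlist expansion is left to whatever backend we end up using.

```python
from urllib.parse import parse_qs, urlparse


def classify_url(url):
    """Return 'youtube_playlist', 'spotify_playlist', or 'single' (assumed patterns)."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    is_youtube = host == "youtube.com" or host.endswith(".youtube.com")
    if is_youtube and parsed.path == "/playlist" and "list" in parse_qs(parsed.query):
        return "youtube_playlist"
    if host == "open.spotify.com" and parsed.path.startswith("/playlist/"):
        return "spotify_playlist"
    return "single"


def process_bulk(urls, process_one, expand_playlist):
    """Process a mixed list of single URLs and playlist URLs.

    process_one(url) handles one item; expand_playlist(url) returns the item
    URLs of a playlist. Both are supplied by the caller (hypothetical hooks).
    A failure on one item is recorded and does not abort the rest of the batch.
    """
    results, errors, seen = {}, {}, set()
    for url in urls:
        try:
            items = [url] if classify_url(url) == "single" else expand_playlist(url)
        except Exception as exc:  # e.g. a private playlist that cannot be read
            errors[url] = str(exc)
            continue
        for item in items:
            if item in seen:  # skip duplicates across lists and playlists
                continue
            seen.add(item)
            try:
                results[item] = process_one(item)
            except Exception as exc:
                errors[item] = str(exc)
    return results, errors
```

Returning errors alongside results means a single private or broken entry shows up in the report instead of stopping a long batch.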