Add bulk functionalities

benearnthof / podcasty

MIT License


Add bulk functionalities #4

Open benearnthof opened 1 year ago

benearnthof commented 1 year ago

Currently, URLs have to be passed one by one. Users should be able to pass in a list of URLs or a full playlist and have its contents processed in bulk.

Current scope:

- Bulk processing of URL lists supplied as a .txt file
- Bulk processing of playlists by their URL
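A minimal sketch of how the .txt input could be read, assuming one URL per line. The function names `parse_url_lines` and `read_url_list` are hypothetical and not part of podcasty; skipping blank lines, `#` comments, and duplicates is also an assumption about the desired behaviour.

```python
from typing import Iterable, List


def parse_url_lines(lines: Iterable[str]) -> List[str]:
    """Return the URLs found in `lines`, in order, without duplicates.

    Assumed file format (not defined by the project): one URL per line,
    blank lines ignored, lines starting with '#' treated as comments.
    """
    urls: List[str] = []
    seen = set()
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line in seen:
            continue  # avoid processing the same URL twice
        seen.add(line)
        urls.append(line)
    return urls


def read_url_list(path: str) -> List[str]:
    """Read a .txt file and return its URLs (hypothetical helper)."""
    with open(path, encoding="utf-8") as handle:
        return parse_url_lines(handle)
```

The existing single-URL entry point could then be called once per element of `read_url_list("urls.txt")`, so the bulk path reuses the current logic instead of duplicating it.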

YouTube playlists should be trivial; Spotify playlists should work as long as they are public.
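Before a playlist can be expanded, the tool has to recognise that a URL points at one. The sketch below is one possible way to do that with the standard library only. The function name `classify_url` and the returned labels are hypothetical. It relies on YouTube playlist links carrying a `list` query parameter and Spotify playlist links having a `playlist` path segment on `open.spotify.com`. Treating a `watch?v=...&list=...` link as a playlist is an assumption, and the check cannot tell whether a Spotify playlist is public.

```python
from urllib.parse import parse_qs, urlparse

# Hosts treated as YouTube here; the set is an assumption and may need extending.
YOUTUBE_HOSTS = {"youtube.com", "m.youtube.com", "music.youtube.com"}


def classify_url(url: str) -> str:
    """Return 'youtube_playlist', 'spotify_playlist', or 'single'."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]  # normalise www.youtube.com -> youtube.com

    # YouTube playlist links carry the playlist id in the 'list' parameter.
    if host in YOUTUBE_HOSTS and "list" in parse_qs(parsed.query):
        return "youtube_playlist"

    # Spotify playlist links look like open.spotify.com/playlist/<id>.
    if host == "open.spotify.com":
        segments = [part for part in parsed.path.split("/") if part]
        if "playlist" in segments:
            return "spotify_playlist"

    # Everything else is handled as a single item.
    return "single"
```

A bulk entry point could dispatch on the returned label: expand playlists into their item URLs first, then hand every resulting URL to the existing single-URL path.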

benearnthof commented 1 year ago

Bulk retrieval of podcast episodes now works. Users have to obtain the episode IDs via the podcast search feature, which I'll push with the next commit; for now this step is manual.

benearnthof commented 1 year ago

For bulk transcription, we should investigate whether quantization can save time and money: https://pytorch.org/tutorials/intermediate/dynamic_quantization_bert_tutorial.html#evaluate-the-inference-accuracy-and-time

https://www.reddit.com/r/MachineLearning/comments/yeyxlo/p_openai_whisper_3x_cpu_inference_speedup/

https://huggingface.co/guillaumekln/faster-whisper-large-v2

https://github.com/MiscellaneousStuff/openai-whisper-cpu/issues/1#issuecomment-1293653424

https://github.com/openai/whisper/discussions/454

benearnthof commented 1 year ago

https://github.com/ELS-RD/kernl/blob/main/experimental/whisper/speedup.ipynb

https://www.reddit.com/r/MachineLearning/comments/10xp54e/p_get_2x_faster_transcriptions_with_openai/