Open benearnthof opened 1 year ago
spotDL is useless for podcasts because it uses YouTube to obtain audio files.
But by using librespot https://github.com/kokarare1212/librespot-python/
we can cache Spotify episodes and other audio tracks as .wav files from the raw byte stream.
This should be enough to transcribe episodes with whisper.
Will investigate further tomorrow.
Outline of workaround:
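from librespot.core import Session
from librespot.metadata import TrackId
from librespot.audio.decoders import AudioQuality, VorbisOnlyAudioQuality

# Log in with the Spotify account credentials (the strings below are placeholders)
session = Session.Builder().user_pass("SPOTIFY_DEVICE_USER_ID", "SPOTIFY_DEVICE_PASSWORD").create()
access_token = session.tokens().get("playlist-read")  # not used by the download below
track_id = TrackId.from_uri("spotify:track:2JuasWPUodaUxf5nwNpciQ")  # track ID
stream = session.content_feeder().load(track_id, VorbisOnlyAudioQuality(AudioQuality.VERY_HIGH), False, None)
# Read the whole audio byte stream into memory
binary_data = stream.input_stream.stream().read()
# Note: a Vorbis quality is requested, so the bytes are probably Ogg Vorbis data despite the .wav name
# "xb" creates the file and raises FileExistsError if it already exists
with open("tempfile.wav", mode="xb") as f:
    f.write(binary_data)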
This outputs a .wav that can be processed by whisper.
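Because the stream is requested with VorbisOnlyAudioQuality, the saved bytes may well be Ogg Vorbis rather than RIFF/WAVE, whatever the file extension says. Below is a minimal sketch for checking the container from its leading magic bytes. `sniff_container` is a hypothetical helper for illustration, not part of librespot or whisper, and it only knows the two signatures shown.

```python
def sniff_container(data: bytes) -> str:
    """Guess the audio container from the leading magic bytes."""
    # Ogg pages start with the capture pattern "OggS"
    if data[:4] == b"OggS":
        return "ogg"
    # WAV files are RIFF containers with the form type "WAVE" at offset 8
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    return "unknown"
```

Calling `sniff_container(binary_data)` before writing the file would show which container was actually delivered. As far as I know whisper loads audio through ffmpeg, which usually probes the content rather than trusting the extension, so a mismatched name may still work.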
Update: The functionality above has been implemented in
https://github.com/benearnthof/podcasty/commit/76ad9382f1d45ece70daacc538db464b8779a8dc
podcasty can now download tracks and podcast episodes in bulk from Spotify when given a list of episode URLs.
We will add further bulk processing (downloading all episodes of a podcast, or downloading from playlists) in the future. Currently the focus is on extracting text data.
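For the bulk case, the share URLs first have to be turned into `spotify:` URIs like the one used in the workaround. A rough sketch of that step follows, using only the standard library. `spotify_url_to_uri` and `urls_to_uris` are hypothetical names, and this is not necessarily how the linked commit does it.

```python
from urllib.parse import urlparse


def spotify_url_to_uri(url: str) -> str:
    """Convert an open.spotify.com track/episode URL into a spotify: URI."""
    parsed = urlparse(url)
    if parsed.netloc != "open.spotify.com":
        raise ValueError(f"not a Spotify URL: {url}")
    # Drop empty segments; the query string (e.g. ?si=...) is not part of the path
    parts = [p for p in parsed.path.split("/") if p]
    # Use the last two segments so a leading locale prefix is ignored
    if len(parts) < 2 or parts[-2] not in {"track", "episode"}:
        raise ValueError(f"unsupported Spotify URL: {url}")
    kind, item_id = parts[-2], parts[-1]
    return f"spotify:{kind}:{item_id}"


def urls_to_uris(urls):
    """Convert a list of URLs, failing on the first invalid one."""
    return [spotify_url_to_uri(u) for u in urls]
```

A track URI produced this way can be passed to `TrackId.from_uri` as in the outline above. Episodes would need the matching episode ID type, which I am assuming librespot-python provides.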
For future reference:
SoundCloud: https://github.com/Suyash458/soundcloud-dl/blob/dev/downloader/downloader.py
Apple Podcasts: https://github.com/KaidiGuo/Apple_Podcast_Downloader/blob/master/apple_podcast_auto_download.ipynb
Export downloaded podcasts: https://github.com/douglas-watson/podcasts_export
RSS feeds: https://dataskeptic.com/blog/podcasting/2016/download-all-podcast-episodes
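For the RSS route, the audio URL of each episode normally sits in the `<enclosure>` element of its `<item>`. Here is a small sketch of pulling those URLs out of a feed with the standard library. `enclosure_urls` and `SAMPLE_FEED` are made up for illustration, and downloading the files is left out.

```python
import xml.etree.ElementTree as ET


def enclosure_urls(rss_xml: str) -> list:
    """Return the enclosure URL of every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(rss_xml)
    urls = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        # Skip items without an audio attachment
        if enclosure is not None and enclosure.get("url"):
            urls.append(enclosure.get("url"))
    return urls


# Tiny hand-written feed, only for demonstration
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Example podcast</title>
<item><title>Episode 1</title>
<enclosure url="https://example.com/ep1.mp3" type="audio/mpeg" length="123"/></item>
<item><title>Episode 2 (no audio)</title></item>
</channel></rss>"""

print(enclosure_urls(SAMPLE_FEED))
```

The resulting URLs could then be fetched and handed to the same transcription step as the Spotify downloads.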
Currently only YouTube is supported. Users should be able to transcribe from the most popular podcasting platforms.
Current Scope: