benearnthof / podcasty

MIT License

Add other platforms #3

Open benearnthof opened 1 year ago

benearnthof commented 1 year ago

Currently only YouTube is supported. Users should be able to transcribe episodes from the most popular podcasting platforms.

Current Scope:

benearnthof commented 1 year ago

spotDL is useless for podcasts since it uses YouTube to obtain audio files.

However, with librespot (https://github.com/kokarare1212/librespot-python/)
we can cache Spotify episodes and other audio tracks as .wav files from the raw byte stream.
This should be enough to transcribe episodes with whisper.
I will investigate further tomorrow.

Outline of workaround:

from librespot.core import Session
from librespot.metadata import TrackId
from librespot.audio.decoders import AudioQuality, VorbisOnlyAudioQuality

# Log in with Spotify credentials (placeholders shown here).
session = Session.Builder().user_pass("SPOTIFY_DEVICE_USER_ID", "SPOTIFY_DEVICE_PASSWORD").create()

access_token = session.tokens().get("playlist-read")
track_id = TrackId.from_uri("spotify:track:2JuasWPUodaUxf5nwNpciQ") # track ID
stream = session.content_feeder().load(track_id, VorbisOnlyAudioQuality(AudioQuality.VERY_HIGH), False, None)

# Read the entire audio stream into memory.
binary_data = stream.input_stream.stream().read()

# "bx" opens the file in exclusive binary mode, so this fails if tempfile.wav already exists.
with open("tempfile.wav", mode="bx") as f:
    f.write(binary_data)

This outputs a .wav that can be processed by whisper.
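
Since the stream above is requested with VorbisOnlyAudioQuality, the bytes written to tempfile.wav may well be Ogg Vorbis rather than RIFF/WAVE. That should not matter for whisper, which decodes through ffmpeg, but it can be checked quickly. A minimal sketch using only the standard library; the function name `sniff_audio_container` is hypothetical and not part of podcasty or librespot:

```python
def sniff_audio_container(path: str) -> str:
    """Guess the container format of a file from its first 12 bytes."""
    with open(path, "rb") as f:
        header = f.read(12)
    # Ogg pages always start with the capture pattern "OggS".
    if header[:4] == b"OggS":
        return "ogg"
    # WAV files are RIFF containers with the form type "WAVE" at offset 8.
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    return "unknown"
```

Calling `sniff_audio_container("tempfile.wav")` after the download would show which container was actually written, regardless of the file extension.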

benearnthof commented 1 year ago

Update: The functionality above has been implemented in
https://github.com/benearnthof/podcasty/commit/76ad9382f1d45ece70daacc538db464b8779a8dc
podcasty can now download tracks and podcast episodes in bulk from Spotify when given a list of episode URLs.
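
For illustration, one way to turn share URLs into the `spotify:<kind>:<id>` URIs used in the outline above. This is a sketch, not the code from the linked commit: the helper name `spotify_url_to_uri` is hypothetical, and it assumes the URL path ends in `episode/<id>` or `track/<id>`.

```python
from urllib.parse import urlparse


def spotify_url_to_uri(url: str) -> str:
    """Convert an open.spotify.com URL into a spotify:<kind>:<id> URI."""
    # urlparse drops the query string (e.g. ?si=...) from .path.
    parts = [p for p in urlparse(url).path.split("/") if p]
    # Use the last two segments so locale prefixes in the path are ignored.
    if len(parts) < 2 or parts[-2] not in {"episode", "track"}:
        raise ValueError(f"not a Spotify episode or track URL: {url}")
    kind, item_id = parts[-2], parts[-1]
    return f"spotify:{kind}:{item_id}"


urls = ["https://open.spotify.com/track/2JuasWPUodaUxf5nwNpciQ?si=abc"]
uris = [spotify_url_to_uri(u) for u in urls]
```

A track URI produced this way can be passed to `TrackId.from_uri` as in the outline. Whether episodes need a separate ID class in librespot-python is an assumption to verify against the library.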

We will add functionality for further bulk processing (downloading all episodes of a podcast, or downloading from playlists) in the future. Currently the focus is on extracting text data.

benearnthof commented 1 year ago

For future reference:

SoundCloud: https://github.com/Suyash458/soundcloud-dl/blob/dev/downloader/downloader.py

Apple Podcasts: https://github.com/KaidiGuo/Apple_Podcast_Downloader/blob/master/apple_podcast_auto_download.ipynb

Export downloaded podcasts:
https://github.com/douglas-watson/podcasts_export

RSS feeds:
https://dataskeptic.com/blog/podcasting/2016/download-all-podcast-episodes
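
For the RSS route, the audio file of each episode is normally listed in the `<enclosure>` element of each `<item>`. A rough sketch of collecting those URLs with the standard library; it is not taken from the linked post, and the function name `enclosure_urls` and the sample feed are made up for illustration:

```python
import xml.etree.ElementTree as ET
from typing import List


def enclosure_urls(rss_xml: str) -> List[str]:
    """Return the enclosure URL of every <item> that has one."""
    root = ET.fromstring(rss_xml)
    urls = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        # Skip items without an audio enclosure.
        if enclosure is not None and enclosure.get("url"):
            urls.append(enclosure.get("url"))
    return urls


# Hypothetical feed used only to demonstrate the function.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Podcast</title>
    <item>
      <title>Episode 1</title>
      <enclosure url="https://example.com/ep1.mp3" type="audio/mpeg" length="123"/>
    </item>
    <item>
      <title>Text-only post</title>
    </item>
    <item>
      <title>Episode 2</title>
      <enclosure url="https://example.com/ep2.mp3" type="audio/mpeg" length="456"/>
    </item>
  </channel>
</rss>"""

episode_urls = enclosure_urls(SAMPLE_FEED)
```

In practice the feed XML would first be fetched over HTTP, and each URL downloaded before being handed to whisper.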