adrianco / meGPT

Apache License 2.0

Process YouTube playlist for ingestion #1

Open adrianco opened 3 months ago

adrianco commented 3 months ago

YouTube has transcripts, but they aren't very good, and YouTube's API only lets you download them for videos you uploaded yourself. ChatGPT was used to build some code to do this, but the pytube library comes with a command-line tool that downloads a whole playlist to a directory. Whisper can then be used to generate a transcript. Ideally, the author's voice would be recognized and labeled in the transcript for cases where the video is an interview or has multiple speakers.
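
For reference, here is a rough sketch of the download-then-transcribe flow described above. It is not code from this repo. It assumes the pytube Python API (rather than its command-line tool) and the openai-whisper package are installed, and the function names, the `base` model choice, and the list of media suffixes are all hypothetical. pytube in particular tends to break when YouTube changes its pages, so treat the download step as illustrative only. Speaker labeling is not covered here.

```python
from pathlib import Path
from typing import Callable

# Assumed set of media suffixes worth transcribing (hypothetical choice).
MEDIA_SUFFIXES = {".mp4", ".m4a", ".webm", ".mp3", ".wav"}


def download_playlist_audio(playlist_url: str, out_dir: Path) -> None:
    """Download the audio track of every video in a playlist (needs pytube)."""
    from pytube import Playlist  # lazy import: third-party dependency

    out_dir.mkdir(parents=True, exist_ok=True)
    for video in Playlist(playlist_url).videos:
        stream = video.streams.filter(only_audio=True).first()
        if stream is not None:
            stream.download(output_path=str(out_dir))


def whisper_transcriber(model_name: str = "base") -> Callable[[Path], str]:
    """Return a function that transcribes one file with openai-whisper."""
    import whisper  # lazy import: third-party dependency

    model = whisper.load_model(model_name)
    return lambda path: model.transcribe(str(path))["text"]


def transcribe_directory(
    media_dir: Path, out_dir: Path, transcribe: Callable[[Path], str]
) -> list[Path]:
    """Write one .txt transcript per media file, skipping files already done."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for media in sorted(media_dir.iterdir()):
        if media.suffix.lower() not in MEDIA_SUFFIXES:
            continue
        target = out_dir / (media.stem + ".txt")
        if not target.exists():
            target.write_text(transcribe(media).strip() + "\n", encoding="utf-8")
            written.append(target)
    return written
```

Usage would be something like `download_playlist_audio(url, Path("downloads"))` followed by `transcribe_directory(Path("downloads"), Path("transcripts"), whisper_transcriber())`. Passing the transcriber in as a function means a different engine could be swapped in later without touching the directory loop.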

ksmotiv8 commented 3 months ago

Do you know if Descript has an API? It does a pretty good job at the analysis already… but I would not want to manually go through each video to generate it :).


ksmotiv8 commented 3 months ago

https://docs.descriptapi.com

adrianco commented 3 months ago

Thanks for the input. https://www.descript.com looks really powerful; I didn't know about it.

adrianco commented 3 months ago

5

adrianco commented 3 months ago

I had a very long ChatGPT session where I eventually discovered that you can only access the provided transcript with an authenticated API call for your own videos. I've abandoned this approach but here it is for reference. https://chatgpt.com/share/21e3b3af-bd97-409c-9938-f3f57298383f

ksmotiv8 commented 3 months ago

There are other scripts that let you screen scrape the video - and then you can run Descript on those. Ironically, you may not have copyright on your own conversation :).
