Closed Gautam-Rajeev closed 8 months ago
Hi! Important Details - The following details help contributors identify and contribute to tickets effectively.
Please update the ticket
Guys, any of you can contribute. Let's not wait for approval. We can start working and raise a PR whenever we want 🙌🏻
Hi all. Glad to see the enthusiasm here :) You don't have to ask permission to begin working on tickets. Please raise PRs and comment links to PRs here. I'll not be assigning anyone the ticket as such now
Hey team. Please raise a draft PR that we can review to see if everyone is going in the right direction. Thanks.
@ChakshuGautam I'm hitting this error while working in a Colab environment: DownloadError: ERROR: Unable to extract uploader id; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see https://yt-dl.org/update on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output. I have updated multiple times and tried other versions, but it still doesn't work for me. yt-dlp, on the other hand, works reasonably well for the same task. Should I continue with yt-dlp?
@kartikf4 Is this happening on non colab env as well? Any alternatives to this package that you tried out?
@ChakshuGautam Well, I didn't try it in a local env, but I did try the alternative, yt-dlp. Check here.
Probably has something to do with Colab. Let's do it locally.
Hey @kartikf4, This one is doing fine here.
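For anyone picking this up with yt-dlp instead of youtube-dl, here is a minimal sketch of listing the video IDs behind a playlist or channel URL without downloading any media. The helper names `iter_video_entries` and `list_video_ids` are made up for illustration, and the sketch assumes the URL resolves to a list of videos (for a channel, pointing at its /videos tab is probably the safer input).

```python
from typing import Any, Dict, Iterator, List


def iter_video_entries(info: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield leaf entries from a (possibly nested) yt-dlp info dict."""
    entries = info.get("entries")
    if entries is None:
        # No "entries" key: treat this dict as a single video.
        yield info
        return
    for entry in entries:
        if entry:  # skip empty placeholders for unavailable items
            yield from iter_video_entries(entry)


def list_video_ids(url: str) -> List[str]:
    """Return video IDs for a playlist/channel URL (hypothetical helper)."""
    import yt_dlp  # third-party: pip install yt-dlp

    options = {
        "extract_flat": True,   # list entries only, do not resolve each video
        "quiet": True,
        "skip_download": True,
    }
    with yt_dlp.YoutubeDL(options) as ydl:
        info = ydl.extract_info(url, download=False)
    return [e["id"] for e in iter_video_entries(info) if e.get("id")]
```

The flattening is kept separate from the network call so it can be checked on a hand-written dict before running it against a real channel.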
Hi, I want to contribute to this. Can you assign it to me?
@ChakshuGautam https://pypi.org/project/youtube-transcript-api/ gives the transcripts for all videos in English/Hindi (from the auto-generated CC). Can we clarify the merits of extracting audio and transcribing it separately, compared with what the above already gives? Do we want to do that for Indian-language videos?
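To make the comparison concrete, a rough sketch of the caption route: pull the video ID out of a link, then ask youtube-transcript-api for English or Hindi captions. `extract_video_id` and `fetch_captions` are illustrative names, not part of the library. The `YouTubeTranscriptApi.get_transcript` call is the static method from older (pre-1.0) releases of the package; newer releases changed the interface, so treat that line as an assumption about the installed version.

```python
from typing import Dict, List, Optional
from urllib.parse import parse_qs, urlparse

YOUTUBE_HOSTS = ("youtube.com", "www.youtube.com", "m.youtube.com")


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from common YouTube URL shapes, else None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links: https://youtu.be/<id>
        return parsed.path.strip("/").split("/")[0] or None
    if host in YOUTUBE_HOSTS:
        if parsed.path == "/watch":
            return parse_qs(parsed.query).get("v", [None])[0]
        parts = [p for p in parsed.path.split("/") if p]
        if len(parts) == 2 and parts[0] in ("embed", "shorts"):
            return parts[1]
    return None


def fetch_captions(video_id: str, languages=("en", "hi")) -> List[Dict]:
    """Fetch caption segments (hypothetical wrapper around the library)."""
    # third-party: pip install youtube-transcript-api
    from youtube_transcript_api import YouTubeTranscriptApi

    # Assumption: a pre-1.0 release that still exposes get_transcript.
    # It returned a list of dicts with "text", "start" and "duration" keys.
    return YouTubeTranscriptApi.get_transcript(video_id, languages=list(languages))
```

If this covers the target videos, the audio-extraction path would only be needed where no captions exist.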
@ChakshuGautam, @GautamR-Samagra on the further improvement on the issue
@xorsuyash can you share a draft PR anyway so that we can review in chunks?
@ChakshuGautam I have raised a draft PR.
Hey @xorsuyash,
Let's drop the vector and ColBERT parts until the issue is resolved. For now, we'll keep it simple.
Single API: param: YouTube video link; response: transcript.json
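A minimal sketch of what that single call could look like, assuming the response is one JSON document per video. The function name `transcript_response`, the payload fields, and the injected `fetch_segments` callable are all assumptions for illustration; the real fetcher would wrap whichever transcript source the PR settles on.

```python
import json
from typing import Callable, Dict, List

Segment = Dict[str, object]


def transcript_response(
    video_url: str,
    fetch_segments: Callable[[str], List[Segment]],
) -> str:
    """Build the transcript.json body for one video link (hypothetical shape)."""
    segments = fetch_segments(video_url)
    payload = {
        "video_url": video_url,
        "segments": segments,
        # Plain-text view of the transcript, joined from the segment texts.
        "text": " ".join(str(s["text"]).strip() for s in segments),
    }
    # ensure_ascii=False keeps Hindi and other non-ASCII text readable.
    return json.dumps(payload, ensure_ascii=False, indent=2)
```

Passing the fetcher in as an argument keeps the endpoint logic testable without network access.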
Also I have some questions:
@rachitavya
If a video has any spoken-language audio, YouTube generates an auto-generated transcript. Only for videos without transcribable audio (soundtracks, etc.) does YouTube not generate a transcript.
Can you share some videos that contain multiple languages so that I can test the API? (To check how far youtube_transcript_api can transcribe such audio.)
@xorsuyash Thanks for completing this.
cc: @Shruti3004 , @ChakshuGautam
Description
Be able to parse all the videos from a YouTube channel or YouTube playlist, extract transcripts from their audio, and embed them in a vector DB to enable search/retrieval over them.
Implementation Details
It'll include the following :
- Can use https://github.com/ytdl-org/youtube-dl for scraping.
- Can use https://www.youtube.com/@3blue1brown as the initial test set for the above.
- The ticket for using ColBERT is covered here; you only need to make it work locally using the notebook.
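One step the description implies but does not spell out is turning timestamped transcript segments into passages sized for embedding. The sketch below is one possible way to do that, not a prescribed design: it assumes each segment is a dict with `text`, `start` and `duration` keys, and the `chunk_segments` name and `max_chars` budget are arbitrary choices. The embedding model and vector DB are left open.

```python
from typing import Dict, List


def chunk_segments(segments: List[Dict], max_chars: int = 500) -> List[Dict]:
    """Group consecutive segments into chunks of at most ~max_chars characters."""
    chunks: List[Dict] = []
    texts: List[str] = []
    start = None
    end = 0.0
    length = 0  # length of the current chunk text once joined with spaces
    for seg in segments:
        text = seg["text"].strip()
        if not text:
            continue
        # Close the current chunk if adding this segment would exceed the budget.
        if texts and length + 1 + len(text) > max_chars:
            chunks.append({"text": " ".join(texts), "start": start, "end": end})
            texts, start, length = [], None, 0
        if start is None:
            start = seg["start"]
        length += len(text) + (1 if length else 0)
        texts.append(text)
        end = seg["start"] + seg["duration"]
    if texts:
        chunks.append({"text": " ".join(texts), "start": start, "end": end})
    return chunks
```

Keeping `start` and `end` on every chunk means a search hit can link back to the matching moment in the video.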
Product Name
AI Tools
Organization Name
SamagraX
Domain
NA
Tech Skills Needed
PyTorch/Python, ML
Category
Feature
Mentor(s)
@GautamR-Samagra
Complexity
Medium