abhisheks008 / DL-Simplified

Deep Learning Simplified is an open-source repository containing beginner- to advanced-level deep learning projects for contributors who are willing to start their journey in deep learning. Devfolio URL: https://devfolio.co/projects/deep-learning-simplified-f013
https://quine.sh/repo/abhisheks008-DL-Simplified-499023976
MIT License
389 stars, 340 forks

YouTube Transcript Summarizer using NLP #940

Open · sindhuja184 opened 1 month ago

sindhuja184 commented 1 month ago

Deep Learning Simplified Repository (Proposing new issue)

:red_circle: Project Title: YouTube Transcript Summarizer
:red_circle: Aim: The aim of the YouTube Transcript Summarizer is to provide concise, meaningful summaries by reducing transcript length by 80%, allowing users to quickly grasp the key points of a video.
:red_circle: Dataset: The dataset would typically consist of YouTube video transcripts.
:red_circle: Approach: The YouTube Transcript Summarizer uses Natural Language Processing (NLP) techniques to produce concise summaries of video transcripts. The process begins by extracting the transcript, which is then preprocessed to clean and tokenize the text. The chosen algorithm analyzes the content and generates a summary, significantly reducing the original length while retaining the essential points. This lets users grasp the core message of a video without reading through lengthy transcripts. (Transcripts are obtained with the help of the YouTube Transcript API.)
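As a rough illustration of the clean → tokenize → summarize flow described above, here is a minimal sketch using a classic word-frequency extractive scorer. This is a swapped-in baseline chosen for illustration, not the algorithm the issue commits to; the function names, the tiny stopword list, and the `keep_ratio` parameter are all hypothetical. Keeping about 20% of sentences approximates the stated 80% reduction, measured in sentences rather than words.

```python
import re
from collections import Counter
from typing import List

# Tiny illustrative stopword list (assumption); a real project might use NLTK's or spaCy's.
STOPWORDS = {
    "a", "an", "the", "and", "or", "but", "of", "to", "in", "on", "for", "with",
    "is", "are", "was", "were", "be", "it", "this", "that", "as", "so", "we", "you", "i",
}


def clean_transcript(text: str) -> str:
    """Remove bracketed caption cues such as [Music] and collapse whitespace."""
    text = re.sub(r"\[[^\]]*\]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def split_sentences(text: str) -> List[str]:
    """Naive split on '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def tokenize(sentence: str) -> List[str]:
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]


def summarize_extractive(text: str, keep_ratio: float = 0.2) -> str:
    """Return the highest-scoring ~keep_ratio of sentences, in original order.

    A sentence's score is the average frequency (across the whole transcript)
    of its non-stopword tokens. keep_ratio=0.2 keeps about 20% of sentences.
    """
    sentences = split_sentences(clean_transcript(text))
    if not sentences:
        return ""
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence: str) -> float:
        tokens = tokenize(sentence)
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    n_keep = max(1, round(len(sentences) * keep_ratio))
    # sorted() is stable even with reverse=True, so tied sentences keep their original order.
    best = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)[:n_keep]
    return " ".join(sentences[i] for i in sorted(best))
```

One caveat: auto-generated YouTube captions often arrive without punctuation, in which case this naive splitter sees the whole transcript as a single "sentence"; such transcripts would need punctuation restoration or fixed-size segmenting before sentence scoring is meaningful.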


📍 Follow the Guidelines to Contribute to the Project:


:red_circle::yellow_circle: Points to Note:


:white_check_mark: To be Mentioned while taking the issue:


Happy Contributing 🚀

All the best. Enjoy your open source journey ahead. 😎

github-actions[bot] commented 1 month ago

Thank you for creating this issue! We'll look into it as soon as possible. Your contributions are highly appreciated! 😊

Abhiiesante commented 1 month ago

Can you please assign this issue to me under GSSoC '24 Extended, Hacktoberfest-accepted?

abhisheks008 commented 1 month ago


As this issue was raised by @sindhuja184, it can't be assigned to you.

abhisheks008 commented 1 month ago

@sindhuja184, can you please elaborate on the approach you are planning for this problem statement?

sindhuja184 commented 1 month ago

The aim of the project is to summarize the transcripts of YouTube videos.

  1. First, I would extract the transcript of the YouTube video with the help of the YouTube Transcript API. (This requires the video ID of the YouTube video.)
  2. Then I would split the text into chunks of a fixed number of tokens each. (Summarization models have a token limit, so splitting is necessary here.)
  3. Then I would summarize each chunk using Hugging Face Transformers. (I would like to use the facebook/bart-large-cnn model.)
  4. Finally, I would combine the chunk summaries.
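The four steps above can be sketched in plain Python roughly as follows. This assumes the transcript has already been fetched and joined into one string; the actual fetch in step 1 is omitted because the youtube-transcript-api interface has changed between releases. `extract_video_id`, `chunk_words`, `summarize_long_text` and `make_bart_summarizer` are hypothetical helpers, and chunk size is counted in whitespace-separated words as a rough stand-in for model tokens. facebook/bart-large-cnn accepts at most 1024 input tokens, so a careful version would count tokens with the model's own tokenizer.

```python
from typing import Callable, List
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> str:
    """Step 1 helper: pull the video ID out of a youtube.com/watch?v=... or youtu.be/... URL."""
    parsed = urlparse(url)
    if parsed.hostname == "youtu.be":
        return parsed.path.lstrip("/")
    if parsed.hostname in {"youtube.com", "www.youtube.com", "m.youtube.com"}:
        ids = parse_qs(parsed.query).get("v")
        if ids:
            return ids[0]
    raise ValueError(f"Could not find a video ID in {url!r}")


def chunk_words(text: str, max_words: int = 400) -> List[str]:
    """Step 2: split text into chunks of at most max_words words.

    Words approximate model tokens here (assumption); 400 words usually stays
    well under BART's 1024-token input limit.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_long_text(text: str, summarize: Callable[[str], str], max_words: int = 400) -> str:
    """Steps 3 and 4: summarize each chunk independently, then join the results."""
    return " ".join(summarize(chunk) for chunk in chunk_words(text, max_words))


def make_bart_summarizer(max_length: int = 130, min_length: int = 30) -> Callable[[str], str]:
    """Step 3 backed by facebook/bart-large-cnn (needs `pip install transformers torch`).

    transformers is imported lazily so the helpers above run without it.
    """
    from transformers import pipeline

    pipe = pipeline("summarization", model="facebook/bart-large-cnn")

    def summarize(chunk: str) -> str:
        result = pipe(chunk, max_length=max_length, min_length=min_length,
                      do_sample=False, truncation=True)
        return result[0]["summary_text"]

    return summarize
```

Passing a stand-in callable (for example `lambda chunk: chunk[:200]`) lets the chunk-and-combine logic be tested without downloading the model; with the model, the call would be `summarize_long_text(transcript_text, make_bart_summarizer())`, which runs BART once per chunk.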

This is the approach I am planning to follow, @abhisheks008.

abhisheks008 commented 1 month ago

Apart from Hugging Face, are there any other algorithms you are comfortable with? The project repository requires at least 3 model implementations for each problem statement.