PlanetRead / PR-Repository


[DMP 2024]: Auto Subtitler for Indian Languages #2

Open arvind-planetread opened 5 months ago

arvind-planetread commented 5 months ago

Ticket Contents

As part of the BIRD initiative, we aim to create a tool that can speed up the adoption of Same Language Subtitling (SLS) among content producers across the entire country. This will ensure that 200M weak readers and 30M readers with accessibility needs get regular reading exposure through content that has SLS.

The tool will create an SRT file from a video file and its script as a text file. For now, we aim for the tool to support the following languages: Tamil, Telugu, and Kannada.

Goals & Mid-Point Milestone

Goal 1: Achieve 60% timing accuracy of SRT files in each of Tamil, Telugu, and Kannada.

Goal 2: Achieve 70% timing accuracy of SRT files in each of Tamil, Telugu, and Kannada.

Goal 3: Achieve 80% timing accuracy of SRT files in each of Tamil, Telugu, and Kannada.

Goal 4: Achieve 90% timing accuracy of SRT files in each of Tamil, Telugu, and Kannada.

The midpoint milestone will be the completion of Goal 1 and Goal 2.

Setup/Installation

No response

Expected Outcome

The input will be a video file and its script as a UTF-8 encoded text file. The output will be an SRT file with a timecode for each line of the script.
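To make the expected output format concrete, here is a minimal Python sketch of how aligned script lines could be serialised as SRT. It assumes the start and end time (in seconds) of each script line has already been found by some alignment step, which this ticket does not specify; the names `format_timestamp` and `build_srt` are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Tuple


def format_timestamp(seconds: float) -> str:
    """Convert a time in seconds to the SRT timecode form HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def build_srt(cues: Iterable[Tuple[float, float, str]]) -> str:
    """Build SRT text from (start_seconds, end_seconds, script_line) tuples.

    The timings are assumed to come from a separate alignment step.
    """
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(blocks)


if __name__ == "__main__":
    # Hypothetical timings for two Tamil script lines.
    example = [(0.0, 2.5, "வணக்கம்"), (2.5, 5.0, "நன்றி")]
    print(build_srt(example))
```

When saving the result, opening the file with `encoding="utf-8"` keeps the Tamil, Telugu, or Kannada text intact, for example `open("out.srt", "w", encoding="utf-8")`.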

Acceptance Criteria

We will use the VLC media player to check the timing accuracy of the generated SRT file. The same check will be used to verify completion of the goals. We will test with multiple video files to confirm that the tool is versatile.
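The ticket does not define how the timing accuracy percentage is computed, and the check described above is a manual one in VLC. Purely as an illustration, one possible reading is the fraction of subtitle lines whose start time falls within a tolerance of a hand-made reference. In the sketch below, the function name `timing_accuracy`, the reference timings, and the 0.5 second tolerance are all assumptions, not part of the acceptance criteria.

```python
from typing import Sequence


def timing_accuracy(
    generated_starts: Sequence[float],
    reference_starts: Sequence[float],
    tolerance: float = 0.5,  # assumed tolerance in seconds, not from the ticket
) -> float:
    """Return the fraction of cues whose start time is within `tolerance`
    seconds of the reference start time for the same script line."""
    if len(generated_starts) != len(reference_starts):
        raise ValueError("Both sequences must cover the same script lines")
    if not reference_starts:
        return 0.0
    hits = sum(
        1
        for generated, reference in zip(generated_starts, reference_starts)
        if abs(generated - reference) <= tolerance
    )
    return hits / len(reference_starts)


if __name__ == "__main__":
    # Hypothetical start times (seconds) for four script lines.
    generated = [0.0, 2.1, 5.9, 9.0]
    reference = [0.0, 2.0, 5.0, 9.2]
    print(f"{timing_accuracy(generated, reference):.0%}")
```

Under this reading, three of the four hypothetical lines fall within the tolerance, so the example would score 75%, which would meet Goal 2 but not Goal 3.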

Implementation Details

Python or any other technical stack.

Mockups/Wireframes

No response

Product Name

Auto Subtitler for Indian Languages

Organisation Name

Planet Read

Domain

Education

Tech Skills Needed

Machine Learning, Python

Mentor(s)

@arvind-planetread

Category

Accessibility, Machine Learning

krishnarathore12 commented 3 months ago

Hello @arvind-planetread, I am Krishna Rathore, an undergraduate student at IIT Patna. I have a deep passion for AI, and recent advancements in NLP make me wonder about the future of AI. Here is what I have achieved so far.

I used the tiny Whisper model to transcribe the audio in this YouTube video, https://www.youtube.com/watch?v=BaZrJjR8e0g, and created subtitles in .srt format.

I would love to work on this project further. Thanks for reading my message. Warm regards, Krishna Rathore

arvind-planetread commented 3 months ago

@krishnarathore12 thanks for the mini POC. I will review other proposals and their POCs along with yours to decide the final candidate. 👍

Abinash-bit commented 1 month ago

Weekly Goals

Week 1