BelonggAI / C4GTDMP


[DMP 2024]: AI-based Indian language corpus translation tool for BelonggAI - a platform for intersectional inclusion in development programs #1

Open BelonggAI opened 1 month ago

BelonggAI commented 1 month ago

Ticket Contents

Belongg is developing BelonggAI, a tool that will help development practitioners, researchers, funders, and others analyze their proposals, program documents, policy documents, and similar material to discover intersectional perspectives (gender, disability, sexual orientation, caste, religion, etc.) that could be added to make the program more inclusive. The tool is based on a RAG architecture, with customized prompts and a corpus running into thousands of research papers, media articles, and pieces of grey literature. The corpus is constantly growing as we expand our focus areas. However, all of it is still in English.

To avoid omitting the knowledge produced by and about marginalized and underserved communities, we are committed to building the tool’s capability to process knowledge produced in languages other than English. To this end, Belongg would like to undertake a project to develop a tool that translates text documents, audio, and videos in Indian languages into PDFs with English text. The developed tool will be integrated with our existing LLM, which can only process English text. While working on this project, the intern selected as part of DMP will receive guidance from a Belongg mentor to coordinate with our technology team (which includes an LLM engineer and colleagues from ARTPark), and technical mentorship from a separate mentor assigned by Samagra.

Goals & Mid-Point Milestone

Goals

Setup/Installation

No response

Expected Outcome

  1. User-Friendly Input Interface: Development of an access-controlled webpage, where Belongg team members can submit knowledge assets in various formats (text files, PDFs, audio files, video files) and languages. This interface should allow for the submission of URLs for media content or direct uploads of the files.
  2. Batch Processing and Metadata Management: The interface must support batch uploads, enabling users to add multiple knowledge assets simultaneously. Each file should have an option to include metadata (e.g., title, source link, tags related to content type). The Belongg team adds hundreds of knowledge objects to the corpus each week. This feature will ensure organized and efficient handling of large volumes of data.
  3. Translation and Conversion to English Text: All uploaded knowledge assets will be automatically processed to convert and translate the content into English text, maintaining the integrity and context of the original materials.
  4. Integration with Google Drive and Sheets: The translated and converted content, along with its metadata, will be systematically stored in a designated Google Drive folder. A Google Spreadsheet will be programmatically updated with the status of each submission, including links to the processed files in Drive, ensuring efficient tracking and management of the knowledge assets.
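
One way to picture outcomes 2 and 4 together is a small record per knowledge asset that can be flattened into a row of the tracking spreadsheet. The sketch below is only illustrative: the field names, the status values, and the column order are assumptions and are not specified anywhere in this ticket, and the actual write to Google Sheets is left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class KnowledgeAsset:
    """One submitted item plus the metadata the ticket mentions."""
    title: str
    source_link: str
    tags: List[str] = field(default_factory=list)
    status: str = "queued"   # hypothetical states: queued, processing, done, failed
    drive_link: str = ""     # filled in once the English PDF is stored in Drive


# Hypothetical column layout for the tracking spreadsheet.
SHEET_HEADER = ["Title", "Source link", "Tags", "Status", "Processed file"]


def to_sheet_row(asset: KnowledgeAsset) -> List[str]:
    """Flatten one asset into a list of cell values."""
    return [
        asset.title,
        asset.source_link,
        ", ".join(asset.tags),
        asset.status,
        asset.drive_link,
    ]


def batch_to_rows(assets: List[KnowledgeAsset]) -> List[List[str]]:
    """Header row followed by one row per asset in the batch."""
    return [SHEET_HEADER] + [to_sheet_row(a) for a in assets]
```

Keeping the row-building step separate from the API call means the same rows could be appended to a sheet or written to a CSV file during testing.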

Acceptance Criteria

  1. Functionality of the Input Interface: The webpage must be secure, user-friendly, and capable of handling multiple file uploads with associated metadata. Only authorized Belongg team members should access this portal.
  2. Accuracy and Reliability of Translation: The system must deliver high-quality translations that meet a predefined accuracy threshold (e.g., 95% accuracy), ensuring the content is contextually and culturally accurate in English.
  3. Efficient Batch Processing: The tool should handle batch uploads seamlessly, with each file's metadata accurately captured and associated with the corresponding translated content in the output.
  4. Seamless Integration and Data Management: Successful integration with Google Drive for storing translated files and Google Sheets for real-time status updates. The system should maintain a high level of organization, allowing easy retrieval and tracking of processed knowledge assets.
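
The ticket does not say how the accuracy threshold in criterion 2 would be measured. One possible reading, assumed here purely for illustration, is the share of sampled translated segments that a bilingual reviewer marks as acceptable. The function names and the review format below are hypothetical.

```python
from typing import Sequence


def acceptance_rate(marks: Sequence[bool]) -> float:
    """Fraction of reviewed segments marked acceptable (True)."""
    if not marks:
        raise ValueError("no reviewed segments")
    return sum(marks) / len(marks)


def meets_threshold(marks: Sequence[bool], threshold: float = 0.95) -> bool:
    """True when the acceptance rate reaches the agreed threshold."""
    return acceptance_rate(marks) >= threshold
```

An automatic metric computed against reference translations could be substituted for the reviewer marks; either way, the threshold only becomes testable once the measure is agreed.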

Implementation Details

  1. Webpage Development: The webpage will be built with standard web technologies, with backend support for user authentication (e.g., OAuth-based sign-in or username/password login). This ensures a secure and accessible platform for file submission.
  2. Translation and Text Conversion Technology: Utilizing advanced AI and machine learning technologies for language translation and speech-to-text conversion. This includes leveraging open-source libraries and possibly integrating with third-party APIs to support a wide range of languages and content formats.
  3. Batch Processing and Metadata Handling: Implementation of a backend system capable of processing multiple uploads simultaneously, extracting metadata for each file, and ensuring each piece of content is appropriately tagged and stored.
  4. Google Drive and Sheets API Integration: Using Google Drive API for storing translated documents and Google Sheets API for updating the tracking spreadsheet. This requires careful planning to ensure data consistency, access management, and real-time update capabilities.
  5. Continuous Monitoring and Feedback Loop: Setting up mechanisms for monitoring the system's performance, gathering user feedback, and making iterative improvements to enhance functionality, user experience, and translation accuracy.
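
The flow described in items 2 and 3 can be sketched as a small dispatcher: classify each upload by type, extract its source-language text (directly for documents, through speech-to-text for audio and video), then translate. Everything here is an assumption made for illustration: the extension lists, the function names, and the idea of passing the extraction and translation steps in as plain callables so that any library or third-party API could be plugged in.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical extension groups; the real list would follow the formats Belongg accepts.
TEXT_EXT = {".txt", ".pdf"}
AUDIO_EXT = {".mp3", ".wav"}
VIDEO_EXT = {".mp4", ".mkv"}


def classify(filename: str) -> str:
    """Map a filename to 'text', 'audio', or 'video' by its extension."""
    ext = Path(filename).suffix.lower()
    if ext in TEXT_EXT:
        return "text"
    if ext in AUDIO_EXT:
        return "audio"
    if ext in VIDEO_EXT:
        return "video"
    raise ValueError(f"unsupported file type: {ext!r}")


def process_batch(
    filenames: Iterable[str],
    extractors: Dict[str, Callable[[str], str]],
    translate: Callable[[str], str],
) -> Dict[str, Tuple[str, str]]:
    """Run every file through extract -> translate, recording a status per file.

    A failure in one file is recorded and does not stop the rest of the batch.
    """
    results: Dict[str, Tuple[str, str]] = {}
    for name in filenames:
        try:
            source_text = extractors[classify(name)](name)
            results[name] = ("done", translate(source_text))
        except Exception as exc:  # keep the batch going; report the error instead
            results[name] = ("failed", str(exc))
    return results
```

Recording a per-file status instead of raising on the first error fits the batch requirement, since the status can then be written straight into the tracking spreadsheet.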

Mockups/Wireframes

No response

Product Name

BelonggAI

Organisation Name

Belongg

Domain

Social development research

Tech Skills Needed

AI, Database, HTML, JavaScript

Mentor(s)

@nbelongg

Category

API, Database, Machine Learning, Backend, Frontend

glitcher007 commented 1 month ago

Hi @BelonggAI ,

I hope this message finds you well. I wanted to express my keen interest in contributing to the development of the multilingual translation tool project for BelonggAI. With my background in [mention your relevant skills, e.g., natural language processing, machine learning, programming languages], I am confident in my ability to make meaningful contributions to this initiative.

I am excited about the prospect of leveraging my skills to help Belongg expand its reach and ensure inclusivity in processing knowledge produced in Indian languages. I am eager to collaborate with the team and contribute to the success of this project.

Looking forward to the opportunity to work together and make a positive impact.

Sayanjones commented 1 month ago

Hey @BelonggAI I'm interested in contributing to the BelonggAI multilingual translation tool. I have the listed skills (AI, Database, HTML, Javascript) and experience with RAG architecture and LLMs, which could be beneficial.

My skills can help with:

  • Exploring advanced translation models
  • Designing a system for managing knowledge assets
  • Developing the user interface

I'd love to discuss how I can contribute further. Could we schedule a meeting?

AkanshuAich commented 1 month ago

Hi @BelonggAI,

I am Akanshu Aich, a third-year student at the International Institute of Information Technology, Bhubaneswar. I am writing to express my interest in contributing to this project as part of DMP 2024. Having thoroughly reviewed the project, I am impressed by its objectives and see its potential for great impact across industries.

With my background in backend development using Django and the MERN stack, along with hands-on practice in machine learning and DevOps tools such as Docker, I believe I can make valuable contributions to the AI, database, and web parts of the project. My experience includes several projects, such as a Society-Expenditure Manager using Django, a Real Estate app using MERN, and an Info-Finding Tool using machine learning (LLM), which I believe align well with the goals of your project.

I am particularly interested in fulfilling the requirements of the project and have some ideas on how to approach it effectively. I am committed to adhering to best practices, contributing high-quality code, and actively collaborating with the project maintainers and community.

I am excited about the opportunity to contribute to "BelonggAI" and help further its mission. I look forward to discussing potential contributions and how I can best support the project.

Please guide me on the procedure, drawing on your knowledge and experience.

kartikeshwar156 commented 1 month ago

Hello @BelonggAI,

I am Kartikeshwar, and I am reaching out to express my strong interest in participating in the development of the multilingual translation tool project for BelonggAI. With my skills in natural language processing and machine learning, I am confident in my ability to provide valuable input to this endeavor.

I'm thrilled about the opportunity to utilize my skills to assist Belongg in expanding its outreach and ensuring inclusivity in handling knowledge generated in various Indian languages.

I eagerly anticipate the chance to work together and contribute positively to the project.

kannanb2745 commented 1 month ago

Hi @BelonggAI,

I'm KANNAN B, a second-year B.Tech student at Veltech Hightech Engineering College. I'm eager to contribute to the development of the multilingual translation tool for BelonggAI. With experience in building voice response AI and full-stack web development, I'm excited to help create user-friendly input interfaces, streamline batch processing, and ensure accurate translations. I'm committed to meeting project goals and making a meaningful impact. Looking forward to collaborating with your team!

Best regards, KANNAN B

AkanshuAich commented 1 month ago

A brief outline of the tech stack we may use for the project:

  1. NLP libraries such as NLTK (Natural Language Toolkit), spaCy, or Hugging Face's Transformers for text processing and translation tasks.
  2. Machine translation through a hosted service such as the Google Translate API, or through pre-trained models and toolkits such as MarianMT or OpenNMT, for translating text between Indian languages and English.
  3. For audio and video translation, speech-to-text (STT) tools such as Google Cloud Speech-to-Text or Mozilla DeepSpeech to transcribe spoken content into text.
  4. A library such as ReportLab in Python for generating PDF documents from the translated text. PyPDF2 is better suited to reading and manipulating existing PDFs, for example extracting text from uploaded PDF files.

SAHITYA621 commented 1 month ago

Hi @BelonggAI,

I trust this message finds you well. I'm reaching out to express my strong interest in contributing to the development of the multilingual translation tool project for BelonggAI. With my background in natural language processing, machine learning, and proficiency in programming languages like Python and Java, I am confident in my ability to offer valuable insights and contributions to this initiative.

Having previously worked on projects involving AI, databases, HTML, and JavaScript, I am particularly excited about the opportunity to leverage my skills to assist Belongg in expanding its reach and ensuring inclusivity in processing knowledge produced in Indian languages.

I am eager to collaborate with the team and play a role in the success of this project. Thank you for considering my application, and I look forward to the possibility of working together to make a positive impact.

Best regards, SAHITYA GAUR

depikaguptaa commented 1 month ago

Hi @BelonggAI,

I am highly interested in contributing to the development of the multilingual translation tool project for BelonggAI. I have strong skills in ML, NLP, data scraping, LLMs, and RAG models, and I have worked with LLMs and RAG in my personal projects. This project aligns with my interests, and it would be great if it were assigned to me.

Please assist me in becoming familiar with the project and how to start contributing to it.

Regards, Depika Gupta

AbhimanyuSamagra commented 3 weeks ago

Do not ask process-related questions about how to apply and who to contact in the above ticket. The only questions allowed are about technical aspects of the project itself. If you want help with the process, you can refer to the instructions listed on Unstop, and any further queries can be taken up on our Discord channel titled DMP queries. Here's a Video Tutorial on how to submit a proposal for a project.

Riyasharma28 commented 3 weeks ago

Good evening Respected Mentor @BelonggAI

I'm a Full Stack Developer proficient in HTML, CSS, JavaScript, MongoDB, React.js, Node.js, Bootstrap, Database Management, UI/UX Design. With over 10 projects completed in similar domains, I'm eager to contribute my expertise to the project. I'm excited about the opportunity to collaborate and drive its success. Looking forward to discussing further.

Poorvansha commented 3 weeks ago

A very good morning, @BelonggAI. I wanted to express my interest in contributing to this interesting project. I am an engineering student and am still finding my way around different frameworks. I may not know everything required to contribute to this project, but I can assure you that I am a keen learner, and I look forward to learning and to building a product that will help others solve various problems. It would be a great opportunity to collaborate on this project.

Suni17sunny37 commented 3 weeks ago

Hi @BelonggAI I'm interested in collaborating on Belongg's Multilingual Translation Tool Development Project. The initiative to expand the tool's capability to process knowledge produced in languages other than English is commendable. With my expertise in AI, database management, HTML, and JavaScript, I believe I can contribute effectively to the project's goals. I look forward to the opportunity to work alongside your team and receive guidance from @nbelongg.

KHUSHIPACHAURI commented 3 weeks ago

Hi @BelonggAI Sir, I'm impressed by this project on GitHub, especially the blend of AI, database, HTML, and JavaScript. With expertise in these areas, I'm eager to contribute. Let me know if there's room for collaboration. I believe I can contribute significantly to the development and enhancement of your project. Whether it's optimizing AI algorithms, fine-tuning database structures for better performance, or improving the user experience through frontend enhancements, I'm ready to roll up my sleeves and get to work. Thanks, Khushi Pachauri

DGRYZER commented 3 weeks ago

Hello, my name is Debajyoti Ghosh. I am a Frontend Developer (Fresher). I am sharing some features that this project should have:

  1. Project Management and Coordination:
    • Appoint a project manager for oversight and coordination.
    • Maintain regular communication channels.
  2. Technical Infrastructure Setup:
    • Establish secure authentication mechanisms.
    • Set up required technical infrastructure.
  3. Language Translation and Text Conversion:
    • Choose AI and machine learning tech for translation.
    • Explore open-source libraries and third-party APIs.
  4. Batch Processing and Metadata Management:
    • Develop a robust backend system.
    • Implement efficient algorithms for batch processing.
  5. Integration with Google Drive and Sheets:
    • Integrate with Google Drive and Sheets APIs.
    • Ensure seamless data transfer and synchronization.
  6. Quality Assurance and Performance Metrics:
    • Define quality standards and accuracy thresholds.
    • Establish performance metrics for evaluation.
  7. Documentation, Training, and Support:
    • Create comprehensive user documentation.
    • Provide ongoing technical support and training.
  8. Continuous Monitoring and Improvement:
    • Implement monitoring mechanisms for performance.
    • Establish a feedback loop for iterative enhancements.

Thank You. Debajyoti Ghosh.

gjyotk commented 3 weeks ago

Hello @BelonggAI

I am Gurjot Kaur, a second-year student pursuing a Bachelor of Engineering in CSE with a specialisation in Artificial Intelligence and Machine Learning (AIML). I wish to express my keen interest in contributing to your project. I believe this project is a great initiative to increase the inclusion of local Indian-language texts and documents in AI, and I therefore wish to be a part of this cause.

One of my recent projects included working with data scraping tools and libraries where I also worked on fine-tuning Google's Flan T5-base Large Language Model (LLM) for my recommendation system. Other than that, I have experience working with Natural Language Processing, audio and video processing and object detection.

I am confident that if given the opportunity, I can help make significant contributions to this project and learn new things from your team of professionals. Therefore, I am submitting my project proposal for this project for DMP 2024.

Excited to collaborate and be a part of this team.

Regards, Gurjot Kaur

Ishan-53 commented 2 weeks ago

Hi @BelonggAI ,

I hope this message finds you well. I wanted to express my keen interest in contributing to the development of the multilingual translation tool project for @BelonggAI. I am confident in my ability to make meaningful contributions to this initiative.

I am excited about the prospect of leveraging my skills to help Belongg expand its reach and ensure inclusivity in processing knowledge produced in Indian languages. I am eager to collaborate with the team and contribute to the success of this project. Looking forward to the opportunity to work together and make a positive impact.

Pranaytelagathoti commented 2 weeks ago

Hi @BelonggAI ,

I hope this message finds you well. I wanted to express my keen interest in contributing to the development of the multilingual translation tool project for @BelonggAI, because I have hands-on experience with transformers and other models. As the project has a huge corpus, an RNN model isn't suitable, and I can deal with this. With my background in [mention your relevant skills, e.g., natural language processing, machine learning, programming languages], I am confident in my ability to make meaningful contributions to this initiative. I have also been working on a similar project for a month, which has given me good exposure to this kind of work.

I am excited about the prospect of leveraging my skills to help Belongg expand its reach and ensure inclusivity in processing knowledge produced in Indian languages. I am eager to collaborate with the team and contribute to the success of this project.

Atharva1723 commented 2 weeks ago

Hey @BelonggAI I'm interested in contributing to the BelonggAI multilingual translation tool. I have the listed skills (AI, Database, HTML, Javascript) and experience with RAG architecture and LLMs, which could be beneficial.

My skills can help with:

  • Exploring advanced translation models
  • Designing a system for managing knowledge assets
  • Developing the user interface

I'd love to discuss how I can contribute further. I have also submitted a proposal application for this project.

goutham4126 commented 2 weeks ago

Hello @BelonggAI team. I hope this message finds you well. I am writing to express my keen interest in contributing to the development of the BelonggAI multilingual translation tool. Having reviewed the outlined requirements and objectives of the project, I believe my skill set aligns well with the needs and vision of BelonggAI.

With proficiency in AI technologies, database management, HTML, and JavaScript, coupled with extensive experience in working with LLMs, UI/UX design, wireframing, and frameworks such as Next.js, I am confident in my ability to make meaningful contributions to the project.

I am particularly enthusiastic about the opportunity to support BelonggAI in its mission to facilitate inclusivity and accessibility in the dissemination of knowledge across various Indian languages. I am eager to leverage my expertise to enhance the user experience and functionality of the translation tool, ensuring it meets the diverse linguistic needs of users.

I have also taken the initiative to submit a proposal application outlining my ideas and potential contributions to the project. I am open to further discussions and collaboration to explore how I can best support BelonggAI in achieving its objectives.

worrier1728 commented 2 weeks ago

Hi @BelonggAI ,

I hope this message finds you well. I wanted to express my keen interest in contributing to the development of the multilingual translation tool project for BelonggAI. With my background in natural language processing, machine learning, programming languages, I am confident in my ability to make meaningful contributions to this initiative.

I have worked as an AI Model Trainer at Remotasks, Scale AI and Zee Dimensions, and I am excited about the prospect of leveraging my skills to help Belongg expand its reach and ensure inclusivity in processing knowledge produced in Indian languages. I am eager to collaborate with the team and contribute to the success of this project.

Looking forward to the opportunity to work together and make a positive impact.