filecoin-project / devgrants

👟 Apply for a Filecoin devgrant. Help build the Filecoin ecosystem!


Closed MananAl closed 1 year ago

MananAl commented 1 year ago

Open Grant Proposal: BABYLON VOICE


Proposal Category: app-dev

Proposer: MananAl

(Optional) Technical Sponsor: Piyush Maheshwari, Anshuman Prasad (Filecoin, Protocol Labs)

Do you agree to open source all work you do on behalf of this RFP and dual-license under MIT, APACHE2, or GPL licenses?: Yes

Project Description

The chaotic, uncontrolled growth of metaverse and web3 platforms, the lack of reliable ID mechanisms, and more than a billion users having to rely on web2 identification methods when accessing web3 infrastructure all lead to identity theft, phishing, hacking, spam messaging, and fraud. Web3 lacks reliable ID and content ownership criteria. There is also no technology for advanced management of Dynamic NFTs or for creating hierarchical structure inside multimedia content on web3. We are going to solve the lifecycle problem for the Metaverse with an in-app SDK for putting multimedia data on chain. Data-to-earn: think greeting cards, written content, 4D content. What can you create with your phone that extends beyond just a JPEG?

The solution is multimedia web3 storage built on AI audio and video technology. For B2B sales, the market fit is an SDK for Metaverse avatars across Mona, Decentraland, Spatial, etc.

The main niche is the combination of AI and Web3: AI (GANs) is used to generate audio and video, and this data is shared and traded as NFTs (stealth, composable). An NFT can be a VoicePrint (7 seconds of a user's private data as m4a), a Multimedia NFT, or a Voice Skin. These voices will go into the Metaverse (Mona, which is built on Polygon and is a portfolio company backed by Protocol Labs Venture Capital), where they are added to avatars, as well as into Web2 social media, DApps, and wallets. Further, as Mona is building a backpack solution for avatars, these SDKs can be embedded there.

Babylon Voice works with VoicePrint, on-the-go voice cloning, voice, video, and streaming, so it can be onboarded onto all storage solutions (private and public). The SDK is structured so that anyone can call the API to use a voice as a Decentralized ID, as a Voice Skin for communication, or for text to speech, which can become a new revenue stream with a pricing model open to all. The same can later be done for video.

Bundling the APIs into the Babylon Voice SDK for mp4, m4a, mov, wav, and mp3 files and for voice/voice-model data (spectrogram, SHA-256 hash) also probably makes sense when white-labeling the product for other services.
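As a rough illustration of the voice data record mentioned above, the sketch below checks a file's extension against the listed formats and computes a digest of its contents. It assumes "hash 256" refers to SHA-256. The function name `describe_voice_file` and the record fields are hypothetical and not part of any published Babylon Voice SDK.

```python
import hashlib
from pathlib import PurePath

# Formats listed in the proposal; the constant name is an illustrative assumption.
SUPPORTED_EXTENSIONS = {".mp4", ".m4a", ".mov", ".wav", ".mp3"}


def describe_voice_file(filename: str, data: bytes) -> dict:
    """Build a hypothetical metadata record for an uploaded media file."""
    extension = PurePath(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported format: {extension or filename}")
    return {
        "filename": filename,
        "format": extension.lstrip("."),
        "size_bytes": len(data),
        # SHA-256 digest of the raw bytes, as a hex string.
        "sha256": hashlib.sha256(data).hexdigest(),
    }
```

A record like this could travel alongside the media file so that a white-label integrator can verify the content it receives. A spectrogram would need an audio library and is left out of the sketch.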


We will deliver an SDK and an app that allow users to easily upload their voice recordings and videos to Filecoin. The target audience is creators and web3 users in the Metaverse. Upload and download functionality with Filecoin will be supported by MetaMask/FilSnap, leveraging Ceramic to upload via Powergate.

This simple web app will solve many problems, including:

On the modification logic side, we are going to create a standard for storing the implementations of modification algorithms (not only ML) and their artifacts on IPFS. This provides one more way in which decentralized computation could be implemented. In addition, we have to solve the problem of delegated file access control. By making Filecoin development more accessible, the entire ecosystem can expand. If we get this right, we will be providing a tool that brings Filecoin development to a much broader audience. Developers need tools that help them interface with Filecoin more easily. The intention here is to make development more manageable; when you do that, more developers can participate in the community. A vibrant community benefits everyone.
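One way to picture the proposed storage standard is a small manifest that names a modification algorithm and references its implementation and artifacts by content hash. This is only a sketch under stated assumptions: every field name and function name is a hypothetical placeholder, and plain SHA-256 hex digests stand in for IPFS CIDs, which are computed differently.

```python
import hashlib
import json
from typing import Mapping


def content_id(data: bytes) -> str:
    """Stand-in content identifier: a SHA-256 hex digest (not a real IPFS CID)."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(name: str, implementation: bytes, artifacts: Mapping[str, bytes]) -> str:
    """Serialize a hypothetical manifest for one modification algorithm."""
    manifest = {
        "name": name,
        "implementation": content_id(implementation),
        "artifacts": {label: content_id(blob) for label, blob in artifacts.items()},
    }
    # Sorted keys give a stable serialization, so equal inputs yield equal manifest text.
    return json.dumps(manifest, sort_keys=True)
```

Because the manifest text is deterministic, it can itself be content-addressed, which lets one identifier cover an algorithm together with its artifacts.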


The application will be written in Node.js, but all UI and core functionality will be implemented in HTML and plain JavaScript to allow as wide a range of open-source contributions as possible.

Development Roadmap

Duration | Funding | Team
--- | --- | ---
2 weeks | $25,000 | 5 developers (1 frontend, 1 data scientist, 1 python-dev, 1 solidity-dev, 1 QA) + infrastructure costs
2 weeks | $25,000 | 5 developers (1 frontend, 1 data scientist (AI), 1 python-dev, 1 solidity-dev, 1 QA) + infrastructure costs
2 weeks | $20,000 | 3 developers (1 python-dev, 1 Solana dev, 1 QA) + infrastructure costs
Total | $70,000 |

Maintenance and Upgrade Plans

Maintenance is guaranteed for 1 year, including critical bug fixes on demand. We are always open to community suggestions to improve the project. The next steps are to onboard data scientists and:

1. MonaVerse, Ready Player Me (we are negotiating with a few Metaverse platforms and Web3 dApps)

2. Collaborating with NFT marketplaces (we already have agreements with Rarible)

3. Collaborating with GameFi

4. Support SDK and make improvements


Team Members

Team Member LinkedIn Profiles

- Max Staples @doevent: AI/ML, data science, AI bots, and Intelligent Automation. Experienced in creating STT, TTS, NLP, and bots (with AI magic under the hood). Deep learning skills, especially sequence and generative models (Attention, Transformers, GANs, Diffusion, etc.). Research or industry experience with speech technology, preferably TTS but also ASR, Voice Conversion, Speaker Diarization, etc. Experienced with Python and PyTorch. NLP and language processing skills. Published relevant articles (e.g. Interspeech, ICASSP, Speech Communication, NeurIPS, CVPR). 5 years of AI experience manipulating data sets and building statistical models. Holds a Master's in Computer Science and is familiar with the following software, tools, and languages: C, C++, Java, R, Python, SQL, JavaScript, Go, Rust, Kotlin, and Julia; and with data mining techniques and tools: GLM/Regression, Random Forest, Boosting, Trees, Map/Reduce, Hadoop, Hive, Spark, Gurobi, MySQL, text mining, and social network analysis. AWS and Docker knowledge (working with EC2, S3, CloudWatch, deploying containers, writing Dockerfiles). Experience using statistical computer languages (R, Python, SQL) to manipulate data and draw insights from large data sets, plus experience working with and creating data architectures. Knowledge of a variety of machine learning techniques (clustering, decision tree learning, artificial neural networks) and their real-world advantages and drawbacks. Knowledge of advanced statistical techniques and concepts (regression, properties of distributions, statistical tests and their proper usage) and experience with applications. Experience using web services (Redshift, S3, Spark, DigitalOcean); creating and using advanced machine learning algorithms and statistics (regression, simulation, scenario analysis, modeling, clustering, decision trees, neural networks); and analyzing data from third-party providers (Google Analytics, Site Catalyst, Coremetrics, AdWords, Crimson Hexagon, Facebook Insights).
AI NFT collection: Summary Audiogram AI bot; Bot removes the background from a picture; Bot enhances photo, face: Restoration/Scale/UpScale/Colorizer; Bot converts photos and video notes into anime, arcane, vintage, text to music; Multilingual GPT-3 model: model supports 60 languages; Prompt Generator: Stable Diffusion; Prompt Generator: Diffusion [Disable NSFW Filter]; Deep Clone Voice

- David Tanaka @realhardworkeringdeveloper: Blockchain Engineering, Solidity & Rust Development, Full Stack. Senior full-stack web developer with 7+ years of experience in Japan. Familiar with Ethereum, Polygon, BSC, and Avalanche for Solidity, and with Solana, Elrond, Polkadot/Substrate, Cosmos, and Near for Rust.

- Vasily Tabunov @vtabunov: Mobile React Native developer (JavaScript + Redux). React engineer with 8+ years of experience targeting both iOS and cross-platform builds, who can create well-structured front-end architecture and APIs and write reusable, scalable JavaScript code. Strong knowledge of HTML and CSS, in-depth knowledge of React.js and its fundamentals, and knowledge of UI/UX design and wireframes. Hands-on experience with React tools such as Webpack, Enzyme, React.js, Flux, and Redux, and with JavaScript libraries such as Redux for making asynchronous API calls and improving the performance of websites and mobile apps. Also transitions existing React apps to React Native, especially for multimedia and content-AI-based apps.

- Donald King: Project management. Currently working on PwC's Oracle team as a Technology Consultant (Digital Assets and Metaverse), where I bring Fortune 500 companies into the Oracle cloud. Within Web3.0 and the Metaverse, I have experience designing and building virtual experiences in The Sandbox and Decentraland.

Team Website

Additional Information

Last Deck

  1. It would be very helpful to have a technical liaison from Filecoin who is available for brief check-ins throughout the grant execution. It’s valuable to have an available contact who can confirm that we’re on the right track, and producing results that are both representative and useful.
  2. We are always happy to iterate the proposal based on your priorities and advice. Let us know what you'd like to see modified!

Thanks, Founders @ Babylon Voice by Manan AI

ErinOCon commented 1 year ago

Hi @MananAl, I know it has been a considerable amount of time since we have provided an update for this proposal. In light of the macroeconomic climate, the review of our budget and priorities for the fiscal year has resulted in a longer evaluation period. Thank you for all of your patience.

Unfortunately, we will not be moving forward with a grant at this time, but you may have an interest in checking out an accelerator program for this project! Wishing you all the best as you continue to build!