KBVE / kbve.com

https://kbve.com

[Concept] : Seamless M4T - 24 Hour Hackathon #800

h0lybyte closed this 10 months ago

h0lybyte commented 10 months ago

Core Concept/Theory

Idea

Utilize SeamlessM4T for building out language functionality for ATLAS/HQPlan

-OR-

Seamless M4T widget / Discord bot / Chrome extension


Alternative Ideas

Translate business communication between international clients and vendors. Wav2Lip could be used to simulate realistic, lip-synced video translations, and with voice-cloning AI we could potentially turn this into an API (?)


Alternative Examples/Sources

A buddy of mine made this for post-video processing, and it went viral, but Seamless M4T hadn't come out yet. Here's a link to his API in action:

https://twitter.com/therealprady/status/1680645510103977987?s=20


Additional Information

The main focus is to turn in a project/presentation. Once we lock in an idea, @8gratitude8 will set up the slides ahead of time.

h0lybyte commented 10 months ago

If we go the widget route, we could do something similar to these.

We could then add them anywhere by loading the JavaScript file and placing div tags on the page.
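To illustrate the "JavaScript file plus div tags" embed pattern, here is a minimal loader sketch. It assumes host pages mark placeholders with hypothetical `data-seamless-widget`, `data-src`, and `data-target-lang` attributes, that the widget is served as an iframe (for example a hosted Streamlit demo), and that the app reads a hypothetical `tgt_lang` query parameter. None of these names come from the KBVE repo.

```typescript
// Assumed host markup (attribute names are hypothetical):
//   <div data-seamless-widget data-src="https://example.com/app" data-target-lang="spa"></div>
//   <script src="seamless-widget.js"></script>

/** The subset of a DOM element this loader needs; real Elements satisfy it. */
interface WidgetHost {
  getAttribute(name: string): string | null;
  innerHTML: string;
}

interface WidgetConfig {
  src: string;
  targetLang: string;
}

/** Read widget settings from a host div's data attributes. */
function readConfig(host: WidgetHost): WidgetConfig | null {
  const src = host.getAttribute("data-src");
  if (!src) return null; // nothing to embed
  // "eng" default is an assumption (SeamlessM4T uses 3-letter codes like "eng").
  return { src, targetLang: host.getAttribute("data-target-lang") ?? "eng" };
}

/** Escape a value for use inside a double-quoted HTML attribute. */
function escapeAttr(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

/** Render the widget as an iframe, passing the target language as a query parameter. */
function renderWidget(config: WidgetConfig): string {
  const separator = config.src.indexOf("?") === -1 ? "?" : "&";
  const url = `${config.src}${separator}tgt_lang=${encodeURIComponent(config.targetLang)}`;
  return `<iframe src="${escapeAttr(url)}" title="SeamlessM4T widget" width="100%" height="480"></iframe>`;
}

/** Mount every host that has a usable config; returns how many were mounted. */
function mountAll(hosts: ArrayLike<WidgetHost>): number {
  let mounted = 0;
  for (let i = 0; i < hosts.length; i++) {
    const config = readConfig(hosts[i]);
    if (config) {
      hosts[i].innerHTML = renderWidget(config);
      mounted++;
    }
  }
  return mounted;
}

// In a browser, scan the page once when the script loads; skipped elsewhere (e.g. Node).
const doc = (globalThis as any).document;
if (doc) {
  mountAll(doc.querySelectorAll("div[data-seamless-widget]"));
}
```

Rendering to an HTML string keeps the logic testable outside a browser; a production widget might build the iframe with `document.createElement` instead.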

What about a Chrome extension / simple API setup?

h0lybyte commented 10 months ago

Interesting video concept -> https://www.youtube.com/watch?v=iTZ2N-HJbwA

That would be a bit rough to pull off in 24 hours, but we could include it in the "future applications" part of the concept.


8gratitude8 commented 10 months ago

Resources for Seamless M4T:

Seamless M4T: https://github.com/facebookresearch/seamless_communication

Dockerfile: https://github.com/djyde/seamlessM4T-docker

Google Colab: https://github.com/camenduru/seamless-m4t-colab

ONNX Fast SeamlessM4T: https://github.com/fabio-sim/Fast-SeamlessM4T-ONNX

ONNX (Open Neural Network Exchange): https://github.com/onnx/onnx

ONNX2RKNN Dockerfile: https://github.com/how2flow/docker-onnx2rknn

Hugging Face/Colab: https://github.com/iakashpaul/seamlessly

Streamlit Demo: https://github.com/carolinedlu/seamlessm4t-streamlit

8gratitude8 commented 10 months ago

ATLAS/HQ Plan Frontend UI Starter:

https://github.com/5-Dee-Studios/hqplanUI

8gratitude8 commented 10 months ago

High-Level Design for SeamlessM4T Integration into ATLAS/HQPlan

High-Level Design and Pseudocode for Hybrid Appwrite and n8n Approach

Setup Appwrite and n8n

  1. Install Appwrite and n8n in the environment.
  2. Configure basic settings.

Design High-Level Architecture

Implement Appwrite Functions

Text Translation Function


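async function textTranslation(data) {
  // TODO: implement translation logic using SeamlessM4T
  // (data carries the text plus source/target language codes)
  const translatedText = data.text; // placeholder: echoes the input until SeamlessM4T is wired in
  return translatedText;
}

Audio Translation Function

async function audioTranslation(data) {
  // TODO: implement speech translation logic using SeamlessM4T
  const translatedAudio = data.audio; // placeholder: echoes the input until SeamlessM4T is wired in
  return translatedAudio;
}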
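As a sketch of what the text translation body could do: SeamlessM4T ships as a Python package, so this assumes it runs behind a small self-hosted HTTP service (for example built from the Docker image in the resources list) exposing a hypothetical `POST {baseUrl}/translate` that accepts `{ text, src_lang, tgt_lang }` and returns `{ text }`. The endpoint, field names, and `translateViaService` helper are all assumptions; `fetch` is injected so the function can be tested without a server.

```typescript
interface TranslationRequest {
  text: string;
  sourceLang: string; // SeamlessM4T-style 3-letter code, e.g. "eng"
  targetLang: string; // e.g. "spa"
}

/** Minimal fetch signature, so the real fetch or a test stub can be passed in. */
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<{ ok: boolean; status: number; json(): Promise<any> }>;

/** Forward a text translation request to a (hypothetical) SeamlessM4T HTTP service. */
async function translateViaService(
  data: TranslationRequest,
  baseUrl: string,
  fetchImpl: FetchLike
): Promise<string> {
  if (data.text.trim() === "") return ""; // nothing to translate
  const response = await fetchImpl(`${baseUrl}/translate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    // Request field names are assumptions about the hypothetical service.
    body: JSON.stringify({ text: data.text, src_lang: data.sourceLang, tgt_lang: data.targetLang }),
  });
  if (!response.ok) {
    throw new Error(`Translation service returned HTTP ${response.status}`);
  }
  const result = await response.json();
  if (typeof result.text !== "string") {
    throw new Error("Translation service response had no text field");
  }
  return result.text;
}
```

Inside an Appwrite function this would be called with the runtime's global `fetch` and the service URL from an environment variable.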

Implement n8n Workflow
Workflow Nodes
* Webhook Node: To trigger the workflow.
* Function Nodes: For parts of SeamlessM4T logic.
* Set Node: To set data.
* HTTP Request Node: To call Appwrite functions.

e.g.

{
  "nodes": [
    {
      "type": "n8n-nodes-base.webhook",
      "parameters": {}
    },
    {
      "type": "n8n-nodes-base.function",
      "parameters": {}
    },
    {
      "type": "n8n-nodes-base.set",
      "parameters": {}
    },
    {
      "type": "n8n-nodes-base.httpRequest",
      "parameters": {}
    }
  ]
}
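The four-node workflow described above could be generated as importable n8n JSON along these lines. Node type strings follow n8n's `n8n-nodes-base.*` naming and nodes are wired through the `connections` object keyed by node name; the node names, `typeVersion: 1` parameter names (`httpMethod`, `path`, `functionCode`, `url`, `requestMethod`), and the Appwrite URL are assumptions to verify against the n8n version in use.

```typescript
interface N8nNode {
  name: string;
  type: string;
  typeVersion: number;
  position: [number, number];
  parameters: Record<string, unknown>;
}

type Connections = Record<string, { main: { node: string; type: "main"; index: number }[][] }>;

/** Build a Webhook -> Function -> Set -> HTTP Request workflow (hypothetical names/params). */
function buildTranslationWorkflow(appwriteFunctionUrl: string) {
  const nodes: N8nNode[] = [
    { name: "Webhook", type: "n8n-nodes-base.webhook", typeVersion: 1, position: [0, 0],
      parameters: { httpMethod: "POST", path: "start-translation" } },
    { name: "SeamlessM4T Logic", type: "n8n-nodes-base.function", typeVersion: 1, position: [250, 0],
      parameters: { functionCode: "// Add SeamlessM4T pre-processing here\nreturn items;" } },
    { name: "Set Payload", type: "n8n-nodes-base.set", typeVersion: 1, position: [500, 0],
      parameters: {} },
    { name: "Call Appwrite", type: "n8n-nodes-base.httpRequest", typeVersion: 1, position: [750, 0],
      parameters: { url: appwriteFunctionUrl, requestMethod: "POST" } },
  ];

  // Chain the nodes linearly: each node's main output feeds the next node's first input.
  const connections: Connections = {};
  for (let i = 0; i < nodes.length - 1; i++) {
    connections[nodes[i].name] = { main: [[{ node: nodes[i + 1].name, type: "main", index: 0 }]] };
  }
  return { name: "SeamlessM4T Translation", nodes, connections };
}
```

`JSON.stringify(buildTranslationWorkflow(url), null, 2)` produces text that can be pasted or imported into the n8n editor and then adjusted there.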

Integrate with HQPlanUI
* Use Appwrite SDK and n8n Webhook URL to integrate with 'hqplanUI-master'.

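In 'hqplanUI-master':

// Call the Appwrite function for text translation
// (functions = new Functions(client) from the Appwrite Web SDK; the body is sent as a string)
const execution = await functions.createExecution('textTranslation', JSON.stringify(data));
const translatedText = execution.responseBody;

// Trigger the n8n workflow for audio translation
const response = await fetch(n8nWebhookURL, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify(data)
});
const translatedAudio = await response.json();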

IMPLEMENTATION

Setup Appwrite and n8n

* Appwrite: Follow the [official documentation](https://appwrite.io/docs/getting-started) to set up Appwrite on your server.
* n8n: You can quickly get n8n running using its [Docker image](https://docs.n8n.io/getting-started/installation.html).

Design High-Level Architecture
* Use a drawing tool or whiteboard session to sketch the flow between Appwrite, n8n, and your main application.

Implement Appwrite Functions
* Go to your Appwrite console, navigate to the Functions section, and create new functions for text and audio translation.
* Deploy these functions using the Appwrite SDK or CLI.

// Appwrite Function to translate text
const translateText = async (text, sourceLang, targetLang) => {
  // Use SeamlessM4T API or library here
  const translatedText = text; // placeholder until SeamlessM4T is wired in
  return translatedText;
};

Implement n8n Workflow
* Open the n8n interface and create a new workflow.
* Add a Webhook node to trigger the workflow.
* Add Function nodes to handle specific logic.

// n8n Workflow JSON
{
  "nodes": [
    // Webhook and Function nodes here
  ]
}

Integrate Agent Protocol
* Within Appwrite functions and n8n Function nodes, incorporate the Agent Protocol for secure data transfer.
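For the Agent Protocol step, a translation request could be wrapped as a task and then executed as a step. The paths below follow our reading of the Agent Protocol v1 spec (`POST /ap/v1/agent/tasks`, then `POST /ap/v1/agent/tasks/{task_id}/steps`) and should be checked against it. The agent URL, bearer-token header, and `additional_input` fields are hypothetical; the transport security itself would come from HTTPS and whatever auth the deployment adds.

```typescript
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<{ ok: boolean; status: number; json(): Promise<any> }>;

interface AgentTask {
  task_id: string;
}

interface AgentStep {
  step_id: string;
  status: string;
  output?: string | null;
  is_last?: boolean;
}

/** POST a JSON body and return the parsed JSON response. */
async function postJson<T>(fetchImpl: FetchLike, url: string, body: unknown, token: string): Promise<T> {
  const res = await fetchImpl(url, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${token}`, // assumed deployment choice, not part of the protocol
    },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`Agent returned HTTP ${res.status} for ${url}`);
  return (await res.json()) as T;
}

/** Create an Agent Protocol task for the translation, then execute one step of it. */
async function translateViaAgent(
  agentUrl: string,
  token: string,
  text: string,
  targetLang: string,
  fetchImpl: FetchLike
): Promise<AgentStep> {
  const task = await postJson<AgentTask>(
    fetchImpl,
    `${agentUrl}/ap/v1/agent/tasks`,
    { input: text, additional_input: { tgt_lang: targetLang } }, // hypothetical extra fields
    token
  );
  const stepsUrl = `${agentUrl}/ap/v1/agent/tasks/${encodeURIComponent(task.task_id)}/steps`;
  return postJson<AgentStep>(fetchImpl, stepsUrl, { input: text }, token);
}
```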

Integrate with HQPlanUI
* Use the Appwrite JavaScript SDK and n8n webhook URLs to make function calls.
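
// Inside your React component in hqplanUI-master
// (functions = new Functions(client) from the Appwrite Web SDK)
const execution = await functions.createExecution("textTranslate", JSON.stringify(textData));
const translatedText = execution.responseBody;
const response = await fetch(n8nWebhookURL, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify(audioData)
});
const translatedAudio = await response.json();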

Testing and Deployment
* Run unit tests to validate the functions and workflow.
* Deploy the changes and monitor for any issues.
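A unit test for the translation functions could start with input validation, which runs without SeamlessM4T or n8n. This sketch uses a tiny `check` helper instead of a test framework. The rules it checks (3-letter language codes with an optional script suffix such as `cmn_Hant`, and a hypothetical 2,000-character cap) are assumptions, not project requirements.

```typescript
interface TranslationRequest {
  text: string;
  sourceLang: string;
  targetLang: string;
}

const LANG_CODE = /^[a-z]{3}(_[A-Za-z]{4})?$/; // e.g. "eng", "spa", "cmn_Hant"
const MAX_CHARS = 2000; // hypothetical limit

/** Return a list of validation errors (empty when the request is valid). */
function validateTranslationRequest(req: TranslationRequest): string[] {
  const errors: string[] = [];
  if (req.text.trim().length === 0) errors.push("text is empty");
  if (req.text.length > MAX_CHARS) errors.push(`text exceeds ${MAX_CHARS} characters`);
  if (!LANG_CODE.test(req.sourceLang)) errors.push(`bad sourceLang: ${req.sourceLang}`);
  if (!LANG_CODE.test(req.targetLang)) errors.push(`bad targetLang: ${req.targetLang}`);
  return errors;
}

// Tiny test harness so the example runs without a test framework.
function check(name: string, condition: boolean): void {
  if (!condition) throw new Error(`Test failed: ${name}`);
}

check("accepts a valid request",
  validateTranslationRequest({ text: "Hello", sourceLang: "eng", targetLang: "spa" }).length === 0);
check("rejects 2-letter codes",
  validateTranslationRequest({ text: "Hello", sourceLang: "en", targetLang: "es" }).length === 2);
check("rejects empty text",
  validateTranslationRequest({ text: "   ", sourceLang: "eng", targetLang: "spa" })[0] === "text is empty");
```

In the repo these cases would more likely live in a Jest or Vitest file, with the SeamlessM4T and n8n calls replaced by stubs.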

HIGH-LEVEL CODE DESIGN:

// 1. Appwrite Function for Text Translation (deploy this via the Appwrite Console)
const translateText = async (text: string, sourceLang: string, targetLang: string): Promise<string> => {
  // TODO: call SeamlessM4T (API or library) here
  const translatedText = text; // placeholder: echoes the input until SeamlessM4T is wired in
  return translatedText;
};

// 2. Appwrite Function for Audio Translation (deploy this via the Appwrite Console)
const translateAudio = async (audioData: Buffer, sourceLang: string, targetLang: string): Promise<Buffer> => {
  // TODO: call SeamlessM4T speech translation here
  const translatedAudio = audioData; // placeholder: echoes the input until SeamlessM4T is wired in
  return translatedAudio;
};

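// 3. n8n Workflow (JSON representation; import this into n8n)
const n8nWorkflow = {
  "nodes": [
    {
      "name": "Webhook",
      "type": "n8n-nodes-base.webhook",
      "typeVersion": 1,
      "position": [0, 0],
      "parameters": {
        "httpMethod": "POST",
        "path": "start-translation",
        "responseCode": 200
      }
    },
    {
      "name": "SeamlessM4T Logic",
      "type": "n8n-nodes-base.function",
      "typeVersion": 1,
      "position": [250, 0],
      "parameters": {
        "functionCode": "// Add SeamlessM4T logic here\nreturn items;"
      }
    }
  ],
  // Wire the webhook's output into the function node
  "connections": {
    "Webhook": {
      "main": [[{ "node": "SeamlessM4T Logic", "type": "main", "index": 0 }]]
    }
  }
};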

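// 4. Integration with 'hqplanUI-master' (TypeScript code inside your React component)
import { Client, Functions } from 'appwrite';  // Appwrite Web SDK

// Initialize Appwrite
const client = new Client()
  .setEndpoint('YOUR_APPWRITE_ENDPOINT')
  .setProject('YOUR_PROJECT_ID');
const functions = new Functions(client);

// SeamlessM4T uses 3-letter language codes such as "eng" and "spa"
const textData = {
  text: "Hello, world!",
  sourceLang: "eng",
  targetLang: "spa"
};

const audioData = {
  audio: audioBase64,  // placeholder: audio encoded as a base64 string so it survives JSON.stringify
  sourceLang: "eng",
  targetLang: "spa"
};

// Call the Appwrite function for text translation (pass the function's ID)
const execution = await functions.createExecution("textTranslate", JSON.stringify(textData));
const translatedText = execution.responseBody;

// Trigger the n8n workflow for audio translation
const response = await fetch("YOUR_N8N_WEBHOOK_URL", {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify(audioData)
});
const translatedAudio = await response.json();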
h0lybyte commented 10 months ago

The Appwrite Function could also serve purely as a trigger.

That way, n8n can update the status of the workflow dynamically.
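The trigger idea above can be sketched as two pieces: the Appwrite function only forwards the job to the n8n webhook and immediately returns a `queued` record, and n8n later reports progress through status updates that are checked against allowed transitions. The status names, transition table, and payload shape are hypothetical; where the status is stored (for example an Appwrite database document) is left open.

```typescript
type JobStatus = "queued" | "processing" | "completed" | "failed";

// Which status changes n8n may report for a job (hypothetical rules).
const ALLOWED: Record<JobStatus, JobStatus[]> = {
  queued: ["processing", "failed"],
  processing: ["completed", "failed"],
  completed: [],
  failed: [],
};

interface Job {
  id: string;
  status: JobStatus;
  updatedAt: number;
}

/** Apply a status update from n8n, rejecting out-of-order or repeated updates. */
function applyStatusUpdate(job: Job, next: JobStatus, now: number): Job {
  if (ALLOWED[job.status].indexOf(next) === -1) {
    throw new Error(`Invalid transition ${job.status} -> ${next} for job ${job.id}`);
  }
  return { ...job, status: next, updatedAt: now };
}

type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<{ ok: boolean; status: number }>;

/** Trigger only: hand the job to the n8n webhook and return without waiting for the result. */
async function triggerTranslation(
  webhookUrl: string,
  jobId: string,
  payload: Record<string, unknown>,
  fetchImpl: FetchLike,
  now: number = Date.now()
): Promise<Job> {
  const res = await fetchImpl(webhookUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ jobId, ...payload }),
  });
  if (!res.ok) throw new Error(`n8n webhook returned HTTP ${res.status}`);
  return { id: jobId, status: "queued", updatedAt: now };
}
```

Keeping the Appwrite side this thin means long-running audio work never blocks the function's execution time limit; the UI just watches the job's status.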

h0lybyte commented 10 months ago

Created the repo here -> https://github.com/KBVE/widget-seamless-m4t/

For now this is a temporary repo; I am going to experiment with it over the next couple of hours.

8gratitude8 commented 10 months ago

Notes from Discord: Widget idea: a simple button, added to the audio/video channel when the channel is created, that launches Seamless M4T as a serverless function. It can be as simple as an iframe embed of the Streamlit app that pops up when the button/widget is clicked. We start with that and add complexity or improve the UI afterwards. Once the Streamlit UI works as a pop-up iframe inside the project application, we can focus on translating the audio/video data, since that will probably require more backend work. At least for the 24-hour hackathon presentation, we want a pop-up UI that we can record for a solid demo.
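The pop-up described in these notes could be as small as this sketch: a button toggles a hidden iframe, and the iframe's `src` is set only on first open so the Streamlit app does not load until someone clicks. The element ids and `STREAMLIT_APP_URL` are hypothetical placeholders.

```typescript
/** The subset of an iframe element the controller touches. */
interface FrameLike {
  src: string;
  hidden: boolean;
}

function createPopupController(frame: FrameLike, appUrl: string) {
  let loaded = false;
  return {
    /** Toggle visibility; returns true when the pop-up is now open. */
    toggle(): boolean {
      if (!loaded) {
        frame.src = appUrl; // lazy-load the Streamlit app on first open
        loaded = true;
      }
      frame.hidden = !frame.hidden;
      return !frame.hidden;
    },
  };
}

// Browser wiring (skipped outside a browser). Ids and URL are placeholders.
const STREAMLIT_APP_URL = "https://your-streamlit-app.example";
const page = (globalThis as any).document;
if (page) {
  const frame = page.getElementById("seamless-popup-frame");
  const button = page.getElementById("seamless-popup-button");
  if (frame && button) {
    frame.hidden = true; // start closed
    const popup = createPopupController(frame, STREAMLIT_APP_URL);
    button.addEventListener("click", () => popup.toggle());
  }
}
```

Lazy-loading keeps the host application fast for users who never open the translator, which matters for a demo recording too.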

h0lybyte commented 10 months ago

I will close this issue ticket out! Hopefully we get a solid place for this competition!

It sucks that we weren't able to completely finish in time, but hey, at least it was worth a shot! I learned a bit more about Parcel 2 and the recent improvements it has made. The next step would be pushing finished widgets into the KBVE.com repo via patch-style pull requests, a bit similar to how Sweep does it.