Chocobozzz / PeerTube

ActivityPub-federated video streaming platform using P2P directly in your web browser
https://joinpeertube.org/
GNU Affero General Public License v3.0

Support auto-captioning of Lives during broadcast for hugely improved accessibility. #4505

Open shibacomputer opened 2 years ago

shibacomputer commented 2 years ago

Describe the problem to be solved

PeerTube's Live system is a really powerful and useful feature. PeerTube's subtitle support for stored videos (and the ability to add subtitles after the fact) is also hugely important for accessibility. However, to make PeerTube a truly accessible platform, the Live player should also support auto-captioned subtitles that can be turned on or off by the viewer.

Describe the solution you would like

There is good live-captioning support in tools such as OBS. Alongside human-supplied captioning, there is a very solid plugin that generates very fast auto/AI captioning and streams the captions alongside the video content. When captions are detected in a livestream, PeerTube should offer to display them as it does for a non-live video. This would make PeerTube significantly more accessible to individuals who require audio or visual assistance to fully participate as viewers of PeerTube lives.
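For context on captions "streamed alongside video content": HLS, which PeerTube's live playback is based on, can already declare a subtitle rendition in the master playlist, so a player can surface a captions toggle just like for VOD. A minimal sketch (the group name, URIs, and bandwidth are made-up values, not PeerTube's actual output):

```
#EXTM3U
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="English",LANGUAGE="en",AUTOSELECT=YES,URI="captions/en.m3u8"
#EXT-X-STREAM-INF:BANDWIDTH=2000000,SUBTITLES="subs"
video/stream.m3u8
```

The `captions/en.m3u8` media playlist would then reference short WebVTT segments that are appended as the live caption source produces them.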

EchedelleLR commented 1 year ago

Possibility to use Whisper for this task?

iacore commented 1 year ago

Possibility to use Whisper for this task?

I heard that transcribing 2 minutes of video needs about 1 minute on a desktop with a GPU. This depends very much on the performance of the machine.

Therefore, it is more practical to do this on the upload side.

nfbyte commented 11 months ago

Therefore, it is more practical to do this on upload side.

The tool linked by the issue author does not add captioning client-side, it uses the Google Cloud Speech Recognition API which itself uses something like Whisper server-side. Performing ML tasks (like speech recognition) client-side is not practical at all at the moment.

shibacomputer commented 11 months ago

Possibility to use Whisper for this task?

This is not an issue asking for ML transcription. It is an issue asking for PeerTube to support subtitles that are streamed alongside video content. Currently, the only way to do this is to 'bake' the subtitles into the livestream by compositing them as a source/layer in your livestream software. This issue describes a method that treats streamed subtitles in the same way as non-live PeerTube videos that have subtitle files. Whether it uses Whisper or any other ML speech-to-text service is not relevant to the issue, because the ideal solution in this case is source-agnostic.

Edit to add: Whether this is done as part of the Peertube live, or supplied by the livestreamer (as, say, the OBS plugin linked in my original issue) isn't as relevant right now as the fact that Peertube can't display subtitles on a livestream.
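To illustrate the "treat streamed subtitles like subtitle files" idea: caption segments arriving during a live could be rendered through the same WebVTT path the player already uses for uploaded subtitle files. A minimal sketch of formatting incoming cues as WebVTT (these helpers are hypothetical, not PeerTube's actual code):

```javascript
// Format a time offset in seconds as a WebVTT timestamp (HH:MM:SS.mmm).
function toVttTimestamp(seconds) {
  const h = String(Math.floor(seconds / 3600)).padStart(2, '0');
  const m = String(Math.floor((seconds % 3600) / 60)).padStart(2, '0');
  const s = (seconds % 60).toFixed(3).padStart(6, '0');
  return `${h}:${m}:${s}`;
}

// Format one caption segment as a WebVTT cue block.
function toVttCue(start, end, text) {
  return `${toVttTimestamp(start)} --> ${toVttTimestamp(end)}\n${text}\n`;
}
```

A player-side integration could then feed such cues into a `captions` text track regardless of whether they came from a human captioner or an ML service.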

nfbyte commented 11 months ago

It is an issue asking for Peertube to support subtitles that are streamed alongside video content.

I think this would be a hack, as at the moment any client-side solution will likely rely on external (potentially proprietary / non-free) APIs anyway. The ideal source agnostic way to add captioning (for both videos and livestreams) is server-side in PeerTube, as I mention in #5931.

shibacomputer commented 11 months ago

What I wanted to highlight in my previous comment was that I do not think Peertube should be doing the ML translation itself. I think we are broadly saying a similar sort of thing here, except using different examples.

ROBERT-MCDOWELL commented 11 months ago

Creating a real-time client-side speech-to-text with translation for a live stream involves several steps and technologies. Below is a high-level overview of how you could achieve this using HTML, JavaScript, and relevant APIs:

  1. Set Up the Webpage: Create an HTML page that includes the necessary elements for capturing audio and displaying the transcribed and translated text.

  2. Capture Audio: Use the Web Speech API to capture audio from the user's microphone. The SpeechRecognition object can be used to start and stop capturing audio.

// SpeechRecognition is prefixed in Chromium-based browsers.
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new SpeechRecognition();
recognition.continuous = true;
recognition.start();
recognition.onresult = (event) => {
  // Use the most recent result rather than always the first one.
  const transcript = event.results[event.results.length - 1][0].transcript;
  // Handle the transcript (speech-to-text).
};
  3. Speech-to-Text: Extract the transcribed text from the captured audio using the Web Speech API. You can then display this text on your webpage.

  4. Translation: For translation, you can use a translation API such as Google Cloud Translation, Microsoft Translator, or DeepL. You'll need to sign up for an API key and integrate it into your JavaScript code.

const translationApiKey = 'YOUR_TRANSLATION_API_KEY';
const sourceLanguage = 'en';  // Source language code (English)
const targetLanguage = 'fr';  // Target language code (French)

// Make a translation request to the API
async function translateText(text) {
  const response = await fetch(`https://translation.googleapis.com/language/translate/v2?key=${translationApiKey}&source=${sourceLanguage}&target=${targetLanguage}&q=${encodeURIComponent(text)}`);
  const data = await response.json();
  const translatedText = data.data.translations[0].translatedText;
  return translatedText;
}
  5. Real-time Update: Whenever new audio is transcribed, call the translation function and update the translated text on the webpage in real time.
recognition.onresult = async (event) => {
  const transcript = event.results[event.results.length - 1][0].transcript;
  const translatedText = await translateText(transcript);
  // Update the UI with the transcribed and translated text.
};
  6. WebSocket (Optional): For a smoother real-time experience, consider using WebSockets to stream the transcribed and translated text to the viewers of the live stream.

Consider the performance implications of real-time audio processing.
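The WebSocket step above mostly needs an agreed message shape between the captioning side and the viewers. A sketch of one possible format (the field names are assumptions, not an existing PeerTube protocol):

```javascript
// Serialize one caption cue as a JSON message to push over a WebSocket.
function captionMessage(start, end, text) {
  return JSON.stringify({ type: 'caption', start, end, text });
}

// A browser client would handle incoming messages roughly like this:
// socket.onmessage = (e) => {
//   const msg = JSON.parse(e.data);
//   if (msg.type === 'caption') displayCue(msg.start, msg.end, msg.text);
// };
```

Keeping the cue timing in the message lets the client align captions with player time instead of just appending text as it arrives.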

chagai95 commented 11 months ago

Maybe there is a way to somehow reuse the code in this plugin for this? https://gitlab.com/apps_education/peertube/plugin-transcription