FlutterFlow / flutterflow-issues

A community issue tracker for FlutterFlow.

Hardcoded audio format is wrong and leads to incorrect playback - big issue with OpenAI Whisper for example #2751

Open phils-hub opened 5 months ago

phils-hub commented 5 months ago

Has your issue been reported?

Current Behavior

Adding a voice recording leads to an FFUploadedFile being generated with the extension hardcoded to mp3. Hardcoding a file extension that is in fact OS specific sounds like a bad idea in general. On iOS (version 15 tested), recordings happen in the .m4a format, so the file uploaded to an API endpoint, Supabase, etc. is marked as mp3 but is actually m4a. Any receiving system that relies on reading the file extension or filename to determine the format will fail (e.g. OpenAI speech to text).

The root cause is here when inspecting any widget that triggers an action to stop recording:

await stopAudioRecording(
  audioRecorder: _model.audioRecorder,
  audioName: 'recordedFileBytes.mp3',
  onRecordingComplete: (audioFilePath, audioBytes) {
    _model.recordedUserMessage = audioFilePath;
    _model.recordedFileBytes = audioBytes;
  },
);

The file name is incorrectly hardcoded to 'recordedFileBytes.mp3'.

No idea how Android behaves, probably similar issues there.

Expected Behavior

A platform-specific file extension. On iOS the outcome should be recordedFileBytes.m4a.

Steps to Reproduce

  1. Create a new page
  2. Add an audio recording start action to a button
  3. Add an audio recording stop action to a button
  4. Upload the file
  5. Deploy to an iOS device
  6. Inspect the file type with ffprobe or a similar tool.
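If ffprobe is not available, the mismatch can also be confirmed by sniffing the file's first bytes: MPEG-4/M4A files carry an `ftyp` box at byte offset 4, while MP3 files start with an ID3 tag or an MPEG frame sync. A minimal sketch (the helper name `sniff_audio_format` is my own, not part of FlutterFlow):

```python
def sniff_audio_format(data: bytes) -> str:
    """Best-effort container detection from the first bytes of an audio file."""
    if len(data) >= 12 and data[4:8] == b"ftyp":
        return "m4a/mp4"  # ISO base media file, which is what iOS records
    if data[:3] == b"ID3":
        return "mp3"      # MP3 with an ID3v2 tag
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
        return "mp3"      # bare MPEG audio frame sync
    return "unknown"

# Example: header bytes typical of an iOS recording that FlutterFlow named "recordedFileBytes.mp3"
ios_header = b"\x00\x00\x00\x20ftypM4A \x00\x00\x00\x00"
print(sniff_audio_format(ios_header))  # m4a/mp4
```

Running this against the uploaded "mp3" shows the container is actually MPEG-4, which is exactly why extension-based consumers reject it.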

Reproducible from Blank

Bug Report Code (Required)

ITEez8rluJNJpbxJ7c+AY8JWqwMxQUB+bZ0vlO1FGE0aCLLxOooQOvfEaXdeX8+hfH1+e1Wbmj00ptLsv/GWN8A4OSusbaF607VQWxXKfESQb662EKWeW0dREeBXGGLC1J+Jux4kKtBidF412WGLcq3qNleeY8aSfxBlZ7vfcPo=

Context

Transfer audio data to an API for STT

Visual documentation

There is enough information there. Just check the source code. Hardcoded file extension!

Additional Info

No response

Environment

- FlutterFlow version: FlutterFlow v4.1.40 released April 17, 2024, Flutter version is 3.19.1
- Platform: Web and Mac
DigitalLabSlash commented 5 months ago

Is this why you can't replay an audio recorded with the recording widget (using only FlutterFlow widgets)? Even the demo here, https://docs.flutterflow.io/actions/actions/utilities/audio-recording, is not working on iOS...

DigitalLabSlash commented 5 months ago

Here is some more information, as we are using this feature in our current project:

It seems the audio player widget can't play m4a files. Therefore, it would be better to actually save the audio as mp3 (or wav) rather than m4a, as m4a files can't be replayed through the audio widget...

rzambroni commented 5 months ago

Hey @phils-hub, I tested the demo project made by the FF team on iOS and it seems to be working as expected: example project (recording and playback of the recording are working).

However, that example doesn't upload the file to any external source (e.g. Supabase, API). We suspect this might be related to another issue that was already reported a few days ago.

See here: https://github.com/FlutterFlow/flutterflow-issues/issues/2749

The other issue was already confirmed and is on the team's backlog to be fixed.

DigitalLabSlash commented 5 months ago

Hi, we have been testing all week; there is an issue in how files are recorded, even on iOS. You can't send the file to any API. Something with the audio path is wrong, so people come up with this kind of solution: https://community.flutterflow.io/ask-the-community/post/api-call-with-openai-for-transcriptions-OuYaJM5hNit3JOn

A bit of background:

The API Call needs the file (= bytes) itself, not the file name. The FlutterFlow API interface requires an "FFUploadedFile" in order to pass a file into an API call. The "Record Audio" action, however, returns an audio path.

So the challenge: How to convert an audio path to an FFUploadedFile with the right content.

FFUploadedFile basically has a "bytes" property and a "name" property, which are all that matter for audio. When you look at the API implementation code that FlutterFlow generates, this becomes obvious. So now the only task left is to get the audio data.

Converting the audio path to a string and displaying it reveals that the path leads to a file, not a blob URL (as was shown in a video I found online). So the task becomes: read the file content at that location and create an FFUploadedFile variable with that content.

Store the result in some app/page state variable and pass it into the API call, and it should work fine.
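In FlutterFlow the conversion described above has to be Dart custom code, but the logic reduces to a few lines. Here is a hypothetical Python analogue (the function name and the dict standing in for FFUploadedFile are mine, for illustration only):

```python
import os

def uploaded_file_from_path(audio_path: str) -> dict:
    """Read the recording at audio_path and keep its real extension
    (stand-in for constructing an FFUploadedFile from bytes + name)."""
    with open(audio_path, "rb") as f:
        audio_bytes = f.read()
    # Take the extension from the actual path instead of hardcoding ".mp3";
    # the ".m4a" fallback reflects the iOS behavior reported in this issue.
    ext = os.path.splitext(audio_path)[1] or ".m4a"
    return {"name": f"recordedFileBytes{ext}", "bytes": audio_bytes}
```

The point is that both the bytes and the name come from the recording's real path, so the receiving API sees a filename that matches the actual container format.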

phils-hub commented 5 months ago

@DigitalLabSlash FlutterFlow actually automatically creates this file for you after the recording has finished. You do not need the audio path. If you browse through the action options you will see a file that is generated as an output of a completed recording. That file is what you need. The actual problem is that this file has a .mp3 extension even though it is m4a (on iOS at least), but that is a separate issue (which is why I created this ticket).

phils-hub commented 5 months ago

@rzambroni thanks for the update. The issue is indeed when moving it out. If you publish the file to Supabase then it is playable, but it is identified as mp3 because FlutterFlow hardcodes the extension as such. The file is, however, m4a. OpenAI completely rejects it because the filename format does not match the actual format.

The root cause here is that FlutterFlow is not assigning the correct extension to the file being created. If the format is platform specific, then the extension should be determined based on the platform. If the format is controlled by Flutter/FlutterFlow, then it should be set to whatever they use. Simply marking it as mp3 is wrong.
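A platform-dependent default amounts to a small lookup table. As a hypothetical sketch in Python (the mapping below is my assumption, not FlutterFlow's actual behavior; only the iOS .m4a case is confirmed in this thread):

```python
# Assumed defaults; verify against the recorder plugin's actual encoder settings.
DEFAULT_RECORDING_EXT = {
    "ios": ".m4a",      # confirmed in this issue: iOS records into an MPEG-4 container
    "android": ".m4a",  # unverified; Android behavior is an open question above
    "web": ".mp3",      # unverified placeholder
}

def recording_name(platform: str, base: str = "recordedFileBytes") -> str:
    """Pick the file extension from the platform instead of hardcoding '.mp3'."""
    return base + DEFAULT_RECORDING_EXT.get(platform.lower(), ".m4a")

print(recording_name("iOS"))  # recordedFileBytes.m4a
```

Even better than a static table would be deriving the extension from the recorder's actual output path, as the custom-action workaround later in this thread does.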

DigitalLabSlash commented 5 months ago

Thanks @phils-hub the issue is that sometimes the file is also corrupted...

So this is what we are doing, for anyone interested: we have created a Flask API that converts the recording to a clean mp3 and sends the audio to Whisper.

In order to avoid any issues:

  1. First we send the recording to Supabase
  2. Then we query the Flask API with the file URL on Supabase to get both the converted, non-corrupted mp3 and the Whisper transcription
  3. We get the transcription and a clean mp3 that we can play in any audio player
phils-hub commented 5 months ago

@DigitalLabSlash I see, and I understand why you went with this workaround. It is however inherently quite a lot of work. I almost did the same but then stopped due to setup issues. What library do you use for corruption checking and correction? It may also be worth publicising the solution via a repo until this problem is fixed.

DigitalLabSlash commented 5 months ago

Unfortunately, despite all these efforts, for some reason the API call to my API is blocked (see https://github.com/FlutterFlow/flutterflow-issues/issues/2795)

Here is the code I use for my Heroku app:

import logging
import os
import uuid
import requests  # To download the file from a URL
from flask import Flask, request, send_from_directory, jsonify
from flask_cors import CORS
import subprocess
from werkzeug.utils import secure_filename
import openai

app = Flask(__name__)
CORS(app)  # Enable CORS for all domains on all routes

# Base directory for handling file downloads and conversions
base_dir = os.path.join(os.getcwd(), 'audio_converts')
os.makedirs(base_dir, exist_ok=True)

# Initialize OpenAI client (replace with your API key)
client = openai.OpenAI(api_key="API KEY")

@app.route('/convert', methods=['POST'])
def convert():
    data = request.get_json()
    if 'fileUrl' not in data:
        return jsonify(error="No 'fileUrl' in request"), 400

    file_url = data['fileUrl']
    response = requests.get(file_url)
    if response.status_code != 200:
        return jsonify(error="Failed to download the file"), 400

    # Save the downloaded file
    file_base = secure_filename(file_url.split('/')[-1])
    file_ext = os.path.splitext(file_base)[-1]
    unique_id = str(uuid.uuid4())  # Generate a unique ID
    filename = f"{file_base}_{unique_id}{file_ext}"
    filepath = os.path.join(base_dir, filename)

    with open(filepath, 'wb') as f:
        f.write(response.content)

    # Convert the file to mp3 using FFmpeg
    output_filename = f"{file_base}_{unique_id}_converted.mp3"
    output_path = os.path.join(base_dir, output_filename)
    try:
        subprocess.run(['ffmpeg', '-i', filepath, '-codec:a', 'libmp3lame', '-q:a', '2', output_path], check=True)
    except subprocess.CalledProcessError as e:
        logging.error("Conversion failed:", exc_info=True)
        os.remove(filepath)  # Remove the original file if conversion fails
        return jsonify(error="Conversion failed", message=str(e)), 500

    # Transcribe the converted audio file using Whisper
    try:
        with open(output_path, "rb") as audio_file:
            transcription = client.audio.transcriptions.create(
                model="whisper-1",
                file=audio_file
            )
        transcript_text = transcription.text
    except openai.OpenAIError as e:
        logging.error("OpenAI API Error:", exc_info=True)
        os.remove(filepath)  # Clean up the original file
        os.remove(output_path)  # Clean up the converted file
        return jsonify(error="Transcription failed",
                       message=str(e),
                       openai_error=e.status_code,
                       openai_error_type=str(type(e).__name__)), 500

    os.remove(filepath)  # Clean up the original file after successful conversion and transcription

    # Provide a URL for downloading the converted file
    download_url = f"https://thawing-coast-39522-436006cd4744.herokuapp.com/download/{output_filename}"
    return jsonify(download_url=download_url, transcript=transcript_text)

@app.route('/download/<filename>')
def download(filename):
    filename = secure_filename(filename)
    return send_from_directory(base_dir, filename, as_attachment=True)

if __name__ == '__main__':
    app.run(debug=True)

github-actions[bot] commented 5 months ago

This issue is stale because it has been open for 7 days with no activity. If there are no further updates, a team member will close the issue.

phils-hub commented 5 months ago

This is not stale. Waiting for a response from the team.

paulperez-dev commented 4 months ago

Hi guys, sorry for the late response, the team is looking into this, we will keep you posted here.

robwilliamsav commented 4 months ago

Hey folks, any ETA or update? Our app relies heavily on this and we're holding off launching until there's a fix.

robwilliamsav commented 3 months ago

@paulperez-dev It would be awesome to understand where things stand now. Should we recode our app to avoid audio, or could this be resolved soon? :)

paulperez-dev commented 3 months ago

Hi guys! The team is working on this, there is no ETA yet.

paulVu commented 3 months ago

please

maryfairy commented 3 months ago

Is there a workaround with custom code in the meantime? Everything public from comments / marketplace / tutorials doesn't seem to fit the bill of building once for Web, iOS, and Android.

phils-hub commented 3 months ago

@paulperez-dev I really do not understand why the teams are focusing on recursive components and all this fancy stuff, including publishing new tutorials on ChatGPT-like streaming, when such a core building block is broken...

amirbahojb76 commented 3 months ago

Any update?

DigitalLabSlash commented 3 months ago

Hello guys, as I see many people struggling with this: we have coded a workaround. Please contact us at contact@slash-digital.io if you want to implement our fix.

robwilliamsav commented 2 months ago

@paulperez-dev any update on this issue? coming up on 3 months :(

alih552 commented 2 months ago

Hi any updates on this?

hannst commented 2 months ago

Any updates on this? It's preventing any microphone recording being sent to an API like Whisper and seems like a pretty straight forward fix. A fix would be really appreciated.

oper2k commented 1 month ago

Any updates on this?

ramyzeidan commented 1 month ago

do we have any updates on this?

DigitalLabSlash commented 1 month ago

Hello guys, many of you are writing to us regarding this issue. Our agency can fix it for you in two days' time; do not hesitate to contact us.

amirmohammadshamss commented 3 weeks ago

Use a cloud function to send the request and convert the voice recording:

const functions = require("firebase-functions");
const admin = require("firebase-admin");
const path = require("path");
const fs = require("fs");
const axios = require("axios");
const FormData = require("form-data");
const ffmpeg = require("fluent-ffmpeg");

// To avoid deployment errors, do not call admin.initializeApp() in your code

exports.transcribeAudio = functions.https.onCall(async (data, context) => {
  const openaiApiKey = ""; // Update this with your actual OpenAI API key
  const fileName = data.fileLocation;

  try {
    // Download the file from Firebase Storage
    const bucket = admin.storage().bucket();
    const tempFilePath = path.join("/tmp", path.basename(fileName));
    await bucket.file(fileName).download({ destination: tempFilePath });
    const tempWavPath = path.join("/tmp", `${path.basename(fileName, ".mp3")}.wav`);

    // Convert mp3 to wav using ffmpeg
    await new Promise((resolve, reject) => {
      ffmpeg(tempFilePath).toFormat("wav").on("end", resolve).on("error", reject).save(tempWavPath);
    });

    // Create form data for the OpenAI Whisper API
    const formData = new FormData();
    formData.append("file", fs.createReadStream(tempWavPath));
    formData.append("model", "whisper-1");

    // Prepare headers for the API request
    const headers = {
      Authorization: `Bearer ${openaiApiKey}`,
      ...formData.getHeaders(), // Merge formData headers
    };

    // Send the file to OpenAI's transcription endpoint
    const url = "https://api.openai.com/v1/audio/transcriptions";
    return makeApiRequest({
      method: "post",
      url,
      headers,
      body: formData,
      returnBody: true,
      isStreamingApi: false,
    });
  } catch (error) {
    console.error("Error transcribing audio:", error);
    return {
      statusCode: 400,
      error: `${error.message}`,
    };
  }
});

async function makeApiRequest({ method, url, headers, params, body, returnBody, isStreamingApi }) {
  return axios
    .request({
      method,
      url,
      headers,
      params,
      responseType: isStreamingApi ? "stream" : "json",
      ...(body && { data: body }),
    })
    .then((response) => response.data)
    .catch((error) => ({
      statusCode: error.response?.status || 500,
      headers: error.response?.headers || {},
      ...(returnBody && { body: error.response?.data }),
      error: error.message,
    }));
}

MaggieThomann commented 2 weeks ago

I have a fix using a single custom action, without any API calls or dependencies, that at least works for iOS. The code is below; it essentially creates a new FFUploadedFile object with the correct file extension and the same bytes as the one created by the native Stop Audio Recording action. The use of this in my action tree is shown in the video. The key is to call this action when we're not on web, save its result to a local state variable of type Uploaded File, and then refer to that local state variable (called recordedAudioFile in my case) anywhere else I need it (e.g. in the Whisper call).

https://github.com/user-attachments/assets/bbf802ba-50d5-4e29-a260-c7ca30ee332f


import '/backend/schema/structs/index.dart';
import '/backend/schema/enums/enums.dart';
import '/flutter_flow/flutter_flow_theme.dart';
import '/flutter_flow/flutter_flow_util.dart';
import '/custom_code/actions/index.dart'; // Imports other custom actions
import '/flutter_flow/custom_functions.dart'; // Imports custom functions
import 'package:flutter/material.dart';
// Begin custom action code
// DO NOT REMOVE OR MODIFY THE CODE ABOVE!

Future<FFUploadedFile> renameAudio(
    FFUploadedFile audio, String audioPath) async {
  // Add your function code here!

  // This is a seemingly unnecessary thing to need to do, but there is currently a bug in FlutterFlow,
  // which was reported here: https://github.com/FlutterFlow/flutterflow-issues/issues/2751 and
  // the team is aware and actively searching for a way to fix! The gist is that we assume file
  // type to be mp3 when we stop the audio recording. This works for some devices that have recorded
  // and saved the file as an mp3. But on any other devices, like an iPhone for example, the
  // recorded file is not guaranteed to be in mp3 format.
  //
  // So this function takes in the recorded file path to determine the correct format for the name
  // of the FFUploadedFile by using regex to get the file extension and then composing a name based
  // off of that for the FFUploadedFile.
  //
  // In the future, once the issue is fixed, this action won't be necessary.

  RegExp regExp = RegExp(r'\.([a-zA-Z0-9]+)$');
  Match? match = regExp.firstMatch(audioPath);

  return FFUploadedFile(
    name: "recordedFileBytes.${match!.group(1)}",
    bytes: audio.bytes,
  );
}