facebookresearch / seamless_communication

Foundational Models for State-of-the-Art Speech and Text Translation

tensor format input audio translation error #473

Open DengHao97 opened 3 months ago

DengHao97 commented 3 months ago

I want to build an API endpoint, but the translation returned for the audio input is wrong. I suspect I made a mistake when processing the audio file. What went wrong? If a complete API example already exists, could you share it? Here's my code:

```python
import torchaudio
import torch
from seamless_communication.inference import Translator
from fastapi import FastAPI, File, UploadFile, Form

model_name = "seamlessM4T_v2_large"
vocoder_name = "vocoder_v2" if model_name == "seamlessM4T_v2_large" else "vocoder_36langs"

translator = Translator(
    model_name,
    vocoder_name,
    device=torch.device("cuda:1"),
    dtype=torch.float16,
)

app = FastAPI()


@app.post("/translate")
async def translate(
    file: UploadFile = File(...),
    to_lang: str = Form(...),
):
    audio_input, sample_rate = torchaudio.load(file.file)

    if not isinstance(audio_input, torch.Tensor):
        audio_input = torch.from_numpy(audio_input)
    audio_input = audio_input.permute(1, 0)

    text_output, _ = translator.predict(
        input=audio_input,
        task_str="s2tt",
        tgt_lang=to_lang,
    )
    print(f"Translated text: {text_output[0]}")
    return {
        "code": 200,
        "text": str(text_output[0]),
    }


if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="0.0.0.0", port=7860)
```