RainEggplant / chatgpt-telegram-bot

A ChatGPT bot for Telegram based on Node.js. Supports both browserless and browser-based APIs.
MIT License
323 stars 97 forks

Voice Message Support [FEATURE IDEA] #17

Open PL-RM opened 1 year ago

PL-RM commented 1 year ago

Hi there, amazing work! I've also been working on a Python Telegram bot with ChatGPT, but this is so much better. One thing I added was the ability to transcribe a voice message and respond to it. I used https://www.assemblyai.com/. Their API is very good: you can choose from many languages, and it understands speech very well. I'm no master coder, but here is the Python code to give you an idea!

`

import os
import subprocess
import time

import requests

TRANSCRIPT_URL = "https://api.assemblyai.com/v2/transcript"


def transcribe_voice(message, context, user_id):
    """Return the transcript of a voice message, or None if there is none or it fails."""
    # Check if the message is a voice message
    if not message.voice:
        return None

    # Get file ID and file object from message
    file_id = message.voice.file_id
    file = context.bot.get_file(file_id)

    # Set filename with user ID and sequence number
    os.makedirs("tmp", exist_ok=True)
    filename = f"audio_{user_id}.ogg"
    i = 1
    while os.path.exists(os.path.join("tmp", filename)):
        filename = f"audio_{user_id}_{i}.ogg"
        i += 1

    # Download file to temporary directory
    file_path = os.path.join("tmp", filename)
    file.download(file_path)

    try:
        # Upload file to temporary hosting service using cURL
        # (argument list instead of a shell string, so the path is never shell-parsed)
        upload = subprocess.run(
            ["curl", "--upload-file", file_path, f"https://transfer.sh/{filename}"],
            capture_output=True,
            text=True,
        )
        if upload.returncode != 0:
            print("Upload failed:", upload.stderr)
            return None
        audio_src_url = upload.stdout.strip()

        # Set up the API request headers and data
        # (ASSEMBLYAI_CLIENT_TOKEN is defined elsewhere in the bot)
        headers = {
            "authorization": f"Bearer {ASSEMBLYAI_CLIENT_TOKEN}",
            "content-type": "application/json",
        }
        data = {"audio_url": audio_src_url, "language_code": "es"}

        # Make the API request
        response = requests.post(TRANSCRIPT_URL, headers=headers, json=data)

        # Parse the response
        if response.status_code != 200:
            print("Error:", response.status_code, response.text)
            return None
        transcript_id = response.json()["id"]

        # Wait for the transcript to be ready, but give up after 1 minute
        start_time = time.time()
        while time.time() - start_time <= 60:
            response = requests.get(f"{TRANSCRIPT_URL}/{transcript_id}", headers=headers)
            print("Status code:", response.status_code)

            if response.status_code != 200:
                print("Error:", response.status_code, response.text)
                return None

            result = response.json()
            if result["status"] == "completed":
                return result["text"]
            if result["status"] in ("error", "failed"):
                print("Transcription failed:", result.get("error"))
                return None

            time.sleep(5)  # Check status every 5 seconds

        print("Transcription took more than 1 minute")
        return None
    finally:
        # Remove file, even when one of the steps above gave up early
        os.remove(file_path)


# In the message handler, set the transcription as the user text prompt for ChatGPT:
#     usertext = transcribe_voice(message, context, user_id)

`
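
The "wait for the transcript" step above can also be pulled out into a small helper that is easy to test without calling any API. This is only a minimal sketch: `poll_until_done`, `fetch_status`, and the `sleep`/`clock` parameters are hypothetical names introduced here, and the dict shape (`status`, `text`, `error`) is assumed to match the transcript JSON used in the snippet.

```python
import time
from typing import Callable, Optional


def poll_until_done(
    fetch_status: Callable[[], dict],
    timeout: float = 60.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[str]:
    """Call fetch_status() until it reports completion, an error, or the timeout passes.

    fetch_status is a hypothetical callable that returns a dict such as
    {"status": "completed", "text": "..."} or {"status": "error", "error": "..."}.
    """
    deadline = clock() + timeout
    while True:
        result = fetch_status()
        status = result.get("status")
        if status == "completed":
            return result.get("text")
        if status in ("error", "failed"):
            raise RuntimeError(f"Transcription failed: {result.get('error')}")
        if clock() >= deadline:
            raise TimeoutError(f"No transcript after {timeout} seconds")
        sleep(interval)  # wait before asking again
```

With `requests`, the callable could be something like `lambda: requests.get(url, headers=headers).json()`. Raising exceptions instead of calling `exit()` lets the bot report the problem to the user and keep running.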

RainEggplant commented 1 year ago

Thank you for your idea! I would say this is an interesting feature. But since this requires integrating third-party ASR services, it probably won't be added anytime soon. I'll have to work on other more important features (like per-user chat) first.

JokerQyou commented 1 year ago

Well, why not use OpenAI Whisper to transcribe voice messages? Does AssemblyAI have a significant advantage over it?

RainEggplant commented 1 year ago

> Well, why not use OpenAI Whisper to transcribe voice messages? Does AssemblyAI have a significant advantage over it?

Yeah, I plan to, but it may take some time as I've been busy with my paper recently. PRs are welcome, though!
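
For anyone picking this up, here is a rough sketch of what the Whisper route could look like. It is written in Python to match the snippet earlier in the thread (the bot itself is Node.js), and it assumes the open-source `openai-whisper` package plus `ffmpeg` are installed locally. The function names `load_whisper_model` and `transcribe_with_whisper` are hypothetical and not part of this repository.

```python
from typing import Optional


def load_whisper_model(name: str = "base"):
    """Load a Whisper model once and reuse it for every voice message."""
    # Imported lazily so the rest of the bot still starts without the package.
    import whisper  # assumption: installed via `pip install openai-whisper`

    return whisper.load_model(name)


def transcribe_with_whisper(model, audio_path: str, language: Optional[str] = None) -> str:
    """Transcribe a local audio file (e.g. a downloaded .ogg voice message).

    language=None lets Whisper detect the language; "es" would force Spanish.
    """
    result = model.transcribe(audio_path, language=language)
    return result["text"].strip()
```

A handler would call `model = load_whisper_model()` once at startup, then `transcribe_with_whisper(model, file_path)` for each downloaded voice file. Since everything runs locally, there is no upload to a temporary hosting service and no polling loop.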