aarnphm / whispercpp

Pybind11 bindings for Whisper.cpp
Apache License 2.0

How do I print out the timestamps along with the transcribed text? #10

Closed: AdithyanI closed this issue 1 year ago

AdithyanI commented 1 year ago

Feature request

Hi,

Thanks for writing this library. I am a beginner in Python and I am currently using it in my project.

I tried the sample code with a file, and it works, but the output contains only the transcribed text, not the timestamps. I tried this:

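    import ffmpeg
    import numpy as np
    from whispercpp import Whisper

    # MODEL_NAME, SAMPLE_FORMAT and SAMPLE_CODEC are constants defined elsewhere in my script.
    # The function name is only a placeholder for this excerpt.
    def transcribe_file(input_file):
        w = Whisper.from_pretrained(MODEL_NAME)
        w.params.token_timestamps = True
        try:
            # Decode the input file to raw mono PCM on stdout.
            y, _ = (
                ffmpeg.input(input_file, threads=0)
                .output("-", format=SAMPLE_FORMAT, acodec=SAMPLE_CODEC, ac=1)
                .run(
                    cmd=["ffmpeg", "-nostdin"], capture_stdout=True, capture_stderr=True
                )
            )
        except ffmpeg.Error as e:
            raise RuntimeError(f"Failed to load audio: {e.stderr.decode()}") from e

        # Convert 16-bit PCM samples to float32 in the range [-1, 1).
        arr = np.frombuffer(y, np.int16).flatten().astype(np.float32) / 32768.0
        return w.transcribe(arr, num_proc=3)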

It still does not print the timestamps. I would like to have them in the output, like the original whisper:

    [00:00:00.000 --> 00:00:07.000]   If you want to know how to say "hi" in Spanish the next time you greet someone, just say "ola".
    [00:00:07.000 --> 00:00:18.000]   For a more casual way to say "hi", try "kipasa", which means "what's happening", or "kital", which means "w

How can I achieve this? Please let me know. Sorry if this is a noob question.


aarnphm commented 1 year ago

All of the lower-level C++ APIs are exposed via w.context.

Right now w.transcribe is more of an easy-to-use API. If you want to get timestamps, you will probably need to implement your own inference call. Take a look at how main.cpp does it. It is pretty easy to reproduce.
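For readers following along, here is a minimal sketch of the formatting step only. whisper.cpp reports segment start and end times as integers in units of 10 ms, and main.cpp converts them to `HH:MM:SS.mmm` before printing. The function names `to_timestamp` and `format_segment` below are illustrative, and the segment values are hard-coded rather than read from a model.

```python
def to_timestamp(t: int, comma: bool = False) -> str:
    """Format a whisper.cpp timestamp (units of 10 ms) as HH:MM:SS.mmm."""
    msec = t * 10
    hours, msec = divmod(msec, 3_600_000)
    minutes, msec = divmod(msec, 60_000)
    seconds, msec = divmod(msec, 1_000)
    sep = "," if comma else "."  # SRT files use a comma before the milliseconds
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}{sep}{msec:03d}"


def format_segment(t0: int, t1: int, text: str) -> str:
    """Build one output line in the style of the whisper.cpp command-line tool."""
    return f"[{to_timestamp(t0)} --> {to_timestamp(t1)}]  {text}"


# Hard-coded example values: a segment running from 0 s to 7 s.
print(format_segment(0, 700, "If you want to know how to say hi in Spanish"))
```

The integer arithmetic avoids floating-point rounding, so the millisecond field is always exact.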

aarnphm commented 1 year ago

The main purpose of this library is just the binding, so that it is easier to interact with all of the C API. Maybe I can write an example to introduce some of its capabilities.

AdithyanI commented 1 year ago

> the main purpose of this library is just the binding, so that it is easier to interact with all of the C API. Maybe I can write an example to introduce some of the capabilities.

An example would be very helpful for a beginner like me, and would be much appreciated.

aarnphm commented 1 year ago

https://github.com/aarnphm/whispercpp/tree/main/examples

AdithyanI commented 1 year ago

Thank you @aarnphm 👏 !

johnidm commented 1 year ago

I wrote a solution that gets the timestamps for each chunk of words (aka segment):

https://gist.github.com/johnidm/90fa597c06a9a4192018893dd57ef0fb
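As a rough sketch of the per-segment idea (not a copy of the gist), the loop below walks over segments after a transcription has run and collects the start time, end time and text of each. It assumes the context object offers `full_n_segments()`, `full_get_segment_start(i)`, `full_get_segment_end(i)` and `full_get_segment_text(i)`, named after whisper.cpp's C functions `whisper_full_n_segments`, `whisper_full_get_segment_t0`, `whisper_full_get_segment_t1` and `whisper_full_get_segment_text`. Those Python method names are an assumption and should be checked against the binding. `FakeContext` is a hypothetical stand-in so the example runs without a model.

```python
from typing import Iterator, NamedTuple


class Segment(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    text: str


def iter_segments(ctx) -> Iterator[Segment]:
    """Yield one Segment per transcribed chunk.

    ASSUMPTION: the method names on `ctx` mirror whisper.cpp's C API;
    verify them against the binding before relying on this.
    Times are assumed to be in units of 10 ms, as in whisper.cpp.
    """
    for i in range(ctx.full_n_segments()):
        yield Segment(
            start=ctx.full_get_segment_start(i) / 100.0,
            end=ctx.full_get_segment_end(i) / 100.0,
            text=ctx.full_get_segment_text(i).strip(),
        )


class FakeContext:
    """Hypothetical stand-in for w.context, holding (t0, t1, text) rows."""

    def __init__(self, rows):
        self._rows = rows

    def full_n_segments(self):
        return len(self._rows)

    def full_get_segment_start(self, i):
        return self._rows[i][0]

    def full_get_segment_end(self, i):
        return self._rows[i][1]

    def full_get_segment_text(self, i):
        return self._rows[i][2]


ctx = FakeContext([(0, 700, " first chunk"), (700, 1800, " second chunk")])
for seg in iter_segments(ctx):
    print(f"{seg.start:.2f}s -> {seg.end:.2f}s  {seg.text}")
```

With a real model, `ctx` would be `w.context` after the inference call has finished, provided the assumed method names match.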