ggerganov / whisper.cpp

Port of OpenAI's Whisper model in C/C++
MIT License
35.38k stars 3.61k forks source link

WIP: ruby: New segment callback #2495

Closed KitaitiMakoto closed 1 week ago

KitaitiMakoto commented 2 weeks ago

I added new segment callback to Ruby binding.

By this patch, we can set new segment callback which is called on every segment.

whisper = Whisper::Context.new("path/to/model.bin")
params = Whisper::Params.new
params.new_segment_callback = ->(text, start_time, end_time, index) {
  puts "[#{start_time}ms --> #{end_time}ms] #{index}th segment: #{text}"
}
whisper.transcribe "path/to/audio", params

It can abort by some condition without waiting for transcription complete:

search_word = "A_WORD_I_WANT_TO_FIND"
params.new_segment_callback = ->(text, start_time, end_time, index) {
  if text.match? search_word
    puts "#{search_word} found at between #{start_time}ms and #{end_time}ms"
    raise # an exception stops transcription
  end
}

But I'm worried about something below:

Callback arguments

I put segment index at last in arguments for callback function. Is this good? If we add other arguments, for instance, token, speaker turn and so on, in the future, the position of segment index changes. It will cause compatibility issue for existing code using whispercpp gem. Is that acceptable as this is beta version? Should we use a Hash or define a class for callback arguments?

Params initialization

Now I initialize Whisper::Params#new_segment_callback by nil:

  rwp->new_segment_callback = Qnil;

https://github.com/KitaitiMakoto/whisper.cpp/blob/39a988daca2c69a8504d3081c6442ff97fcacd87/bindings/ruby/ext/ruby_whisper.cpp#L76

Should it be kept initialized by NULL to reduce unnecessary code? I'm not sure because I'm not familiar with C/C++.

Occupation

Is it good for single Ruby proc object to occupy whisper_context_params' new_segment_callback_user_data member:

    rwp->params.new_segment_callback_user_data = &rwp->new_segment_callback;

https://github.com/KitaitiMakoto/whisper.cpp/blob/39a988daca2c69a8504d3081c6442ff97fcacd87/bindings/ruby/ext/ruby_whisper.cpp#L223

Again, because I'm not familiar with C/C++, I wonder there are any other usages for new_segment_callback_user_data?.

Ruby API

Current Ruby API Whisper::Params#new_segment_callback= follows C++ API. But other API might be suitable for Ruby such as:

params.on_segment do |text, start_time, end_time, index|
  # ...
end

or

whisper.on_segment do |text, start_time, end_time, index|
  # ...
end
# not params

How do you think?

Abortable

This is just information rather than discussion. As a side effect, this patch allows us to abort transcription by Ctrl-C (SIGINT) if we set some callback though we cannot to it once whisper started transcription on current master. This is because Ruby gains control when C++ code call Ruby callback object, and it handle with SIGINT.

I think we should define how to abort itself, though.

Could you review it?

KitaitiMakoto commented 2 weeks ago

This implementation might prevent unlocking GVL as said at https://github.com/ggerganov/whisper.cpp/discussions/507#discussioncomment-5123511

ggerganov commented 1 week ago

I hope more people join and help with the review as I'm not really familiar with Ruby.

I put segment index at last in arguments for callback function. Is this good?

I think the Ruby bindings should expose the original callback signature:

https://github.com/ggerganov/whisper.cpp/blob/a5abfe6a90495f7bf19fe70d016ecc255e97359c/include/whisper.h#L440-L444

The way it is currently proposed in the PR, it looks to be very limiting and prevents taking full advantage of the information available by the C-style API.

KitaitiMakoto commented 1 week ago

Okay, you're right. So, we need to add many APIs to current Whisper::Context and Whisper::Params at first. I will take it easy.

KitaitiMakoto commented 1 week ago

Will close for now to prevent CI from running for WIP commits.