collabora / WhisperLive

A nearly-live implementation of OpenAI's Whisper.

Simple Client Recording Attempt #39

Open justinlevi opened 11 months ago

justinlevi commented 11 months ago

I start up the server via $ python ./run_server.py

(whisper_live)  whisperlive git:(main)✗  🚀 python ./run_server.py
Downloading: "https://github.com/snakers4/silero-vad/archive/master.zip" to /Users/justinwinter/.cache/torch/hub/master.zip
2023-08-21 12:14:34.119619 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer '628'. It is not used by any node and should be removed from the model.
2023-08-21 12:14:34.119647 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer '629'. It is not used by any node and should be removed from the model.
2023-08-21 12:14:34.119652 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer '623'. It is not used by any node and should be removed from the model.
2023-08-21 12:14:34.119655 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer '625'. It is not used by any node and should be removed from the model.
2023-08-21 12:14:34.119659 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer '620'. It is not used by any node and should be removed from the model.
2023-08-21 12:14:34.119696 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer '139'. It is not used by any node and should be removed from the model.
2023-08-21 12:14:34.119701 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer '131'. It is not used by any node and should be removed from the model.
2023-08-21 12:14:34.119704 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer '140'. It is not used by any node and should be removed from the model.
2023-08-21 12:14:34.119708 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer '134'. It is not used by any node and should be removed from the model.
2023-08-21 12:14:34.119711 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer '136'. It is not used by any node and should be removed from the model.
ERROR:root:no close frame received or sent
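
(For reference, run_server.py follows the README pattern; this is a sketch, with the host and port assumed to match what the client connects to below.)

# run_server.py (sketch)
from whisper_live.server import TranscriptionServer

server = TranscriptionServer()
server.run("0.0.0.0", 8080)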

Then start up the client via:

(whisper_live)  whisperlive git:(main)✗  🚀 python ./run_client.py
[INFO]: * recording
[INFO]: Waiting for server ready ...
False en transcribe
[INFO]: Opened connection
[INFO]: Server Ready!
Traceback (most recent call last):
  File "/Users/justinwinter/projects/whisperlive/./run_client.py", line 3, in <module>
    client()
  File "/Users/justinwinter/projects/whisperlive/whisper_live/client.py", line 298, in __call__
    self.client.record()
  File "/Users/justinwinter/projects/whisperlive/whisper_live/client.py", line 234, in record
    data = self.stream.read(self.CHUNK)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/whisper_live/lib/python3.9/site-packages/pyaudio/__init__.py", line 570, in read
    return pa.read_stream(self._stream, num_frames,
OSError: [Errno -9981] Input overflowed

# run_client.py

from whisper_live.client import TranscriptionClient
client = TranscriptionClient("0.0.0.0", "8080", is_multilingual=False, lang="en", translate=False)
client()
zoq commented 10 months ago

Are you running this on a mac?

zoq commented 10 months ago

A temporary workaround is to set exception_on_overflow=False in https://github.com/collabora/WhisperLive/blob/main/whisper_live/client.py#L234.

This might cause us to skip some frames; we are looking into updating the frame rate for the different platforms.
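
The change amounts to passing the flag into PyAudio's read call; a minimal sketch of the edited line (the surrounding record() code is assumed from client.py):

data = self.stream.read(self.CHUNK, exception_on_overflow=False)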

justinlevi commented 10 months ago

@zoq Thanks for the idea. Yes, I am on a mac. I tried setting exception_on_overflow=False, but I'm still getting the same error:

(whisper_live)  whisperlive git:(main)✗  🚀 python ./run_client.py
[INFO]: * recording
[INFO]: Waiting for server ready ...
False en transcribe
[INFO]: Opened connection
[INFO]: Server Ready!
Traceback (most recent call last):
  File "/Users/justinwinter/projects/whisperlive/./run_client.py", line 3, in <module>
    client()
  File "/Users/justinwinter/projects/whisperlive/whisper_live/client.py", line 299, in __call__
    self.client.record()
  File "/Users/justinwinter/projects/whisperlive/whisper_live/client.py", line 235, in record
    data = self.stream.read(self.CHUNK)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/whisper_live/lib/python3.9/site-packages/pyaudio/__init__.py", line 570, in read
    return pa.read_stream(self._stream, num_frames,
OSError: [Errno -9981] Input overflowed
The modified record() method (from the attached screenshot):

    def record(self, out_file="output_recording.wav"):
        n_audio_file = 0
        # create dir for saving audio chunks
        if not os.path.exists("chunks"):
            os.makedirs("chunks", exist_ok=True)
        try:
            for _ in range(0, int(self.RATE / self.CHUNK * self.RECORD_SECONDS)):
                if not Client.RECORDING: break
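                # note: setting exception_on_overflow as an attribute here has no effect;
                # the flag has to be passed to stream.read() itself (see the suggestion below)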
                self.exception_on_overflow=False
                data = self.stream.read(self.CHUNK)
                self.frames += data

                audio_array = Client.bytes_to_float_array(data)

                self.send_packet_to_server(audio_array.tobytes())

                # save frames if more than a minute
                if len(self.frames) > 60*self.RATE:
                    t = threading.Thread(
                        target=self.write_audio_frames_to_file,
                        args=(self.frames[:], f"chunks/{n_audio_file}.wav", )
                    )
                    t.start()
                    n_audio_file += 1
                    self.frames = b""
zoq commented 10 months ago

Okay, I can reproduce the issue on a mac; I'll come up with a fix.

aavetis commented 8 months ago

welp

Geczy commented 8 months ago

rip?

zoq commented 8 months ago

> welp

Do you still have this problem with the latest release?

sbrnaderi commented 8 months ago

I have pulled the latest changes from the git repo and I still have this issue on my mac. Have you made changes regarding this issue? Thanks.

sbrnaderi commented 8 months ago

OK, I just tried increasing the self.chunk value to 1024 * 4 and I don't get the error anymore; the transcription works fine. This is in the client.py file.

zoq commented 8 months ago

This is on a mac?

sbrnaderi commented 8 months ago

> This is on a mac?

Yes, this is on a MacBook Pro (Intel-based). I increased the chunk size and got the code to work. I also noticed that if I use a bigger Whisper model (medium), I have to increase it further to 1024 * 8.
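
A sketch of the change described above (the attribute name is from this thread; its exact location in client.py is an assumption):

# whisper_live/client.py -- chunk size workaround reported above
self.chunk = 1024 * 4   # default was 1024; larger reads avoid the overflow on macOS
# with the medium model, 1024 * 8 was reportedly needed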

arbianqx commented 7 months ago

This issue still persists on my MacBook. Sending an audio file, on the other hand, works just fine.

zoq commented 7 months ago

Looking into it today.

niderhoff commented 7 months ago

> OK, I just tried increasing the self.chunk value to 1024 * 4 and I don't get the error anymore; the transcription works fine. This is in the client.py file.

I tried that, but for me it still crashes with the original error.

zoq commented 7 months ago

I can confirm that updating the chunk size works for me. @niderhoff let's try to figure out why it's not working on your system. Just to make sure our setups are the same: are you using the pip package, the docker container, or do you run the scripts without the pip package?

asadal commented 6 months ago

I have the same problem.

[INFO]: * recording
[INFO]: Waiting for server ready ...
True ko transcribe
[INFO]: Opened connection
[INFO]: Server Ready!
Traceback (most recent call last):
  File "/Users/asadal/Documents/Dev/Hani/WhisperLive_streamlit.py", line 13, in <module>
    client()
  File "/Users/asadal/miniconda3/lib/python3.10/site-packages/whisper_live/client.py", line 490, in __call__
    self.client.record()
  File "/Users/asadal/miniconda3/lib/python3.10/site-packages/whisper_live/client.py", line 371, in record
    data = self.stream.read(self.chunk)
  File "/Users/asadal/miniconda3/lib/python3.10/site-packages/pyaudio/__init__.py", line 570, in read
    return pa.read_stream(self._stream, num_frames,
OSError: [Errno -9981] Input overflowed

MacBook Pro 14 (M1 Pro), simple client recording, using the pip package.

# Run the client
from whisper_live.client import TranscriptionClient
client = TranscriptionClient(
  "localhost",
  9090,
  is_multilingual=True,
  lang="ko",
  translate=False,
  model_size="small"
)
client()

Then I encountered this error:

TypeError: TranscriptionClient.__init__() got an unexpected keyword argument 'model_size'

So I removed the model_size="small" option and ran it again, but got another error: OSError: [Errno -9981] Input overflowed

I also changed self.chunk = 1024 to self.chunk = 1024 * 4, but encountered the same error.

zoq commented 6 months ago

We still need to publish a new pip release; in the meantime you can remove model_size="small" from the TranscriptionClient call. For the overflow issue, can you try stream.read(self.chunk, exception_on_overflow=False) in https://github.com/collabora/WhisperLive/blob/main/whisper_live/client.py#L415
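
Putting both suggestions together, a minimal sketch (the read call and the client keyword arguments are taken from this thread; exact line numbers may differ in your install):

# whisper_live/client.py, inside record() -- pass the flag to PyAudio's read call
data = self.stream.read(self.chunk, exception_on_overflow=False)

# client script without model_size, until the new pip release is out
from whisper_live.client import TranscriptionClient
client = TranscriptionClient(
  "localhost",
  9090,
  is_multilingual=True,
  lang="ko",
  translate=False
)
client()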

asadal commented 6 months ago

Thanks zoq, I'll wait for the updated pip package. Thank you for creating such a great application.

Best Regards.

makaveli10 commented 6 months ago

@asadal the pip package is updated. Let us know if you are still facing the issue.

kjyv commented 6 months ago

I have had success with the newest version and setting exception_on_overflow to False. Can this be set by default?

zoq commented 6 months ago

https://github.com/collabora/WhisperLive/pull/83 does that; we will merge it and release a new version.

JonathanLehner commented 6 months ago

I got a segmentation fault with the latest version...

makaveli10 commented 6 months ago

@JonathanLehner can you share more details on when you see the segfault? And does it always happen on the latest version?

JonathanLehner commented 5 months ago

I just tried the demo from the README:

# llm_server.py
from whisper_live.server import TranscriptionServer

server = TranscriptionServer()
server.run("0.0.0.0", 8080)

# llm_client.py
from whisper_live.client import TranscriptionClient

client = TranscriptionClient(
  "localhost",
  8080,
  is_multilingual=True,
  lang="en",
  translate=False,
  model_size="tiny"
)
client()
client("audio_test.wav")

Running the server then crashes:

python llm_server.py
zsh: segmentation fault  python llm_server.py
(physiotherapy) jonathan@Jonathans-MBP physiotherapy % /usr/local/opt/python@3.8/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '