Race condition in text_to_speech func

I believe there might be a race condition in the following section of the code:

# clip audio if the current chunk exceeds 30 seconds, this basically implies that
# no valid segment for the last 30 seconds from whisper
if self.frames_np[int((self.timestamp_offset - self.frames_offset)*self.RATE):].shape[0] > 25 * self.RATE:
    duration = self.frames_np.shape[0] / self.RATE
    self.timestamp_offset = self.frames_offset + duration - 5

samples_take = max(0, (self.timestamp_offset - self.frames_offset)*self.RATE)
input_bytes = self.frames_np[int(samples_take):].copy()
duration = input_bytes.shape[0] / self.RATE
if duration < 0.4:
    # If the audio duration is short, release the lock and wait
    self.lock.release()
    time.sleep(0.01)    # 5ms sleep to wait for some voice active audio to arrive
    continue

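This section reads and writes state that appears to be shared between threads (self.frames_np, self.frames_offset and self.timestamp_offset) without proper synchronization. The only lock operation visible in the excerpt is the self.lock.release() call in the short-audio branch, so it is not clear that the lock is held on every path through this code. Could you please review this code?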
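To illustrate the kind of synchronization I have in mind, here is a minimal sketch, not the project's actual code. The class AudioBuffer and the methods add_frames and take_chunk are hypothetical names, and the 16 kHz rate is an assumption. The idea is that every read and write of the shared buffer and offsets happens inside a single `with self.lock:` block, so the lock is released on every path without a manual release() call.

```python
import threading

import numpy as np

RATE = 16000  # assumed sample rate, not taken from the project


class AudioBuffer:
    """Hypothetical stand-in for the object that owns frames_np and the offsets."""

    def __init__(self, rate=RATE):
        self.rate = rate
        self.lock = threading.Lock()
        self.frames_np = np.zeros(0, dtype=np.float32)
        self.frames_offset = 0.0
        self.timestamp_offset = 0.0

    def add_frames(self, frames):
        # Producer side: mutate the shared buffer only while holding the lock.
        with self.lock:
            self.frames_np = np.concatenate((self.frames_np, frames))

    def take_chunk(self, min_duration=0.4):
        # Consumer side: read and update the shared state in one critical section.
        with self.lock:
            start = int(max(0.0, self.timestamp_offset - self.frames_offset) * self.rate)
            if self.frames_np.shape[0] - start > 25 * self.rate:
                # Too much unprocessed audio: keep only the last 5 seconds.
                duration = self.frames_np.shape[0] / self.rate
                self.timestamp_offset = self.frames_offset + duration - 5
                start = int(max(0.0, self.timestamp_offset - self.frames_offset) * self.rate)
            chunk = self.frames_np[start:].copy()
        # The lock has been released here on every path.
        if chunk.shape[0] / self.rate < min_duration:
            return None  # caller can sleep briefly and retry
        return chunk
```

In the processing loop, the caller would then do something like `chunk = buf.take_chunk()`, and when it gets None, call `time.sleep(0.01)` and `continue`, with no lock handling in the loop itself.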

collabora / WhisperFusion, issue #47