Initialize input_device_index when opening pyaudio stream

mkuehne-git commented 11 months ago

There are two more issues that I want to add to this:

Once you have completely recorded an animation and then re-execute the script without changing anything, you have to record some voice-overs again. Reason: the number of channels is taken from PaDeviceInfo maxInputChannels, which does not necessarily match the actual number of channels. In my case the device supports two channels, but I only have a single (mono) microphone. I think it is better to default Recorder.channels to 1 instead of None and to make sure channels does not exceed maxInputChannels.
I have a device with defaultSampleRate == 32000, and if I use the default rate of 44100, an exception is thrown. Therefore I want to limit rate to min(rate, defaultSampleRate).
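
The two limits proposed above can be sketched roughly as follows. This is only an illustration, not the code from the PR: the helper name resolve_stream_params and the sample device values are hypothetical, and device_info is assumed to be a dict shaped like the one PyAudio's get_device_info_by_index() returns.

```python
DEFAULT_CHANNELS = 1   # proposed default: mono instead of None
DEFAULT_RATE = 44100   # the default rate mentioned above


def resolve_stream_params(device_info, channels=DEFAULT_CHANNELS, rate=DEFAULT_RATE):
    """Clamp the requested channels and rate to what the device reports.

    Hypothetical helper: `device_info` is assumed to carry the keys
    "index", "maxInputChannels" and "defaultSampleRate".
    """
    max_channels = int(device_info["maxInputChannels"])
    if max_channels < 1:
        raise ValueError("device has no input channels")
    return {
        "input": True,
        "input_device_index": int(device_info["index"]),
        # never ask for more channels than the device offers
        "channels": max(1, min(channels, max_channels)),
        # never ask for a higher rate than the device default
        "rate": int(min(rate, device_info["defaultSampleRate"])),
    }


# Illustrative values for a stereo-capable device with a 32 kHz default rate.
webcam = {"index": 7, "maxInputChannels": 2, "defaultSampleRate": 32000.0}
params = resolve_stream_params(webcam)
print(params)
```

The resulting dict could then be passed on when opening the stream, for example `pyaudio.PyAudio().open(format=pyaudio.paInt16, **params)`.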

osolmaz commented 11 months ago

Lgtm, I will merge and release a new version if you think the solution is mature enough. I cannot reproduce your issue on my end, so I will take your word for it.

Once you have completely recorded an animation and then re-execute the script without changing anything, you have to record some voice-overs again.

This is weird, can you show me what your cache.json looks like?

The caching behavior is currently to check whether there is a voiceover in the cache with the same input_data as the one at render time. input_data comes from the given and/or default values submitted in RecorderService plus the text submitted in voiceover(). And you say that this PR fixes that issue?

mkuehne-git commented 11 months ago

I locally reverted the changes from my second commit (to reproduce the issue) and ran the animation several times. cache-1.json is the output after the first run, cache-2.json is the output after the second, and so on. If you look at the channels attribute, you will notice that only the first entry, "This circle is drawn as I speak.", has channels == 1. All the others have a value of 2.

This makes entry["input_data"] == input_data evaluate to False, which triggers the repeated recordings and also makes the JSON file grow.

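    def get_cached_result(self, input_data, cache_dir):
        # The "/" operator implies cache_dir is a pathlib.Path
        json_path = os.path.join(cache_dir / DEFAULT_VOICEOVER_CACHE_JSON_FILENAME)
        if os.path.exists(json_path):
            # Use a context manager so the file handle is closed after loading
            with open(json_path, "r") as f:
                json_data = json.load(f)
            for entry in json_data:
                # A cache hit requires input_data to match exactly
                if entry["input_data"] == input_data:
                    return entry
        return None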
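
A minimal sketch of why the lookup misses, assuming input_data is a plain dict that includes the channel count. The key names "text" and "channels" and the helper find_cached are hypothetical and only stand in for whatever the real cache entries contain.

```python
def find_cached(entries, input_data):
    """Return the first entry whose input_data equals the requested one."""
    for entry in entries:
        # Dict equality compares every key and value, so one differing
        # field (such as channels) is enough to cause a miss.
        if entry["input_data"] == input_data:
            return entry
    return None


# Entry as it might be stored after a recording made with two channels.
cache = [
    {"input_data": {"text": "This circle is drawn as I speak.", "channels": 2}},
]

# At render time the same text is requested, but with channels == 1.
same_request = {"text": "This circle is drawn as I speak.", "channels": 2}
mono_request = {"text": "This circle is drawn as I speak.", "channels": 1}

hit = find_cached(cache, same_request)
miss = find_cached(cache, mono_request)
print(hit is not None, miss is None)
```

Because the comparison is all-or-nothing, any parameter that changes between the recording and the next render forces a new recording and a new cache entry.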

I can only guess, but I assume you are working with a laptop with a built-in microphone, and with that setup you probably never encounter an issue. I, however, have a desktop PC with a USB headset plus a USB Logitech camera with a mic. I have listed my devices below (input only). For the attached recordings I used device #8.

0 HDA Intel PCH: ALC1220 Analog (hw:0,0) 2 0.008707482993197279 0.034829931972789115 44100.0
2 HDA Intel PCH: ALC1220 Alt Analog (hw:0,2) 2 0.008707482993197279 0.034829931972789115 44100.0
7 C922 Pro Stream Webcam: USB Audio (hw:2,0) 2 0.012 0.048 32000.0
8 Logitech H570e Stereo: USB Audio (hw:3,0) 2 0.008707482993197279 0.034829931972789115 44100.0
9 sysdefault 128 0.021333333333333333 0.021333333333333333 48000.0
19 default 128 0.021333333333333333 0.021333333333333333 48000.0
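
The listing above resembles what PyAudio's device enumeration reports. Below is a minimal sketch of producing such a listing. The column meaning is inferred, not stated in the thread: the columns are assumed to be index, name, maxInputChannels, defaultLowInputLatency, defaultHighInputLatency and defaultSampleRate. The helper names are hypothetical.

```python
# Assumed column order; these are keys of PyAudio's device-info dicts.
COLUMNS = (
    "index",
    "name",
    "maxInputChannels",
    "defaultLowInputLatency",
    "defaultHighInputLatency",
    "defaultSampleRate",
)


def format_input_devices(device_infos):
    """Return one text row per device that has at least one input channel."""
    rows = []
    for info in device_infos:
        if info["maxInputChannels"] > 0:
            rows.append(" ".join(str(info[key]) for key in COLUMNS))
    return rows


def query_devices():
    """Collect device-info dicts from PyAudio (third-party, imported lazily)."""
    import pyaudio

    p = pyaudio.PyAudio()
    try:
        return [p.get_device_info_by_index(i) for i in range(p.get_device_count())]
    finally:
        p.terminate()


# Hand-written sample data so the formatting can be tried without hardware.
sample = [
    {"index": 1, "name": "HDMI out", "maxInputChannels": 0,
     "defaultLowInputLatency": -1.0, "defaultHighInputLatency": -1.0,
     "defaultSampleRate": 44100.0},
    {"index": 7, "name": "C922 Pro Stream Webcam: USB Audio (hw:2,0)",
     "maxInputChannels": 2, "defaultLowInputLatency": 0.012,
     "defaultHighInputLatency": 0.048, "defaultSampleRate": 32000.0},
]
rows = format_input_devices(sample)
print("\n".join(rows))
```

On a machine with PyAudio installed, `format_input_devices(query_devices())` would list the real input devices.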

You can also see that the 'C922 Pro Stream Webcam' has a default rate of only 32000.0 samples/sec, so using the default of 44100 caused trouble.

With both commits, these troubles disappeared, and I can work with manim-voiceover just as it is supposed to work. It is a great contribution, by the way. Thanks.

cache-.zip

osolmaz commented 11 months ago

I assume, you are working with a laptop

Yes

Thanks, I'll additionally try to improve the caching behavior when I have the time.

ManimCommunity / manim-voiceover

Initialize input_device_index when opening pyaudio stream #62