RustAudio / rodio

Rust audio playback library
Apache License 2.0
1.72k stars 223 forks source link

Randomly when playing a sound (wav) a very large memory allocation happens, on Linux Raspberry Pi #606

Open FinitelyFailed opened 2 weeks ago

FinitelyFailed commented 2 weeks ago

Sometimes when my program is going to play an audio file (wav), my program crashes with the error: memory allocation of 5404319552844632832 bytes failed

After that happens I have to reboot the computer to make audio work again. I have searched and found nothing on this issue.

Unfortunately I do not have a small code example, and I have it happened only when my program is running on a Raspberry Pi . I have not been able to reproduce it on my laptop.

I would love to get a hint on where I should start looking for a solution.

dvdsk commented 2 weeks ago

5404319552844632832 is about 2^62 the code is trying to allocate/use a huge amount of memory in one go. It sounds to me like there might be an underflow somewhere.

Regarding it only happening on your pi, there can be a ton of reasons. If the program contains unsafe code that can trigger wildly different behavior depending on the system.

I can not really help without a minimal example or the code. I do not mind browsing through it?

I need to know if it is a problem caused by:

There are tree things you can try:

FinitelyFailed commented 2 weeks ago

This is the code for playing an audio file on a specific audio device at a specific volume.

I had some problems where "sleep_until_end" would end up getting a poll error from alsa, I think. So my log was full of: "an error occurred on output stream: A backend-specific error has occurred: alsa::poll() returned POLLERR". Which is way I added some code which kills the playback when the duration of the audio file has run (+ 50 ms to be sure). Or if there aren't any duration in the meta data the playback will run until "sleep_until_end" is done or until a specified time has run out. All to prevent the error to hang the playback forever.

    async fn play_sound_on_device(device: String,
        volume: f32,
        file_path: String,
        default_sound_duration: u64) -> anyhow::Result<()> {

        // IMPORTANT STUFF!
        // Note that there are functions used here that uses std io functions and thread-blocking
        // functions, i.e. NOT using Tokio tasks.
        // Which means that other tasks running on the same thread will be blocked.
        // The solution to this is to put them all in a task where it is ok to block the thread,
        // ie. by using spawn_blocking.
        let mut set = tokio::task::JoinSet::new();
        set.spawn_blocking(move || {

            // TODO check file exists first
             let file = match std::fs::File::open(&file_path) {
                Ok(file) => {

                    let metadata_string = if let Ok(metadata) = file.metadata() {
                        format!("{:?}", metadata)
                    } else {
                        "Unkown".to_owned()
                    };

                    info!("Successfully read file to play, path: '{file_path}', metadata: {metadata_string}");
                    file
                },
                Err(e) => {
                    warn!("Failed to open file, file path: '{file_path}', error: {e}");
                    bail!("Failed to open file, error: {e}");
                }
            };

            let decoder = match rodio::Decoder::new(BufReader::new(file)) {
                Ok(decoder) => {
                    info!("Successfully created decoder");
                    decoder
                },
                Err(e) => {
                    error!("Failed to create decoder, file path: '{file_path}', error: {e}");
                    bail!("Failed to create decoder, file path: '{file_path}', error: {e}");
                }
            };

            debug!("Source has {} channels", &decoder.channels());

            // Keep _stream on the stack, as it is used during playback and should only be dropped when leaving the scope.
            let (_stream, device_handle) = match Self::get_output_stream(&device) {
                Ok((_stream, device_handle)) => {
                    info!("Successfully created stream and device handle");
                    (_stream, device_handle)
                },
                Err(e) => {
                    error!("Failed to get output stream, device: '{device}', error: {e}");
                    bail!("Failed to get output stream, device: '{device}', error: {e}");
                }
            };

            let sink = match rodio::Sink::try_new(&device_handle) {
                Ok(sink) => {
                    info!("Successfully created sink");
                    sink
                },
                Err(e) => {
                    error!("Failed to create new sink, device: '{device}', error: {e}");
                    bail!("Failed to create new sink, device: '{device}', error: {e}");
                }
            };

            let duration = decoder.total_duration();

            sink.set_volume(volume);
            sink.append(decoder);

            match duration {
                Some(mut dur) => {
                    debug!("Will play sound '{file_path}' on device '{device}' for a duration of {} ms, at volume: {volume}", dur.as_millis());
                    dur = dur + Duration::from_millis(50);
                    thread::sleep(dur);
                    if sink.empty() {
                        info!("Played sound '{file_path}' on device '{device}' for {} ms, at volume: {volume}", dur.as_millis());
                    } else {
                        error!("The callout should have been done by now, will stop it, number of sounds left: {}", sink.len());
                        sink.stop();
                    }
                },
                None => {
                    warn!("Will play sound '{file_path}' on device '{device}'. Didn't find an duration, will shut it down after a maximum of {default_sound_duration} s");
                    sink.sleep_until_end(); 
                }
            }

            Ok(())
        });

        set.spawn(async move {
            tokio::time::sleep(tokio::time::Duration::from_millis(default_sound_duration)).await;
            bail!("Sound playing reach maximum duration of {default_sound_duration} s, will shut down tasks")
        });

        while let Some(res) = set.join_next().await {
            return res?;
        }
        Ok(())
    }
dvdsk commented 2 weeks ago

How often is this function called? If a lot of sinks are created that might explain the ALSA error.

I do not think the spawn_blocking code is actually aborted when the JoinSet is dropped. Its rather hidden in their docs:

Be aware that tasks spawned using spawn_blocking cannot be aborted because they are not async.

Still this function as is should never create that large a memory allocation. If it does then thats on rodio. Could you run your program using gdb (this might be a reasonable tutorial)?

FinitelyFailed commented 2 weeks ago

How often is this function called? If a lot of sinks are created that might explain the ALSA error.

At most at an 2 min interval. More commonly like every 10 minutes.

I do not think the spawn_blocking code is actually aborted when the JoinSet is dropped. Its rather hidden in their docs

Be aware that tasks spawned using spawn_blocking cannot be aborted because they are not async.

When you are using jointset, I think you can abort blocking, but I have to run some tests to actually see if it works.

Still this function as is should never create that large a memory allocation. If it does then thats on rodio. Could you run your program using gdb (this might be a reasonable tutorial)?

I'll try to set up gdb, the problem is that the error happens sporadic.

dvdsk commented 2 weeks ago

I think you can abort blocking, but I have to run some tests to actually see if it works.

Do run the test, however reading the source code of tokio it seems they can not be aborted. JoinSet::spawn_blocking(f) just calls runtime::spawn_blocking(f) just like task::spawn_blocking does. It then puts the JoinHandle returned by spawn blocking in a set.

Joinset is just a Set in which the JoinHandles are put with the added benefit that it calls abort on all the handles when the set drops

dvdsk commented 2 weeks ago

I'll try to set up gdb, the problem is that the error happens sporadic.

You can see if you can change something to your code to make it happen all the time. But that is really hard. At this point I would just rewrite it a bit and hope the error goes away.

FinitelyFailed commented 2 weeks ago

I think you can abort blocking, but I have to run some tests to actually see if it works.

Do run the test, however reading the source code of tokio it seems they can not be aborted. JoinSet::spawn_blocking(f) just calls runtime::spawn_blocking(f) just like task::spawn_blocking does. It then puts the JoinHandle returned by spawn blocking in a set.

Joinset is just a Set in which the JoinHandles are put with the added benefit that it calls abort on all the handles when the set drops

Thanks for pointing that out for me. I wrote a test and you are correct. It is just really weird that the tokio documentation is that contradictory ... I'll rewrite my code, to handle it differently.

FinitelyFailed commented 2 weeks ago

I managed to get some more logs via journalctl:

Aug 27 14:11:08 raspberrypi4-64 myprogram[9416]: 2024-08-27T14:11:08.014471931+00:00 INFO libs/modules/callout_module/src/application_state/utils/sound_player.rs:133 - Successfully read file to play, path: '/data/gpt/fe/audio/83537078-9a4d-469f-b50f-e6dfd5876514', metadata: Metadata { file_type: FileType(FileType { mode: 33188 }), is_dir: false, is_file: true, permissions: Permissions(FilePermissions { mode: 33188 }), modified: Ok(SystemTime { tv_sec: 1724766468, tv_nsec: 45997281 }), accessed: Ok(SystemTime { tv_sec: 1724764557, tv_nsec: 193258474 }), created: Ok(SystemTime { tv_sec: 1723705373, tv_nsec: 10650076 }), .. }
Aug 27 14:11:08 raspberrypi4-64 myprogram[9416]: 2024-08-27T14:11:08.014960579+00:00 INFO libs/modules/callout_module/src/application_state/utils/sound_player.rs:144 - Successfully created decoder
Aug 27 14:11:08 raspberrypi4-64 device-service[335]: {"Timestamp":"2024-08-27T14:11:07.030546671Z","Level":"Information","RenderedMessage":"Successfully updated 'Flags'","Properties":{"BlockGid":"9041005900300633","ApplicationNetwork":"69366fd4-baa9-460a-b9ee-0e0d0f37b234","BlockRef":"9041005900300633","SessionId":"8b573bbd-92d4-4887-99cd-23789df05327","File":"libs/modules/inner_signage_module/src/data_handler.rs","VehicleRef":"9031005900306034","ApplicationName":"PassengerInformationService","DeviceId":"1d93d2c2-2173-4ba1-a457-02172efd15ae-gptedge","Target":"inner_signage_module::data_handler","VehicleId":"9031005900306034","JourneyRef":"9015005000300133","ModulePath":"inner_signage_module::data_handler","TripId":"55700000076131905","EventId":"5749357a-331a-4618-a6aa-85ccbfab845a","Line":"148","MACAddress":"e45f0126601d","TripGid":"9015005000300133","Hostname":"raspberrypi4-64"}}
Aug 27 14:11:08 raspberrypi4-64 myprogram[9416]: ALSA lib ../../alsa-plugins-1.2.1/oss/pcm_oss.c:377:(_snd_pcm_oss_open) Unknown field port
Aug 27 14:11:08 raspberrypi4-64 myprogram[9416]: ALSA lib ../../alsa-plugins-1.2.1/oss/pcm_oss.c:377:(_snd_pcm_oss_open) Unknown field port
Aug 27 14:11:08 raspberrypi4-64 myprogram[9416]: ALSA lib ../../alsa-plugins-1.2.1/usb_stream/pcm_usb_stream.c:508:(_snd_pcm_usb_stream_open) Unknown field hint
Aug 27 14:11:08 raspberrypi4-64 myprogram[9416]: ALSA lib ../../alsa-plugins-1.2.1/usb_stream/pcm_usb_stream.c:508:(_snd_pcm_usb_stream_open) Unknown field hint
Aug 27 14:11:08 raspberrypi4-64 myprogram[9416]: ALSA lib ../../alsa-plugins-1.2.1/usb_stream/pcm_usb_stream.c:508:(_snd_pcm_usb_stream_open) Unknown field hint
Aug 27 14:11:08 raspberrypi4-64 myprogram[9416]: ALSA lib ../../alsa-plugins-1.2.1/usb_stream/pcm_usb_stream.c:508:(_snd_pcm_usb_stream_open) Unknown field hint
Aug 27 14:11:08 raspberrypi4-64 myprogram[9416]: memory allocation of 5404319552844632832 bytes failed
Aug 27 14:11:08 raspberrypi4-64 audit[9416]: ANOM_ABEND auid=4294967295 uid=0 gid=0 ses=4294967295 pid=9416 comm="cpal_alsa_out" exe="/usr/bin/myprogram" sig=6 res=1
Aug 27 14:11:08 raspberrypi4-64 myprogram[9416]: 2024-08-27T14:11:08.157546733+00:00 INFO libs/modules/callout_module/src/application_state/utils/sound_player.rs:158 - Successfully created stream and device handle
Aug 27 14:11:08 raspberrypi4-64 myprogram[9416]: 2024-08-27T14:11:08.157714492+00:00 INFO libs/modules/callout_module/src/application_state/utils/sound_player.rs:169 - Successfully created sink
Aug 27 14:11:08 raspberrypi4-64 kernel: audit: type=1701 audit(1724767868.152:328): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=9416 comm="cpal_alsa_out" exe="/usr/bin/myprogram" sig=6 res=1
Aug 27 14:11:08 raspberrypi4-64 kernel[222]: [43852.532479] audit: type=1701 audit(1724767868.152:328): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=9416 comm="cpal_alsa_out" exe="/usr/bin/myprogram" sig=6 res=1
Aug 27 14:11:08 raspberrypi4-64 mosquitto[292]: 1724767868: Client myprograme45f0126601d closed its connection.
Aug 27 14:11:08 raspberrypi4-64 systemd[1]: myprogram: Main process exited, code=killed, status=6/ABRT
Aug 27 14:11:08 raspberrypi4-64 systemd[1]: myprogram: Failed with result 'signal'.

Which seems to point to cpal?

dvdsk commented 2 weeks ago

Which seems to point to cpal?

I see nothing suspicious here, could you point it out for me?

FinitelyFailed commented 2 weeks ago

These three lines:

Aug 27 14:11:08 raspberrypi4-64 audit[9416]: ANOM_ABEND auid=4294967295 uid=0 gid=0 ses=4294967295 pid=9416 comm="cpal_alsa_out" exe="/usr/bin/myprogram" sig=6 res=1
Aug 27 14:11:08 raspberrypi4-64 kernel: audit: type=1701 audit(1724767868.152:328): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=9416 comm="cpal_alsa_out" exe="/usr/bin/myprogram" sig=6 res=1
Aug 27 14:11:08 raspberrypi4-64 kernel[222]: [43852.532479] audit: type=1701 audit(1724767868.152:328): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=9416 comm="cpal_alsa_out" exe="/usr/bin/myprogram" sig=6 res=1

I do not know what "kernel: audit" means, but both lines mentions "comm="cpal_alsa_out".