Closed: emoon closed this issue 11 months ago
Hi, thanks for the great bug report! I could easily see what went wrong and reproduce the issue.
The problem was a bit tricky to investigate and fix, as it boiled down to libvorbis using the dangerous `alloca` function to allocate a buffer at least `sample_count * 4` bytes big on the stack. In your example code, `sample_count` is 48000 * 100, which translates to libvorbis allocating a whopping 18.31 MiB on the stack! For context, the default stack size Rust sets for threads it spawns is usually 2 MiB, and stacks larger than 8 MiB are rarely seen, so this allocation fails and the stack overflows, eventually triggering a SIGSEGV.
The possible solutions are to modify libvorbis to call `malloc`/`realloc` instead, or to make sure a big enough stack is allocated on the heap before calling the pesky `alloca` (cf. the `stacker` crate). For reasons explained in the description of the linked commit and the related libvorbis submodule commit, I chose the first approach.
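For completeness, a minimal sketch of what the second, rejected approach could look like with the `stacker` crate; the thresholds are made up for illustration, and this is not what vorbis_rs does:

```rust
// Grow the stack (on the heap) before entering code that may `alloca` a
// large buffer. Sketch only; vorbis_rs patches libvorbis instead.
fn encode_with_big_stack(samples: &[f32]) {
    // Ensure at least 1 MiB of stack headroom; otherwise run the closure on
    // a fresh 32 MiB heap-allocated stack segment.
    stacker::maybe_grow(1024 * 1024, 32 * 1024 * 1024, || {
        // ... call into the C encoder here, where the big alloca happens ...
        let _ = samples;
    });
}
```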
Please let me know if the fix works for you or if you have anything else to add :smile:
Edit: minor wording tweaks based on a (now deleted) comment by @emoon, thanks!
Related upstream PR: https://github.com/xiph/vorbis/pull/104
If the issue isn't fixed in vorbis, it's also possible to apply a workaround in the Rust code similar to what I did in my own code (i.e., sending smaller chunks of the buffer over to the C code instead).
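For anyone hitting the same thing, a minimal sketch of that kind of chunking on the Rust side; the helper and its names are made up for illustration, and the closure is where a call to `encode_audio_block` would go:

```rust
/// Split per-channel sample buffers into fixed-size blocks and hand each
/// block to `encode_block` (e.g. a closure wrapping `encode_audio_block`).
fn encode_in_chunks<E>(
    channels: &[Vec<f32>],
    block_size: usize,
    mut encode_block: impl FnMut(&[&[f32]]) -> Result<(), E>,
) -> Result<(), E> {
    let total_samples = channels.first().map_or(0, Vec::len);
    let mut offset = 0;
    while offset < total_samples {
        let end = (offset + block_size).min(total_samples);
        // Borrow the same sample range from every channel.
        let block: Vec<&[f32]> = channels.iter().map(|ch| &ch[offset..end]).collect();
        encode_block(&block)?;
        offset = end;
    }
    Ok(())
}
```

With a block size of, say, 48000 samples per channel, this mirrors the chunked setup described in the comments below.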
Yes, that's also a valid option, although the exact maximum buffer size is platform-dependent and thus difficult to calculate correctly. Given that vorbis_rs uses a custom libvorbis fork anyway, it's not a maintenance burden to patch it for the time being.
Wanted to make an update about this here. I have tested with the latest code and it does indeed fix the crash, but there is another problem with it. When I use my old code, where I send chunks of samples (in my case 48k per channel) to `encode_audio_block`, encoding the song takes about 3 sec (which includes some other stuff as well, but that isn't important here).
When I use the latest code and send the whole buffer, it takes about 150 sec (!) to do the encoding, so almost 50x longer. I'm sure it's not the fault of the Rust crate here, but this is something users may run into if they also send in a large buffer and have no idea what to expect of the encoding time.
Great catch! I attached a profiler to two runs of the encoder over a 5-minute, mono, 44.1 kHz song, the only difference being that in one run I chunked the input into blocks of 65536 samples, while in the other I handed off the entire song to the encoder in a single block. The results were as follows:
As the profiler results show, the slowdown in the non-chunked case is entirely due to a very slow implementation of `vorbis_analysis_blockout`, which spends a non-negligible amount of time copying memory buffers around. The encoding itself, done by the `vorbis_analysis` function, is responsible for only about 2.7% of the runtime.
In stark contrast, when chunking the input into smaller blocks, the encoding is responsible for most CPU cycles, even though `vorbis_analysis_blockout` takes a non-negligible 14% of the total execution time.
In any case, there isn't anything this Rust crate can do about the slowdown, other than trying to come up with a libvorbis patch that improves the performance of `vorbis_analysis_blockout`.
For now, I've added the following note to the `encode_audio_block` method documentation to make users aware of the tradeoffs between different block sizes, and hopefully guide them to choose the block size that works best for their application (I'm a bit reluctant to do any block size manipulation in this crate, as the crate doesn't have information about the entire use case to choose the optimal block size, and I don't want to introduce "hidden" chunking or buffering costs):
Although the Vorbis encoder is meant to support blocks of any size, there are some performance considerations that users should bear in mind when choosing an optimal block size for their application:
- Blocks that are too small (as a rough guideline, between one and 16 samples in size) result in more CPU overhead, which translates into slower execution times due to more repetitions of the per-block encoding setup logic. However, Vorbis is quite well optimized to handle small block sizes: in practice, no slowdowns greater than 2x have been observed even when using a single sample per channel and block, but your mileage may vary.
- Blocks that are too large (e.g., from 2^18 = 262144 samples onwards) have apparently not been tested much, and lead to a sharp degradation of the encoding runtime and much higher peak memory usage as the block size increases. Slowdowns of several orders of magnitude have been observed when encoding minutes of audio as a single block.
As a rule of thumb, a pretty good block size is at most a few seconds of audio, or no orders of magnitude larger than the maximum Vorbis encoding window size, 2^13 = 8192, and does not require your application to do any chunking (i.e., splitting of larger blocks into smaller ones) or buffering (i.e., combining smaller blocks into larger ones). When in doubt, use smaller block sizes. The libvorbis documentation states that "1024 is a reasonable choice" for a block size.
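For instance, a helper that follows that guidance might look like the sketch below; the function and its thresholds are purely illustrative and not part of the crate's API:

```rust
/// Pick a per-channel block size following the guidance above: around a
/// second of audio, never below the libvorbis-recommended 1024 samples and
/// never orders of magnitude above the 8192-sample encoding window.
fn suggested_block_size(sample_rate: u32) -> usize {
    let one_second = sample_rate as usize;
    one_second.clamp(1024, 8 * 8192)
}
```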
Hi,
First of all, thanks for a great crate.
I'm running into a crash when I'm trying to encode a "large" buffer. Here is a small repro case for it.
Callstack
I can work around this issue by manually sending smaller chunks of the data to `encode_audio_block`, but I don't see any limitation in the API/docs that would indicate that I have to do that.