RustAudio / lewton

Rust vorbis decoder

Match speed of libvorbis #2

Open est31 opened 8 years ago

est31 commented 8 years ago

At the moment we are two times slower than libvorbis. We need to be at least as fast as it is.

est31 commented 8 years ago

Maybe some improvement is possible in Huffman tree decoding? No idea.

est31 commented 7 years ago

Note about current speed: the factor ranges between 1.6x and 1.8x for floor1 files. It's faster for floor0 files, but those don't really matter as there are almost no files with floor0.

est31 commented 7 years ago

And part of the speed improvement was thanks to changes between rust 1.11 and 1.12 compilers.

est31 commented 7 years ago

Wow, it seems recent changes in rustc have led to some serious speed improvement. As of the Rust nightly compiler 2016-10-18, lewton is only 1.18 to 1.25 times as slow as libvorbis.

est31 commented 7 years ago

(note: I'm always comparing the "Overall ratio of difference" output of cargo run --release bench of the cmp tool).

est31 commented 7 years ago

Hmm, it seems to have the same performance on Rust 1.12.1, so it's caused by something else? No idea. Either way, it's really good.

est31 commented 7 years ago

As of rustc 1.19.0-nightly (f4209651e 2017-05-05), the factor is around 1.09 to 1.12.

est31 commented 7 years ago

With rustc 1.21.0-nightly (2aeb5930f 2017-08-25), the factor is between 1.05 and 1.06.

ashthespy commented 5 years ago

Have there been some recent regressions? I was curious so ran the comparison with rustc 1.30.0 (da5f414c2 2018-10-24) and the latest master (0.9.3):

$ cargo run --release bench
    Finished release [optimized] target(s) in 0.58s
     Running `target/release/cmp bench`

Comparing speed for bwv_1043_vivace.ogg : libvorbis=0.6495s we=0.8464s difference=1.30x
Comparing speed for bwv_543_fuge.ogg    : libvorbis=0.9369s we=1.3493s difference=1.44x
Comparing speed for maple_leaf_rag.ogg  : libvorbis=0.2593s we=0.3801s difference=1.47x
Comparing speed for hoelle_rache.ogg    : libvorbis=0.4680s we=0.6724s difference=1.44x
Comparing speed for thingy-floor0.ogg   : libvorbis=0.2157s we=0.2524s difference=1.17x

Overall time spent for decoding by libvorbis: 2.5293s
Overall time spent for decoding by us: 3.5007s
Overall ratio of difference: 1.38x
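
As a quick sanity check on the summary lines above, the overall ratio can be reproduced from the two totals. This is only a minimal sketch: `overall_ratio` is a hypothetical helper name, not a function from the cmp tool, and it assumes the ratio is simply our total time divided by the libvorbis total.

```rust
/// Hypothetical helper (not part of the cmp tool): ratio of our decode
/// time to the libvorbis decode time. Values above 1.0 mean we are slower.
fn overall_ratio(libvorbis_secs: f64, our_secs: f64) -> f64 {
    our_secs / libvorbis_secs
}

fn main() {
    // Totals copied from the benchmark output above.
    let ratio = overall_ratio(2.5293, 3.5007);
    println!("Overall ratio of difference: {:.2}x", ratio);
}
```

With these totals the result rounds to 1.38x, which matches the reported figure.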

est31 commented 5 years ago

@ashthespy I'm not sure where this comes from. This slow behaviour happens on rustc 1.20 stable taken from rustup as well, so it isn't a regression of rustc itself or of llvm. It might be some improvement in how gcc optimizes: libvorbis is usually taken from the OS so it's compiled via your OS compiler, which is usually gcc, while lewton is compiled using rustc + llvm. To get a fair comparison, one would have to compare to clang of the same version that the rustc is coming from.

fdoyon commented 5 years ago

Most of the performance delta is due to transient Vec and SmallVec allocations, reallocations, and drops. Here is a trace you can open with Instruments on macOS: alloc trace.trace.zip

Please see my comments on the allocation issue regarding the need for an API and design change to solve this issue efficiently.
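
To illustrate the kind of change being discussed, here is a minimal sketch contrasting a per-packet allocation with a caller-owned buffer that is reused. It is not lewton's API: `decode_allocating` and `decode_into` are hypothetical names, and the "decoding" is a stand-in byte-to-float conversion chosen only to keep the example self-contained.

```rust
/// Allocating style (hypothetical): a fresh Vec is created for every
/// packet and dropped by the caller afterwards.
fn decode_allocating(packet: &[u8]) -> Vec<f32> {
    packet.iter().map(|&b| b as f32 / 255.0).collect()
}

/// Reusing style (hypothetical): the caller owns the output buffer.
/// `clear` drops the old contents but keeps the allocated capacity,
/// so later calls need no new allocation as long as the data fits.
fn decode_into(packet: &[u8], out: &mut Vec<f32>) {
    out.clear();
    out.extend(packet.iter().map(|&b| b as f32 / 255.0));
}

fn main() {
    // Four dummy packets of 1024 bytes each.
    let packets: Vec<Vec<u8>> = (0..4).map(|i| vec![i as u8; 1024]).collect();

    // One scratch buffer, allocated once and reused for every packet.
    let mut scratch: Vec<f32> = Vec::with_capacity(1024);
    for p in &packets {
        decode_into(p, &mut scratch);
        // Both styles produce the same samples.
        assert_eq!(scratch, decode_allocating(p));
    }
    println!("decoded {} packets with one buffer", packets.len());
}
```

The second style moves ownership of the buffer to the caller, which is why it tends to require an API change rather than a purely internal fix.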