improve read_pcm_samples

ymtoo commented 3 years ago

Use broadcasting to reduce number of allocations.

using BenchmarkTools, WAV

path = mktempdir()
y = trunc.(Int16, 10000 .* sin.((0:9999999)/48000*2pi*440))
wavwrite(y, joinpath(path, "test.wav"), Fs=48000)

Before:

julia> @benchmark wavread(joinpath(path, "test.wav"); format="native")
BenchmarkTools.Trial: 
  memory estimate:  486.24 MiB
  allocs estimate:  29366204
  --------------
  minimum time:     1.229 s (1.05% GC)
  median time:      1.321 s (5.87% GC)
  mean time:        1.324 s (5.91% GC)
  maximum time:     1.426 s (10.15% GC)
  --------------
  samples:          4
  evals/sample:     1

After:

@benchmark wavread(joinpath(path, "test.wav"); format="native")
BenchmarkTools.Trial: 
  memory estimate:  114.44 MiB
  allocs estimate:  58
  --------------
  minimum time:     114.778 ms (0.18% GC)
  median time:      119.207 ms (3.60% GC)
  mean time:        125.500 ms (6.17% GC)
  maximum time:     188.214 ms (35.16% GC)
  --------------
  samples:          40
  evals/sample:     1

coveralls commented 3 years ago

Coverage decreased (-0.07%) to 68.352% when pulling 634312ca5f4b1c39eb42fd3a81b3afd56d8beaef on ymtoo:patch-1 into 249b0723e08a98680d4433e3946da1c09aee52d2 on dancasimiro:main.

mgkuhn commented 3 years ago

Thanks for pointing out this huge inefficiency! However, is always first allocating the samples array in UInt64 and then bulk-converting it in the end to sample_type really the best we can do here? Shouldn't sample_type really be a type parameter of the function, such that the compiler can then specialize all of read_pcm_samples for each value of sample_type and dispatch to a function specialized for that type?

I think how this function deals with types needs a slightly bigger rethink.

mgkuhn commented 3 years ago

I see that

function read_ieee_float_samples(io::IO, chunk_size,
                                 fmt::WAVFormat, subrange,
                                 ::Type{floatType}) where {floatType}

does already use a type parameter floatType, and I suspect read_pcm_samples should do something similar. The call to pcm_container_type(nbits) needs to move out of read_pcm_samples such that the compiler can dispatch on the result. For performance critical code, always decide on types at compile type.

ymtoo commented 3 years ago

This PR is a workaround to avoid the large number of allocations with minimal changes to the function. I am not familiar with how the compiler works but agreed that a rework is required such that a method will be determined at compile time when calling the function with a type parameter.

mgkuhn commented 3 years ago

Here is an alternative quick workaround (which makes sample_type a constant rather than a variable):

diff --git a/src/WAV.jl b/src/WAV.jl
index a41a6f1..be99782 100644
--- a/src/WAV.jl
+++ b/src/WAV.jl
@@ -313,13 +313,13 @@ end

 ieee_float_container_type(nbits) = (nbits == 32 ? Float32 : (nbits == 64 ? Float64 : error("$nbits bits is not supported for WAVE_FORMAT_IEEE_FLOAT.")))

-function read_pcm_samples(io::IO, chunk_size, fmt::WAVFormat, subrange)
+function read_pcm_samples(io::IO, chunk_size, fmt::WAVFormat, subrange,
+                          ::Type{sample_type}) where {sample_type}
     nbits = bits_per_sample(fmt)
     if isempty(subrange)
-        return Array{pcm_container_type(nbits), 2}(undef, 0, fmt.nchannels)
+        return Array{sample_type, 2}(undef, 0, fmt.nchannels)
     end
-    samples = Array{pcm_container_type(nbits), 2}(undef, length(subrange), fmt.nchannels)
-    sample_type = eltype(samples)
+    samples = Array{sample_type, 2}(undef, length(subrange), fmt.nchannels)
     nbytes = ceil(Integer, nbits / 8)
     bitshift = [0x0, 0x8, 0x10, 0x18, 0x20, 0x28, 0x30, 0x38, 0x40]
     mask = UInt64(0x1) << (nbits - 1)
@@ -605,7 +605,8 @@ function read_data(io::IO, chunk_size, fmt::WAVFormat, format, subrange)
         subrange = 1:convert(UInt, chunk_size / fmt.block_align)
     end
     if isformat(fmt, WAVE_FORMAT_PCM)
-        samples = read_pcm_samples(io, chunk_size, fmt, subrange)
+        samples = read_pcm_samples(io, chunk_size, fmt, subrange,
+                                   pcm_container_type(bits_per_sample(fmt)))
         convert_to_double = x -> convert_pcm_to_double(x, bits_per_sample(fmt))
     elseif isformat(fmt, WAVE_FORMAT_IEEE_FLOAT)
         samples = read_ieee_float_samples(io, chunk_size, fmt, subrange)

(But the more I look at this function, the more I develop an urge to rewrite it from scratch at some point ...)

ymtoo commented 3 years ago

Your workaround on the type stability of read_pcm_samples works better!

@benchmark wavread(joinpath(path, "test.wav"); format="native")
BenchmarkTools.Trial: 
  memory estimate:  38.15 MiB
  allocs estimate:  53
  --------------
  minimum time:     82.005 ms (0.00% GC)
  median time:      83.332 ms (0.38% GC)
  mean time:        86.543 ms (3.02% GC)
  maximum time:     155.182 ms (43.36% GC)
  --------------
  samples:          58
  evals/sample:     1

dancasimiro / WAV.jl

improve read_pcm_samples #95