Closed ymtoo closed 3 years ago
Thanks for pointing out this huge inefficiency! However, is always first allocating the samples
array in UInt64
and then bulk-converting it in the end to sample_type
really the best we can do here? Shouldn't sample_type
really be a type parameter of the function, such that the compiler can then specialize all of read_pcm_samples
for each value of sample_type
and dispatch to a function specialized for that type?
I think how this function deals with types needs a slightly bigger rethink.
I see that
function read_ieee_float_samples(io::IO, chunk_size,
fmt::WAVFormat, subrange,
::Type{floatType}) where {floatType}
does already use a type parameter floatType
, and I suspect read_pcm_samples
should do something similar. The call to pcm_container_type(nbits)
needs to move out of read_pcm_samples
such that the compiler can dispatch on the result. For performance critical code, always decide on types at compile type.
This PR is a workaround to avoid the large number of allocations with minimal changes to the function. I am not familiar with how the compiler works but agreed that a rework is required such that a method will be determined at compile time when calling the function with a type parameter.
Here is an alternative quick workaround (which makes sample_type
a constant rather than a variable):
diff --git a/src/WAV.jl b/src/WAV.jl
index a41a6f1..be99782 100644
--- a/src/WAV.jl
+++ b/src/WAV.jl
@@ -313,13 +313,13 @@ end
ieee_float_container_type(nbits) = (nbits == 32 ? Float32 : (nbits == 64 ? Float64 : error("$nbits bits is not supported for WAVE_FORMAT_IEEE_FLOAT.")))
-function read_pcm_samples(io::IO, chunk_size, fmt::WAVFormat, subrange)
+function read_pcm_samples(io::IO, chunk_size, fmt::WAVFormat, subrange,
+ ::Type{sample_type}) where {sample_type}
nbits = bits_per_sample(fmt)
if isempty(subrange)
- return Array{pcm_container_type(nbits), 2}(undef, 0, fmt.nchannels)
+ return Array{sample_type, 2}(undef, 0, fmt.nchannels)
end
- samples = Array{pcm_container_type(nbits), 2}(undef, length(subrange), fmt.nchannels)
- sample_type = eltype(samples)
+ samples = Array{sample_type, 2}(undef, length(subrange), fmt.nchannels)
nbytes = ceil(Integer, nbits / 8)
bitshift = [0x0, 0x8, 0x10, 0x18, 0x20, 0x28, 0x30, 0x38, 0x40]
mask = UInt64(0x1) << (nbits - 1)
@@ -605,7 +605,8 @@ function read_data(io::IO, chunk_size, fmt::WAVFormat, format, subrange)
subrange = 1:convert(UInt, chunk_size / fmt.block_align)
end
if isformat(fmt, WAVE_FORMAT_PCM)
- samples = read_pcm_samples(io, chunk_size, fmt, subrange)
+ samples = read_pcm_samples(io, chunk_size, fmt, subrange,
+ pcm_container_type(bits_per_sample(fmt)))
convert_to_double = x -> convert_pcm_to_double(x, bits_per_sample(fmt))
elseif isformat(fmt, WAVE_FORMAT_IEEE_FLOAT)
samples = read_ieee_float_samples(io, chunk_size, fmt, subrange)
(But the more I look at this function, the more I develop an urge to rewrite it from scratch at some point ...)
Your workaround on the type stability of read_pcm_samples
works better!
@benchmark wavread(joinpath(path, "test.wav"); format="native")
BenchmarkTools.Trial:
memory estimate: 38.15 MiB
allocs estimate: 53
--------------
minimum time: 82.005 ms (0.00% GC)
median time: 83.332 ms (0.38% GC)
mean time: 86.543 ms (3.02% GC)
maximum time: 155.182 ms (43.36% GC)
--------------
samples: 58
evals/sample: 1
Use broadcasting to reduce number of allocations.
Before:
After: