avoid decoding sample data as Ints

beacon-biosignals / Onda.jl

A Julia package for high-throughput manipulation of structured signal data across arbitrary domain-specific encodings, file formats and storage layers

Other

67 stars 5 forks source link

avoid decoding sample data as Ints #99

Closed kolia closed 1 year ago

kolia commented 3 years ago

If the encoding offset and resolution are ~Ints~ 0.0 and 1.0 respectively, the decoded samples.data can have eltype Int, which is not friendly for downstream uses of the data including DSP.

This could be avoided by either converting decoded data to have Float64 eltype, or made slightly less awkward to handle by implementing convert(Samples{D}, samples).

jrevels commented 3 years ago

In what situation does this arise? if you want the decoded representation to be Float64, you should use Float64 decoding parameters?

It's generally good style in Julia to avoid avoidable promotion/widening, and instead respect the callers' choice of input types as much as possible. To do otherwise is to leak an implementation detail and can be far more annoying to workaround in practice (e.g. the classic CuArrays <-> Float32 autoconversion choice that resulted in lots of developer pain).

kolia commented 3 years ago

This can arise because of https://github.com/beacon-biosignals/Onda.jl/blob/master/src/samples.jl#L352 if sample_resolution_in_unit is 1.0 and sample_offset_in_unit is 0.0, then the decoded sample eltype will be Ints instead of Float64 for all other cases, since resolution and offset are both Float64s.

Maybe a good fix is to make that line convert decoded data to Float64.

jrevels commented 3 years ago

Ah. Yeah, I think it'd be okay to have the result in that case be promoted

Maybe a good fix is to make that line convert decoded data to Float64.

ehhhh it should converted to whatever the promotion should be based on the arguments, not to Float64

(and thus the aliasing optimization is still preserved in the case where the types already match)

kolia commented 3 years ago

Ok, I was wondering which of convert.(Float64, sample_data) or sample_data .* 1.0 .+ 0.0 would be faster, and they turn out to be about the same.

using BenchmarkTools
x = rand(Int16, 200 * 3600 * 8)   # 6-channel 8 hour recording at 200Hz
@btime $x .* 1.0 .+ 0.0
@btime convert.(Float64, sample_data)

both give ~180ms and 2 allocations over 263.67Mb.

Same times and allocations for x of eltype Float32.

So maybe best is to just special-case (1.0, 0.0) and return the input sample_data when promote_type(typeof(sample_offset_in_unit), eltype(sample_data)) == eltype(sample_data), and otherwise do the mult/add.

jrevels commented 3 years ago

promote_type(typeof(sample_offset_in_unit), eltype(sample_data)) == eltype(sample_data)

yeah the patch I'd apply is this:

diff --git a/src/samples.jl b/src/samples.jl
index 4c806eb..a91cf0f 100644
--- a/src/samples.jl
+++ b/src/samples.jl
@@ -343,13 +343,17 @@ If:

     sample_data isa AbstractArray &&
     sample_resolution_in_unit == 1 &&
-    sample_offset_in_unit == 0
+    sample_offset_in_unit == 0 &&
+    promote_type(typeof(sample_resolution_in_unit), typeof(sample_offset_in_unit)) == eltype(sample_data)

 then this function is the identity and will return `sample_data` directly without copying.
 """
 function decode(sample_resolution_in_unit, sample_offset_in_unit, sample_data)
-    if sample_data isa AbstractArray
-        sample_resolution_in_unit == 1 && sample_offset_in_unit == 0 && return sample_data
+    if (sample_data isa AbstractArray &&
+        sample_resolution_in_unit == 1 &&
+        sample_offset_in_unit == 0 &&
+        promote_type(typeof(sample_resolution_in_unit), typeof(sample_offset_in_unit)) == eltype(sample_data))
+        return sample_data
     end
     return sample_resolution_in_unit .* sample_data .+ sample_offset_in_unit
 end

jrevels commented 1 year ago

closed by #133 ; decode will now decode to Float64 by default, but you can pass in an extra type parameter to steer it