beacon-biosignals / OndaEDF.jl

utilities for importing/exporting EDF Files to/from Onda datasets

cannot export Onda Samples with `sample_type = UInt16` #75

Closed kleinschmidt closed 1 year ago

kleinschmidt commented 1 year ago

MWE:

julia> using Onda, OndaEDF

julia> info = SamplesInfoV2(; sensor_type="x", channels=["x"], sample_unit="microvolt", sample_resolution_in_unit=2.0, sample_offset_in_unit=1.0, sample_type="uint16", sample_rate=100)
SamplesInfoV2: (sensor_type = "x", channels = ["x"], sample_unit = "microvolt", sample_resolution_in_unit = 2.0, sample_offset_in_unit = 1.0, sample_type = "uint16", sample_rate = 100.0)

julia> data = rand(UInt16, 1, 100);

julia> samples = Samples(data, info, true)
Samples (00:00:01.000000000):
  info.sensor_type: "x"
  info.channels: ["x"]
  info.sample_unit: "microvolt"
  info.sample_resolution_in_unit: 2.0
  info.sample_offset_in_unit: 1.0
  sample_type(info): UInt16
  info.sample_rate: 100.0 Hz
  encoded: true
  data:
1×100 Matrix{UInt16}:
 0xa063  0xeac6  0x5196  0x4e78  0x7126  0x2f30  0xe677  0x5f01  0xd096  0x9273  …  0xb6fb  0x4672  0x9963  0xef0e  0xbeec  0x621b  0xe988  0x0416  0x436d

julia> onda_to_edf([samples])
ERROR: MethodError: Cannot `convert` an object of type 
  EDF.Signal{UInt16} to an object of type 
  Union{EDF.AnnotationsSignal, EDF.Signal{Int16}}

Closest candidates are:
  convert(::Type{T}, ::T) where T
   @ Base Base.jl:64

Stacktrace:
 [1] push!(a::Vector{Union{EDF.AnnotationsSignal, EDF.Signal{Int16}}}, item::EDF.Signal{UInt16})
   @ Base ./array.jl:1060
 [2] onda_samples_to_edf_signals(onda_samples::Vector{Samples{Matrix{UInt16}}}, seconds_per_record::Float64)
   @ OndaEDF ~/work/OndaEDF.jl/src/export_edf.jl:124
 [3] onda_to_edf(samples::Vector{Samples{Matrix{UInt16}}}, annotations::Vector{Any}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ OndaEDF ~/work/OndaEDF.jl/src/export_edf.jl:156
 [4] onda_to_edf
   @ ~/work/OndaEDF.jl/src/export_edf.jl:154 [inlined]
 [5] onda_to_edf(samples::Vector{Samples{Matrix{UInt16}}})
   @ OndaEDF ~/work/OndaEDF.jl/src/export_edf.jl:154
 [6] top-level scope
   @ REPL[49]:1

From digging around a bit in the export code, I have a hunch about what happened here: when EDF added support for BDF (which uses Int24 storage), the signal data field of the EDF.Signal struct was made parametric on the eltype, so the implicit convert(Vector{Int16}, data) in the old constructor that we were relying on no longer happens.
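To make the hunch concrete, here is a minimal sketch of the mechanism using two toy structs. `OldToySignal` and `NewToySignal` are hypothetical stand-ins with the same shape as described above; they are not the actual EDF.Signal definitions.

```julia
# Hypothetical stand-ins, NOT the real EDF.Signal definitions.
struct OldToySignal
    samples::Vector{Int16}      # fixed eltype: the default constructor converts its argument
end

struct NewToySignal{T<:Integer}
    samples::Vector{T}          # parametric eltype: the input eltype is kept as-is
end

raw = UInt16[1, 2, 3]

old_sig = OldToySignal(raw)     # implicit convert(Vector{Int16}, raw)
new_sig = NewToySignal(raw)     # a NewToySignal{UInt16}, no conversion

# Pushing the UInt16-flavored signal into an Int16-typed vector has no
# `convert` method to fall back on, so it throws a MethodError.
signals = NewToySignal{Int16}[]
err = try
    push!(signals, new_sig)
    nothing
catch e
    e
end

println(typeof(old_sig.samples))
println(typeof(new_sig))
println(typeof(err))
```

Note that the implicit conversion in the old shape only succeeds when every value fits in Int16; something like `OldToySignal(UInt16[0xffff])` would throw an InexactError instead.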

kleinschmidt commented 1 year ago

As far as I can tell, we've never really tested for export with UInt sample types, despite the fact that these have been supported by Onda at least since 0.11... https://github.com/beacon-biosignals/Onda.jl/blob/8b97a05e12be68a95d6069be2d25d6db2db1a913/src/samples.jl#L162C89-L162C89

kleinschmidt commented 1 year ago

The rabbit hole goes deeper: the resolution scaling that's applied to any sample type wider than 16 bits is wrong. It makes the resolution bigger by a factor of sizeof(sample_type) / sizeof(Int16), which for Int32 is 2x. But the number of representable values is actually different by a factor of 2^16 (exponential scaling, not linear, in the number of bits/bytes).

So, what this means is that exporting signals with extreme values that use wider encoding will just silently clip those values.
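The arithmetic can be checked directly. This is a standalone sketch, not the OndaEDF code; `nvalues`, the starting resolution of 1.0, and the offset-free encoding are assumptions made for illustration.

```julia
# Count of distinct values an integer type can represent (BigInt avoids overflow).
nvalues(T) = big(typemax(T)) - big(typemin(T)) + 1

linear_factor = div(sizeof(Int32), sizeof(Int16))      # ratio of byte widths
actual_factor = div(nvalues(Int32), nvalues(Int16))    # ratio of representable values

println("byte-width ratio:  ", linear_factor)
println("value-count ratio: ", actual_factor)

# Effect on an extreme value, assuming a hypothetical resolution of 1.0 and no offset.
resolution32 = 1.0
decoded = typemax(Int32) * resolution32

# Widen the resolution only by the linear factor, then re-encode into Int16.
resolution16 = resolution32 * linear_factor
encoded16 = Int16(clamp(round(decoded / resolution16), typemin(Int16), typemax(Int16)))
roundtrip = encoded16 * resolution16

println("original decoded value: ", decoded)
println("after Int16 round trip: ", roundtrip)
```

The re-encoded value saturates at `typemax(Int16)`, so the round trip comes back far smaller than the original without any error being raised.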

I think using things like Int32 for encoding is pretty rare in practice, but I do think we should try to handle this more correctly. I'm not really sure what the right thing to do is here though...look at the actual values to choose the resolution? just annihilate the resolution by >1000x and maybe warn the user?
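One way the "look at the actual values" option could work is sketched below. The helpers `fit_int16_encoding` and `encode16` are hypothetical names, not part of OndaEDF or Onda, and the sketch assumes the usual affine convention `decoded = resolution * encoded + offset`.

```julia
# Hypothetical helper: choose a resolution and offset so that the observed
# range of decoded values spans the full Int16 range.
function fit_int16_encoding(decoded::AbstractArray{<:Real})
    lo, hi = extrema(decoded)
    nsteps = Float64(typemax(Int16)) - Float64(typemin(Int16))   # 65535 steps
    span = Float64(hi) - Float64(lo)
    resolution = span == 0 ? 1.0 : span / nsteps
    # Pick the offset so that typemin(Int16) decodes back to `lo`.
    offset = Float64(lo) - Float64(typemin(Int16)) * resolution
    return resolution, offset
end

# Hypothetical helper: encode one decoded value, clamping to guard against rounding overshoot.
function encode16(x::Real, resolution::Real, offset::Real)
    scaled = round((x - offset) / resolution)
    return Int16(clamp(scaled, typemin(Int16), typemax(Int16)))
end

decoded = Float64[-1.0e6, 0.0, 2.5e9]    # made-up values with a wide dynamic range
resolution, offset = fit_int16_encoding(decoded)
encoded = encode16.(decoded, resolution, offset)
restored = resolution .* encoded .+ offset

println("resolution: ", resolution)
println("encoded:    ", encoded)
println("max error:  ", maximum(abs.(restored .- decoded)))
```

This avoids clipping at the cost of precision: the round-trip error is bounded by roughly half the new resolution, which can be very coarse when the data has a large dynamic range, so a warning to the user would likely still be appropriate.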

a-cakir commented 1 year ago

...the resolution scaling that's applied to any sample type wider than 16 bits is wrong....

Does this mean this should actually just be Int16? Currently, we are accepting Int16, Int32, and Int64 as is.

kleinschmidt commented 1 year ago

It's wrong only inasmuch as extreme values won't be properly represented; for Int32 you'd need to scale the resolution by a factor of 256^2 instead of 2. But we're talking really extreme values here; in most cases, it will be fine (and the extreme values will be clipped). However, if you have a signal with a tiiiiny resolution that's using all the dynamic range of the Int32 or Int64 type, then you could get into trouble.