MIT-LCP / physionet

A collection of tools for working with the PhysioNet repository.
http://physionet.org/
MIT License

parsescp fails to read uncompressed data #120

Open thewyrdguy opened 5 years ago

thewyrdguy commented 5 years ago

I have a device that stores uncompressed data in section 6 (4500 samples in 9000 bytes). parsescp fails to parse such files because it unconditionally applies the Huffman decompression algorithm. This results in errors such as:

Warning: 63006 extra bits in lead 0
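
For illustration, 9000 bytes for 4500 samples works out to two bytes per sample. Below is a minimal sketch of reading such a payload directly, without any Huffman decoding. It assumes the samples are signed, little-endian 16-bit integers and that `payload` already holds only the lead data (locating it inside section 6 is not shown). The function name `decode_raw_samples` and the synthetic test data are hypothetical, and any difference encoding the file may use is ignored.

```python
import struct


def decode_raw_samples(payload: bytes) -> list:
    """Interpret payload as consecutive little-endian signed 16-bit samples.

    Assumption: the data is stored as plain 16-bit integers with no
    Huffman coding; this is not taken from parsescp.c.
    """
    if len(payload) % 2:
        raise ValueError("payload length must be even for 16-bit samples")
    count = len(payload) // 2
    return list(struct.unpack(f"<{count}h", payload))


# Synthetic stand-in for a 9000-byte lead payload (not real device data).
payload = struct.pack("<4500h", *range(-2250, 2250))
samples = decode_raw_samples(payload)
print(len(payload), "bytes ->", len(samples), "samples")
```

Whether the real file stores plain samples or first/second differences would still need to be checked against the section 6 header.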

I have no patch because I have no access to the spec and do not know how to check whether compression was used.

(could someone give me a pointer to the spec?)

alistairewj commented 5 years ago

I assume you mean WFDB? There is a GitHub repository for that one: https://github.com/bemoody/wfdb

parsescp: https://github.com/bemoody/wfdb/blob/master/convert/parsescp.c

bemoody commented 5 years ago

Is this a separate issue from https://github.com/MIT-LCP/physionet/issues/119 ?

Do you have any anonymized example data files that you could share?

I found one version of the spec here: https://web.archive.org/web/20131219073923/http://www.tc251wgiv.nhs.uk/pages/pdf/censcp019.pdf

but again, I know very little about this so take my word with a grain of salt.

thewyrdguy commented 5 years ago

@alistairewj thank you! It is kind of difficult to find the canonical place... The PhysioNet website points here. I guess I'll leave the tickets here so as not to lose the thread.

@bemoody #119 and #120 are different. I do not know anything about anything, but I assume that the "Compression type" field may relate to what the signal processing crowd calls "compression", i.e. the "dynamic range compression" applied to each sample individually. I guess that Huffman compression may instead be defined by section 2, and indeed the table in my file looks peculiar:

[1,0,0,16,1,0,0,0,8,0,0,0] -- decimal bytes, first two being the "number of code structures"

I am attaching a sample file. It's produced by a (cheap household single lead) ECG recorder from Heal Force.

a.zip

Thanks for the link to the doc!

thewyrdguy commented 5 years ago

Some corrections and expansions to my previous comment.

First, "compression type" in section 6 is not "dynamic range compression" as I guessed, but some odd mode in which some parts of the record have a different sample resolution from other parts. The document is a bit foggy there.

Second, the Huffman table that I observe in my samples is a special "dummy Huffman table", as defined in the note after section 5.9.4 of the document. It is a single table with a single code structure, in which the number of bits in the prefix is 0 and the number of bits in the code (16) represents the sample size:

SCPHufftabs {
  scpHuffTablesNum = 1, 
  scpHuffTables = [
    SCPHuffTable {
      scpHuffCodeStructsNum = 1, 
      scpHuffCodeStructs = [
        SCPHuffCodeStruct {
          scpHuffCodeBitsInPrefix = 0, 
          scpHuffCodeBitsInCode = 16, 
          scpHuffCodeModeSwitch = 1, 
          scpHuffCodeBaseValue = 0, 
          scpHuffCodeBaseCode = 2048
        }
      ]
    }
  ]
}
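
The dump above can be reproduced from the 12 decimal bytes quoted earlier in this thread. The following is a rough sketch, assuming little-endian fields laid out as a 2-byte structure count followed by 9-byte code structures (1-byte prefix bits, 1-byte code bits, 1-byte mode switch, 2-byte base value, 4-byte base code). That layout is inferred from the bytes and the dump in this thread, not checked against the spec. The names `HuffCodeStruct`, `parse_huff_table` and `looks_like_dummy_table` are hypothetical.

```python
import struct
from typing import NamedTuple


class HuffCodeStruct(NamedTuple):
    bits_in_prefix: int
    bits_in_code: int
    mode_switch: int
    base_value: int
    base_code: int


def parse_huff_table(raw: bytes) -> list:
    """Parse one table: a uint16 count, then 9-byte code structures.

    Assumed layout (inferred from the thread): little-endian, no padding.
    """
    (count,) = struct.unpack_from("<H", raw, 0)
    offset = 2
    structs = []
    for _ in range(count):
        structs.append(HuffCodeStruct(*struct.unpack_from("<BBBhI", raw, offset)))
        offset += 9
    return structs


def looks_like_dummy_table(structs: list) -> bool:
    """Heuristic from the description above: one structure with a 0-bit prefix."""
    return len(structs) == 1 and structs[0].bits_in_prefix == 0


# The 12 bytes from the sample file; the last byte is left unparsed here.
raw = bytes([1, 0, 0, 16, 1, 0, 0, 0, 8, 0, 0, 0])
structs = parse_huff_table(raw)
print(structs, looks_like_dummy_table(structs))
```

A check along these lines might let a parser skip Huffman decoding and read fixed-width samples of `bits_in_code` bits, though whether that matches the intent of the spec note is for someone with the full document to confirm.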