analysiscenter / cardio

CardIO is a library for data science research of heart signals
https://analysiscenter.github.io/cardio/
Apache License 2.0

ECG segmentation performance #37

Closed leanderme closed 4 years ago

leanderme commented 4 years ago

Hi, thank you for sharing this!

I've successfully trained the HMM model for the segmentation task. Feeding it the provided demo data produces satisfying results, but it does not perform well on different, unseen data.

I tried to run the segmentation on the PhysioNet 2016 dataset. Each record contains a heart sound recording and the corresponding ECG signal, so I've extracted the ECG first:

filename = "a0001"
record = wfdb.rdrecord('./tests/data/' + filename)
signals, fields = wfdb.rdsamp('./tests/data/' + name, channels=[1])

# Write a wfdb header file and any associated dat files from this object.
wfdb.wrsamp(
  'modified_' + filename,
  fs = fields['fs'],
  units=['mV'],
  sig_name=['ECG'],
  p_signal=signals,
  write_dir='./tests/data/',
  fmt=['16']
)
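
(To double-check the rewritten record, it can be read back and inspected; the print below is just illustrative:)

# Read the rewritten record back to confirm sampling rate, channel name and shape.
check = wfdb.rdrecord('./tests/data/modified_' + filename)
print(check.fs, check.sig_name, check.units, check.p_signal.shape)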

The prediction is done by:

ECG_MASK = "../cardio/tests/data/modified*.hea"
MODEL_PATH = "./model_dump.dill" 
config_predict = { 'build': False, 'load': {'path': "./model_dump.dill"}}

index = bf.FilesIndex(path="./tests/data/modified*.hea", no_ext=True, sort=True)
dtst = bf.Dataset(index, batch_class=EcgBatch)

pipeline = dtst >> hmm_predict_pipeline(MODEL_PATH, annot="hmm_annotation")
batch = pipeline.next_batch()

for idx, val in enumerate(dtst.indices):
  batch.show_ecg(batch.indices[idx], 0, 20, "hmm_annotation")

But the model output seems to be almost random:

[Attachment: Screenshot 2020-02-24 at 00 19 49]

My question is: did you experience similar results? Do you have any performance metrics for the segmentation model? Am I missing something here (like preprocessing)?

Any help is greatly appreciated.

leanderme commented 4 years ago

Sorry, I forgot to attach the model. It has been trained with the (default) parameters suggested in your detailed tutorial.

model_dump.dill.zip

dpodvyaznikov commented 4 years ago

Hi, @leanderme !

Sorry for the late response.

I'd suggest trying to run inference on the validation set of the same dataset you trained the model on. The results there should be reasonable; this way you'd verify that the model has converged to a sensible state and that the problem is likely in the data.
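
Roughly something like this (just a sketch: the data path and the 80/20 split are placeholders, and it assumes a batchflow version that provides Dataset.split):

# Sketch: hold out part of the training data and run the same prediction pipeline on it.
# The path and the 80/20 split below are placeholders, not your exact setup.
import cardio.batchflow as bf
from cardio import EcgBatch
from cardio.pipelines import hmm_predict_pipeline

index = bf.FilesIndex(path="/path/to/training/data/*.hea", no_ext=True, sort=True)
dtst = bf.Dataset(index, batch_class=EcgBatch)
dtst.split(0.8)  # creates dtst.train / dtst.test

pipeline = dtst.test >> hmm_predict_pipeline("./model_dump.dill", annot="hmm_annotation")
batch = pipeline.next_batch()
for idx in batch.indices:
  batch.show_ecg(idx, 0, 20, "hmm_annotation")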

From what I can see in your screenshot, your data has a non-zero baseline and much higher amplitudes than the training data. Subtracting the mean and scaling the amplitudes may lead to better performance.
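
For example, at your extraction step it could look like this (only a sketch; standardizing to zero mean and unit variance is an assumption about what scaling is appropriate, so you may need a different target range):

# Sketch: remove the baseline and rescale the extracted ECG before writing it with wfdb.
# Standardizing to zero mean / unit variance is an assumption; adjust the scaling as needed.
import numpy as np
import wfdb

filename = "a0001"
signals, fields = wfdb.rdsamp('./tests/data/' + filename, channels=[1])

signals = signals - np.nanmean(signals, axis=0)            # remove the baseline
signals = signals / (np.nanstd(signals, axis=0) + 1e-10)   # scale the amplitudes

wfdb.wrsamp(
  'modified_' + filename,
  fs=fields['fs'],
  units=['mV'],
  sig_name=['ECG'],
  p_signal=signals,
  write_dir='./tests/data/',
  fmt=['16']
)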

dpodvyaznikov commented 4 years ago

I'll close this issue for now.