analysiscenter / cardio

CardIO is a library for data science research of heart signals
https://analysiscenter.github.io/cardio/
Apache License 2.0

How can I load split ECG data #31

Closed gsm2055 closed 5 years ago

gsm2055 commented 5 years ago

I'm sorry for asking so many questions, but I haven't been able to solve this problem all night. Here are my questions:

  1. How can I use only one ECG channel instead of both leads (ECG1 and ECG2) in the QT database?

  2. The meta data contains all the QRS information, but according to the survey, only about 3,700 annotations were entered by experts. Does CardIO use all of the meta data for training, or only those 3,700 annotations?

  3. When I train the HMM algorithm, I always have to load all the data. How can I split the data and load only the part I want? I would like to load only data that has an expert annotation.

I've been looking through all the tutorials and all the CardIO .py files, but I'm frustrated that the problem persists.

I'd really appreciate it if you could give me a little more help.

Best regards, Endrew

dpodvyaznikov commented 5 years ago

Hi!

  1. You can drop any channel with the drop_channels method of EcgBatch.
  2. In our publication and pipelines we use automated annotations, those with the pu1 extension. Expert annotations are those with the q1c and q2c extensions. You can use them by passing the corresponding extension to the ann_ext parameter of the load method, e.g.:
    load(fmt='wfdb', components=["signal", "annotation", "meta"], ann_ext='q1c')
  3. If you want to load only a part of the data, you need to create a FilesIndex or DatasetIndex with the indices of the files you need. For example, if you want to work only with files that have an annotation from the second expert (files with the q2c extension), you can create a FilesIndex in the following way:
    FilesIndex(path='some_path/qt/*.q2c', no_ext=True)

    You can find more information about Index objects in the batchflow documentation. Also, you may want to read the docstrings of the FilesIndex and DatasetIndex methods.

There is also a simple and straightforward way: move the files you need to a separate folder and use FilesIndex with the corresponding path.
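The separate-folder approach can be sketched with the standard library alone. The helper name `copy_annotated_records` and the folder layout are assumptions for illustration and are not part of CardIO or batchflow; the sketch copies files instead of moving them so the original download stays intact.

```python
import shutil
from pathlib import Path


def copy_annotated_records(src_dir, dst_dir, ann_ext="q1c"):
    """Copy every file of each record that has a `<record>.<ann_ext>` file.

    Hypothetical helper (not a CardIO function).
    Returns the sorted list of record names that were copied.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)

    # Record names are the annotation file names without the extension.
    records = {p.stem for p in src.glob(f"*.{ann_ext}")}

    # Copy all files that belong to a selected record (.hea, .dat, annotations).
    for path in src.iterdir():
        if path.is_file() and path.stem in records:
            shutil.copy2(path, dst / path.name)

    return sorted(records)
```

After copying, a FilesIndex pointed at the new folder (in the same way as the `FilesIndex(path=..., no_ext=True)` call above) should only see records that have the chosen expert annotation.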

gsm2055 commented 5 years ago

If I use this code, I always get entries of length 225000. How can I get only the 30 to 50 annotations contained in q1c?

I am also trying to use a CSV file. I converted the .dat file to CSV, but it doesn't work: when I use CSV, CardIO raises an error because it can't find the other lead channel.

I also looked into the drop_channels method you mentioned, but it requires names and indices, and I don't know how to use it to keep only one ECG lead. The channels have different names, such as MLII and V5, so how do I use only one channel?

I've been spending about 50 days on this problem. Thanks to you, I think I am closer to a solution. Thank you.

    SIGNALS_PATH = "C:/Users/gsm20/Downloads/qt-database-1.0.0"
    SIGNALS_MASK = os.path.join(SIGNALS_PATH, "*.q1c")

    index = bf.FilesIndex(path=SIGNALS_MASK, no_ext=True, sort=True)
    dtst = bf.Dataset(index, batch_class=EcgBatch)
    dtst.split()

    config_predict = {
        'build': False,
        'load': {'path': "C:/Users/gsm20/Downloads/train_dill/train_sel&sele_q1c/hmmodel1_q1c.dill"}
    }

    template_ppl_predict = (
        bf.Pipeline()
        .init_model("static", HMModel, "HMM", config=config_predict)
        .load(fmt="wfdb", components=["signal", "annotation", "meta"], ann_ext="q1c")
        .cwt(src="signal", dst="hmm_features", scales=[4,8,16], wavelet="mexh")
        .standardize(axis=-1, src="hmm_features", dst="hmm_features")
        .predict_model("HMM",
                       make_data=partial(prepare_hmm_input, features="hmm_features", channel_ix=0),
                       save_to=bf.B("hmm_annotation"), mode='w')
        .calc_ecg_parameters(src="hmm_annotation")
        .run(batch_size=20, shuffle=False, drop_last=False, n_epochs=1, lazy=True)
    )

    ppl_predict = (dtst >> template_ppl_predict)

    batch = ppl_predict.next_batch()

dpodvyaznikov commented 5 years ago

This is how the wfdb format works: even though only a few heartbeats may be annotated, the annotation file is created for the whole signal.

CardIO provides some built-in features and basic tools that help you build your own. It cannot do everything you want out of the box. If you want more specialized processing, e.g. selecting parts of the signal with regard to the annotation, you need to write your own code.
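As a rough idea of what such custom code could look like, here is a minimal sketch that crops one channel to the span covered by the annotation marks. The function `crop_to_annotated_span` is hypothetical and not part of CardIO. It assumes the annotation positions are already available as a sequence of sample indices (for example, the `sample` attribute of the object returned by `wfdb.rdann`, if you read the annotation with the wfdb package directly).

```python
def crop_to_annotated_span(signal, ann_samples, margin=0):
    """Return the part of `signal` covered by the annotations.

    Hypothetical helper (not a CardIO function).
    signal      -- samples of ONE channel (list or 1-D array)
    ann_samples -- sample indices of the annotation marks
    margin      -- extra samples to keep on each side of the span
    """
    if len(ann_samples) == 0:
        raise ValueError("no annotations to crop to")

    # Clamp the span to the signal boundaries.
    start = max(int(min(ann_samples)) - margin, 0)
    stop = min(int(max(ann_samples)) + margin + 1, len(signal))
    segment = signal[start:stop]

    # Shift annotation positions so they index into the cropped segment.
    shifted = [int(s) - start for s in ann_samples]
    return segment, shifted
```

With a 225000-sample record and a few dozen expert marks, the returned segment would contain only the annotated stretch plus the requested margin, and the shifted indices would still point at the same samples.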

Regarding the drop_channels method: if you don't know the channel name, you can simply use indices=0 to drop the first channel, indices=1 to drop the second, etc. Or you can take a look at the meta component of the batch; it should contain channel names along with other useful information. There is also a keep_channels method that keeps only the channels whose names or indices you provide.
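To make the names-or-indices behaviour concrete, here is a small stand-alone illustration. It is not CardIO's implementation, and `keep_channels_sketch` is a hypothetical name. It only shows why selecting by index works even when channel names (MLII, V5, ...) differ between records.

```python
def keep_channels_sketch(signal, channel_names, names=None, indices=None):
    """Keep only the selected channels.

    Stand-alone illustration (not a CardIO function).
    signal        -- list of channels, each a list of samples
    channel_names -- channel names in the same order, e.g. ["MLII", "V5"]
    names/indices -- which channels to keep (give at least one)
    """
    if names is None and indices is None:
        raise ValueError("pass names and/or indices")

    wanted = set()
    if indices is not None:
        wanted.update([indices] if isinstance(indices, int) else indices)
    if names is not None:
        names = [names] if isinstance(names, str) else names
        # Translate each requested name to its position in this record.
        wanted.update(channel_names.index(n) for n in names)

    keep = sorted(wanted)
    return [signal[i] for i in keep], [channel_names[i] for i in keep]
```

Calling it with `indices=0` keeps the first lead of every record regardless of what that lead is called, which is the selection the question asks for.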

I also strongly recommend that you go through the CardIO documentation thoroughly. It is good practice to learn about library methods by reading the docs.

I'll close this issue, since the problem you've described does not have anything to do with CardIO.