MousaviSajad / ECG-Heartbeat-Classification-seq2seq-model

Inter- and intra- patient ECG heartbeat classification for arrhythmia detection: a sequence to sequence deep learning approach

data read #2

Open banchuangliangyue opened 5 years ago

banchuangliangyue commented 5 years ago

First of all, thank you very much for your code! While reading it, I found that the original code generates ECG sequences as follows. First, the heartbeats segmented from the ECG records in the training set are stored in a two-dimensional array. Then the heartbeats of each class (N, S, or V) are selected and shuffled, giving three smaller two-dimensional arrays. Next, these three arrays are concatenated row-wise into one larger two-dimensional array. Finally, that array is split into sequences of max_time rows each (the default is 10).

So I would like to ask: why group all the heartbeats by class in advance? Done this way, the labels within each sequence are almost all the same, and because of the shuffle the heartbeats in a sequence do not come from the same record at all. This seems very different from the real situation, where a sequence should consist of heartbeats from the same ECG record. Is this a hidden data-processing trick? Why is it done this way? Thank you very much for your reply!

Here is the part of the code that puzzles me:

data = np.asarray(data)
shape_v = data.shape
data = np.reshape(data, [shape_v[0], -1])
t_lables = np.array(t_lables)
_data = np.asarray([],dtype=np.float64).reshape(0,shape_v[1])
_labels = np.asarray([],dtype=np.dtype('|S1')).reshape(0,)
for cl in classes:
    _label = np.where(t_lables == cl)
    permute = np.random.permutation(len(_label[0]))
    _label = _label[0][permute[:max_nlabel]]
    _data = np.concatenate((_data, data[_label]))
    _labels = np.concatenate((_labels, t_lables[_label]))

data = _data[:(len(_data)// max_time) * max_time, :]
_labels = _labels[:(len(_data) // max_time) * max_time]
data = [data[i:i + max_time] for i in range(0, len(data), max_time)]
labels = [_labels[i:i + max_time] for i in range(0, len(_labels), max_time)]
permute = np.random.permutation(len(labels))
data = np.asarray(data)
labels = np.asarray(labels)
data= data[permute]
labels = labels[permute]
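
To make the concern concrete, here is a minimal sketch that mimics the steps above on labels only and counts how many of the resulting sequences contain more than one label. The function names, the synthetic label distribution, and the max_nlabel value are assumptions made for illustration; this is not code from the repository and it does not use real ECG data.

```python
import numpy as np

def build_class_grouped_labels(labels, classes, max_time, max_nlabel, rng):
    """Mimic the quoted steps on labels only: group by class, shuffle within
    each class, concatenate, then cut into rows of max_time labels."""
    labels = np.asarray(labels)
    grouped = []
    for cl in classes:
        idx = np.where(labels == cl)[0]
        idx = idx[rng.permutation(len(idx))[:max_nlabel]]
        grouped.append(labels[idx])
    grouped = np.concatenate(grouped)
    usable = (len(grouped) // max_time) * max_time  # drop the incomplete tail
    return grouped[:usable].reshape(-1, max_time)

def count_mixed(sequences):
    """Number of sequences that contain more than one distinct label."""
    return sum(len(np.unique(seq)) > 1 for seq in sequences)

rng = np.random.default_rng(0)
# Synthetic labels, purely illustrative (assumed class proportions).
labels = rng.choice(["N", "S", "V"], size=5000, p=[0.8, 0.1, 0.1])
seqs = build_class_grouped_labels(
    labels, ["N", "S", "V"], max_time=10, max_nlabel=50000, rng=rng
)
mixed = count_mixed(seqs)
print(f"{mixed} of {len(seqs)} sequences contain more than one label")
```

With three classes, only the sequences that straddle a boundary between two class blocks can be mixed, so the count printed here cannot exceed 2.
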
banchuangliangyue commented 5 years ago

@SajadMo

MousaviSajad commented 5 years ago

Hello,

Many thanks for your email, and sorry for the late response. Your point is correct: "the sequences should be composed of heartbeats from the same ECG record". Unfortunately, I have not yet updated the code on GitHub; I will do so by tomorrow.

Regards, Sajad
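
For readers following the thread, one possible way to build sequences from heartbeats of the same record is sketched below. This is not the author's updated code. It assumes a hypothetical record_ids array that gives the source record of each heartbeat, and it assumes the heartbeats of each record are stored in their original temporal order.

```python
import numpy as np

def build_record_sequences(beats, labels, record_ids, max_time):
    """Cut each record's beats, in stored order, into windows of max_time beats."""
    beats = np.asarray(beats)
    labels = np.asarray(labels)
    record_ids = np.asarray(record_ids)
    seq_data, seq_labels = [], []
    for rec in np.unique(record_ids):
        idx = np.flatnonzero(record_ids == rec)     # keeps the stored order
        usable = (len(idx) // max_time) * max_time  # drop the incomplete tail
        for start in range(0, usable, max_time):
            window = idx[start:start + max_time]
            seq_data.append(beats[window])
            seq_labels.append(labels[window])
    if not seq_data:  # no record had at least max_time beats
        return (np.empty((0, max_time, beats.shape[1])),
                np.empty((0, max_time), dtype=labels.dtype))
    return np.stack(seq_data), np.stack(seq_labels)

rng = np.random.default_rng(0)
record_ids = np.array([100] * 25 + [101] * 17)  # hypothetical record numbers
beats = rng.normal(size=(42, 4))                # 42 synthetic beats, 4 samples each
labels = rng.choice(["N", "S", "V"], size=42)
seq_data, seq_labels = build_record_sequences(beats, labels, record_ids, max_time=10)
print(seq_data.shape, seq_labels.shape)  # (3, 10, 4) (3, 10)
```

Sequences built this way can still be shuffled as whole units for training, while every sequence keeps the label mix and beat order of a single record.
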

banchuangliangyue commented 5 years ago

Thanks for your reply. I am looking forward to your updated code!

jrvmalik commented 4 years ago

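@banchuangliangyue is correct: both the training and testing sequences contain only one label type. [image: https://user-images.githubusercontent.com/40525523/71631948-b69f1d00-2bd9-11ea-8734-fb2af04d3627.png]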

MousaviSajad commented 4 years ago

Yes, it is possible. It depends on the sequence length. If you increase the length, you might get more than one label type in a sequence.

Regards, Sajad
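
As a rough check on how sequence length matters under the class-grouped procedure quoted at the start of this thread, the number of mixed sequences can be bounded as follows. The symbols are notation introduced here, not taken from the code: K is the number of classes, n the total number of heartbeats kept, T stands for max_time, and M is the number of sequences containing more than one label.

```latex
% K class blocks are concatenated, so there are K - 1 block boundaries.
% A sequence can be mixed only if it spans at least one boundary,
% and each boundary falls inside at most one sequence.
M \le K - 1,
\qquad
\frac{M}{\lfloor n / T \rfloor} \le \frac{K - 1}{\lfloor n / T \rfloor}
```

For the three classes N, S, and V this gives at most 2 mixed sequences for any T. Increasing T raises the mixed fraction only because the total number of sequences shrinks. The final permutation in the quoted code reorders whole sequences, so it does not change M.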

banchuangliangyue commented 4 years ago

jrvmalik wrote: "@banchuangliangyue is correct, both the training and testing sequences contain one label type only."

So I think this amounts to knowing the labels of the test heartbeats in advance, which is unrealistic.