Closed devhci closed 1 year ago
OK. CACHET-CADB is a new and relatively large database. I am happy to add such new databases to this library.
There are 4 classes for the (10s single-lead) ECG signals. Is it correct?
So far, I have checked the file cachet-cadb_short_format_without_context.hdf5 and found that the labels and signals have the same shape, (16404480,). In this function, only a very small part (1/(10*1024)) of the labels is used. One suggestion: the stored labels could be much smaller if only one label were stored for each recording.
"There are 4 classes for the (10s single-lead) ECG signals. Is it correct?"
Yes, currently there are 4 classes. The database is still a work in progress, as the project is ongoing. More patients and annotations will be uploaded periodically.
"Currently, I have checked the file cachet-cadb_short_format_without_context.hdf5, and found that the labels and signals have the same shape (16404480,)"
Yes, the annotations and the ECG are the same size. This is a combined file of 10 s annotated ECG segments taken from other records. I agree that the size could be reduced by storing the labels more efficiently. 16404480 / (1024 Hz * 10 seconds) = 1602 segments of 10 seconds each, belonging to 4 classes.
BTW, cachet-cadb_short_format_without_context.hdf5 (this) is just a small annotated part of only 1602 segments. The raw database is 15 GB in size. It would be a good resource for semi-supervised or unsupervised learning.
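The arithmetic above, and the suggestion to keep one label per 10 s segment, can be sketched as follows. This is a minimal illustration, not the library's code: the helper name `per_segment_labels`, the integer class codes, and the assumption that a label is constant within each segment are all hypothetical, and the demo array is synthetic rather than read from the HDF5 file.

```python
import numpy as np

FS = 1024                        # ECG sampling frequency in Hz (from the thread)
SEGMENT_SEC = 10                 # duration of one annotated segment in seconds
SEGMENT_LEN = FS * SEGMENT_SEC   # 10240 samples per segment


def per_segment_labels(labels, segment_len=SEGMENT_LEN):
    """Collapse a per-sample label array to one label per segment.

    Assumption (hypothetical): every sample in a segment carries the same label.
    """
    labels = np.asarray(labels)
    if labels.size % segment_len != 0:
        raise ValueError("label array length is not a multiple of the segment length")
    segments = labels.reshape(-1, segment_len)
    # Verify the assumption before discarding the redundant copies.
    if not (segments == segments[:, :1]).all():
        raise ValueError("labels vary within at least one segment")
    return segments[:, 0]


# Synthetic stand-in for the stored labels: 4 segments with made-up class codes.
demo_labels = np.repeat(np.array([0, 3, 1, 2]), SEGMENT_LEN)
compact = per_segment_labels(demo_labels)

# Number of 10 s segments implied by the shape reported in the thread.
n_segments = 16404480 // SEGMENT_LEN
```

Storing `compact` instead of the per-sample array shrinks the label data by a factor of 10240 while keeping the signal array untouched.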
In the function read_annotations_and_load_correspondingECG(annotation_path, ecg_data_path, output_file_name), the raw ECG for a single day is available as below:

    signal = u['ecg.bin']  # Read the ECG signal from bin file
    data = signal.get_data()
    data = data[0]  # Final numpy array containing full days

Similarly, the corresponding raw acceleration data can be accessed using

    acc = u['acc.bin']  # Read the Acc signal from bin file
I think there could be two versions:
1. One for just loading the fully annotated part, which is available in cachet-cadb_short_format_without_context.hdf5.
2. One with the ability to load the raw ECG and accelerometer data for each day. This will be handy for semi-supervised or unsupervised learning.
Please have a look at the CACHET-CADB paper for a quick overview. Also, if you like, I can give a quick walkthrough of the code and database structure over a call to speed up the implementation.
Dear Prof. @wenh06,
Firstly, thank you for your efforts to make ECG processing and DL handy. I would like to ask whether it is possible to integrate CACHET-CADB into torch_ecg. CACHET-CADB contains 5000+ hours of long-term continuous ECG recorded under free-living conditions. Code for loading the data, context, and annotations is already provided in this notebook. I would be happy to assist in the integration of CACHET-CADB into torch_ecg.
Regards Devender
It is now included in the cachet-cadb branch.
There are a few problems:
1. The resolution in the paper is 12 bit, but the adcResolution field in the xml files is typically 16 bit (at least for ecg). DAC using adcResolution does not produce reasonable voltage values for ECGs, please check it.
2. What is the channel field in the xml files used for? All raw data read from the corresponding files are 1-dimensional. How should one transform the 1d raw data into multi-dimensional data?
3. What do the marker.csv files record? Some of the recordings do not have such a file.
Most of the problems are tagged # TODO in the code.
I find that the unisens data are converted from digital to analogue using the field lsbValue in the header files. But the inconsistency between the field adcResolution and the values in the paper should be checked.
Sorry for the late response!
value = (ADCout - baseline) * lsbValue
I am not sure how you are reading it, but please have a look at the Unisens Python library for easy reading. It already returns values scaled to mV.
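The conversion formula above can be sketched in a few lines of NumPy. This is only an illustration of `value = (ADCout - baseline) * lsbValue`: the function name `digital_to_physical` is hypothetical, and the raw counts, `lsb_value`, and `baseline` below are made-up numbers, not values taken from the CACHET-CADB header files.

```python
import numpy as np


def digital_to_physical(adc_out, lsb_value, baseline=0):
    """Apply value = (ADCout - baseline) * lsbValue element-wise."""
    # Cast to float first so that subtracting the baseline cannot overflow
    # a narrow integer dtype such as int16.
    adc_out = np.asarray(adc_out, dtype=np.float64)
    return (adc_out - baseline) * lsb_value


# Made-up digital samples and header values, for illustration only.
raw = np.array([2048, 2148, 1948], dtype=np.int16)
physical = digital_to_physical(raw, lsb_value=0.01, baseline=2048)
```

In practice, `lsb_value` and `baseline` would be read from the header of each recording rather than hard-coded.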
Also, I just wanted to share that the 'bin' files with live in the name (e.g. HR_live.bin and hrvrmssd_live.bin) are not the main files. They contain the HR and HRV values calculated by the on-device algorithm in live mode (transmitted over BLE), which are often incorrect for non-NSR rhythms. In any case, they can easily be recalculated from the raw ECG.bin.
In torch_ecg, you should focus on providing the raw ECG and accelerometer data.
The channel field in the xml represents the number of channels. For ECG there is one channel, whereas for Acc. there are three channels, i.e. X, Y, and Z.
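If the raw file is read as a flat 1d array instead of through the Unisens library, the channel count can be used to reshape it. The sketch below assumes the samples are stored interleaved (X, Y, Z, X, Y, Z, ...); that layout is an assumption that should be checked against the Unisens format documentation, and the helper name `split_channels` and the sample values are hypothetical.

```python
import numpy as np


def split_channels(flat, n_channels):
    """Reshape a flat sample array to shape (n_samples, n_channels).

    Assumption: channels are interleaved sample by sample.
    """
    flat = np.asarray(flat)
    if flat.size % n_channels != 0:
        raise ValueError("array length is not a multiple of the channel count")
    return flat.reshape(-1, n_channels)


# Made-up accelerometer stream with 3 channels: x0, y0, z0, x1, y1, z1.
flat_acc = np.array([1, 2, 3, 4, 5, 6])
acc_xyz = split_channels(flat_acc, n_channels=3)
acc_x = acc_xyz[:, 0]  # first channel only
```

For the single-channel ECG the same call with `n_channels=1` simply yields a column vector.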
marker.csv contains the indices at which patients tapped on the device to report symptoms. marker.csv is missing in some cases, namely when the patient never tapped on the device to report symptoms. To get the actual time of a tap relative to the ECG, the index in the tap marker needs to be divided by its sampling frequency (which is 64 Hz).
Yes, I've noticed that the DAC is done using the field lsbValue. Now almost all databases allow loading of physical (analogue) values as well as digital values, by assigning different values to the parameter units of the load_data method.
Now CACHET-CADB has been merged into the master branch.