AI4HealthUOL / SSSD-ECG

Repository for the paper: 'Diffusion-based Conditional ECG Generation with Structured State Space Models'
MIT License

Clarification on Dataset Published #10

Closed by SVithurabiman 8 months ago

SVithurabiman commented 8 months ago

I came across your work on SSSD-ECG and found it very useful. However, I have some questions about the datasets you published on figshare ( https://figshare.com/s/43df16e4a50e4dd0a0c5 ). The train, test, and validation sets together contain 21837 ECG signals, but PTB-XL has only 21799 signals, a mismatch of 38. Assuming that the data published at https://mega.nz/folder/UfUDFYjS#YYUJ3CCUGb6ZNmJdCZLseg is the preprocessed PTB-XL data, that dataset has the same issue. In this case, how do I get the corresponding patient ID for each ECG signal in the synthetic dataset? Your help would be much appreciated.

Thanks :)

Jeries27 commented 8 months ago

Adding to that, the dataset at https://figshare.com/s/43df16e4a50e4dd0a0c5 is described as Synthetic-PTB-XL and has 71 scp_statements. In contrast, downloading PTB-XL directly from PhysioNet and running your ecg_data_preprocessing.ipynb leaves the dataset with 44 statements.

Can you clarify how training and testing on different datasets (synthetic vs. real) was done when the default model config has "label_embed_classes": 71?

This was raised earlier:

I ran SSSD-ECG-main/src/ptb_xl/ecg_data_preprocessing.ipynb to get the training data and the training label, but I encountered the same problem as XLIU430. I did not change the configuration file, and the label I got was also 44 instead of 71. What parameters need to be changed during loss data preprocessing?

Originally posted by @dtt355 in https://github.com/AI4HealthUOL/SSSD-ECG/issues/5#issuecomment-1714975523

nstrodt commented 8 months ago

The PTB-XL dataset was recently updated to remove a number of duplicate records, which accounts for the 38 signals that seem to be missing. Our model and paper are based on v1.0.2 of the PTB-XL dataset.
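
For anyone who wants to check this locally, here is a minimal sketch for listing the records that differ between two PTB-XL releases and looking up their patient IDs. It assumes each release ships a ptbxl_database.csv with ecg_id and patient_id columns, as the PhysioNet release does. The helper names and the paths in the usage comment are hypothetical. Whether the rows of the published arrays follow ecg_id order is not stated in this thread, so treat any row-to-patient mapping as something to verify.

```python
import csv


def load_patient_ids(csv_lines):
    """Map ecg_id -> patient_id from the lines of a ptbxl_database.csv-style file.

    csv_lines can be an open file or any iterable of CSV text lines.
    Values are kept as the strings found in the file.
    """
    reader = csv.DictReader(csv_lines)
    return {row["ecg_id"]: row["patient_id"] for row in reader}


def removed_records(old_map, new_map):
    """Return the ecg_ids present in the older release but absent from the newer one."""
    return sorted(set(old_map) - set(new_map), key=int)


# Usage (both paths are placeholders, not paths from this repository):
# with open("ptbxl_v1.0.2/ptbxl_database.csv", newline="") as old_file, \
#      open("ptbxl_newer/ptbxl_database.csv", newline="") as new_file:
#     old_map = load_patient_ids(old_file)
#     new_map = load_patient_ids(new_file)
# dropped = removed_records(old_map, new_map)
# print(len(dropped), [old_map[ecg_id] for ecg_id in dropped[:5]])
```

If the explanation above holds, the length of the returned list should match the difference in record counts between the two releases.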

juanlopezcode commented 8 months ago

The signals and labels with 71 statements can be downloaded directly as NumPy arrays from https://mega.nz/folder/UfUDFYjS#YYUJ3CCUGb6ZNmJdCZLseg. Alternatively, the variable ptb_xl_label in ecg_data_preprocessing.ipynb can be changed to 'label_all' to obtain the 71 statements. See line 528 of ecg_utils.py for the various PTB-XL subsets.
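
To illustrate why the label width depends on the chosen subset, here is a minimal sketch of multi-hot encoding against an ordered statement list. This is not the code in ecg_utils.py. encode_labels and check_width are hypothetical helpers, and counting every listed statement as present regardless of its likelihood value is a simplifying assumption. PTB-XL documents 71 SCP statements overall, 44 of them diagnostic, which would be consistent with the two widths reported above.

```python
import ast


def encode_labels(scp_codes_column, statements):
    """Multi-hot encode records against an ordered list of SCP statements.

    scp_codes_column: iterable of dicts, or of their string form as stored in
    ptbxl_database.csv, e.g. "{'NORM': 100.0, 'SR': 0.0}".
    Returns one list of 0/1 values per record, each len(statements) wide.
    """
    index = {name: i for i, name in enumerate(statements)}
    encoded = []
    for codes in scp_codes_column:
        if isinstance(codes, str):
            codes = ast.literal_eval(codes)  # parse the dict literal safely
        row = [0] * len(statements)
        for name in codes:
            if name in index:  # statements outside the chosen subset are ignored
                row[index[name]] = 1
        encoded.append(row)
    return encoded


def check_width(encoded, label_embed_classes):
    """Raise ValueError if any label row does not match the configured class count."""
    for row in encoded:
        if len(row) != label_embed_classes:
            raise ValueError(
                f"label width {len(row)} != label_embed_classes {label_embed_classes}"
            )
```

Running check_width on the preprocessed labels with the value of "label_embed_classes" from the config before training would surface a 44 vs. 71 mismatch early.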