Edoar-do / HuBERT-ECG


Pre-processing Pipeline #1

Open chenqi-li opened 22 hours ago

chenqi-li commented 22 hours ago

Hi,

Thank you for the interesting work and open-sourcing the code & pre-trained weights.

I would like to try your pre-trained model on a different dataset. In what format/structure should I pre-process my data? Alternatively, if possible, could you please provide the pre-processing pipeline for the PTB dataset?

Thanks.

Edoar-do commented 19 hours ago

Hi @chenqi-li ! Thank you for your interest in our work! HuBERT-ECG accepts 5-second 12-lead ECGs where the leads are concatenated in this order: I, II, III, aVL, aVR, aVF, V1, ... V6. The resulting input tensor has shape (batch_size, 5 x 100 x 12). If your ECGs last less than 5 seconds, you can pad them and adjust the attention_mask accordingly. A simple implementation of its computation is provided in dataset.py.
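To make the expected layout concrete, here is a minimal sketch of how one might build that input. It assumes 100 Hz sampling (as implied by the 5 x 100 x 12 input size) and that the leads are already ordered as above; the function name and mask convention are illustrative, so please treat dataset.py as the reference implementation:

```python
import numpy as np

# Lead order stated above; 100 Hz is an assumption inferred from 5 x 100 x 12
LEAD_ORDER = ["I", "II", "III", "aVL", "aVR", "aVF",
              "V1", "V2", "V3", "V4", "V5", "V6"]
FS = 100
SAMPLES = 5 * FS  # 500 samples per lead

def to_model_input(ecg: np.ndarray):
    """ecg: array of shape (12, T) at 100 Hz, leads already in LEAD_ORDER.

    Returns a flat input vector and an attention mask, each of length
    12 * 500 = 6000, zero-padding leads shorter than 5 seconds.
    """
    t = ecg.shape[1]
    padded = np.zeros((12, SAMPLES), dtype=np.float32)
    mask = np.zeros((12, SAMPLES), dtype=np.int64)
    n = min(t, SAMPLES)
    padded[:, :n] = ecg[:, :n]
    mask[:, :n] = 1  # 1 = real sample, 0 = padding
    # Row-major flatten concatenates the leads end-to-end
    return padded.reshape(-1), mask.reshape(-1)
```

Stacked along a batch dimension, this yields the (batch_size, 6000) tensor described above.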

Extra tip: if your ECGs last more than 5 seconds, we suggest using a time-aligned random crop as data augmentation (see dataset.py). This simple augmentation helped us reach much better downstream performance.

PTB is one of the datasets on which we evaluated HuBERT-ECG, so you might want to take a look at the preprint ;)

Should you have any suggestions, please let us know.

Happy coding!