Edoar-do / HuBERT-ECG


Pre-processing Pipeline #1

Open chenqi-li opened 22 hours ago

chenqi-li commented 22 hours ago

Hi,

Thank you for the interesting work and open-sourcing the code & pre-trained weights.

I would like to try your pre-trained model on a different dataset. In what format/structure should I pre-process my data? Alternatively, if possible, could you please provide the pre-processing pipeline for the PTB dataset?

Thanks.

Edoar-do commented 19 hours ago

Hi @chenqi-li ! Thank you for your interest in our work! HuBERT-ECG accepts 5-second 12-lead ECGs where the leads are concatenated in this order: I, II, III, aVL, aVR, aVF, V1, ... V6. The resulting input tensor has shape (batch_size, 5 x 100 x 12). If your ECGs last less than 5 seconds, you can pad them and adjust the attention_mask accordingly. A simple implementation of its computation is provided in dataset.py.
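To make the expected layout concrete, here is a minimal sketch of how one might build that input. It assumes 100 Hz sampling (as implied by the 5 x 100 x 12 input size) and that the leads are already ordered as above; the function name and mask convention are illustrative, so please treat dataset.py as the reference implementation:

```python
import numpy as np

# Lead order stated above; 100 Hz is an assumption inferred from 5 x 100 x 12
LEAD_ORDER = ["I", "II", "III", "aVL", "aVR", "aVF",
              "V1", "V2", "V3", "V4", "V5", "V6"]
FS = 100
SAMPLES = 5 * FS  # 500 samples per lead

def to_model_input(ecg: np.ndarray):
    """ecg: array of shape (12, T) at 100 Hz, leads already in LEAD_ORDER.

    Returns a flat input vector and an attention mask, each of length
    12 * 500 = 6000, zero-padding leads shorter than 5 seconds.
    """
    t = ecg.shape[1]
    padded = np.zeros((12, SAMPLES), dtype=np.float32)
    mask = np.zeros((12, SAMPLES), dtype=np.int64)
    n = min(t, SAMPLES)
    padded[:, :n] = ecg[:, :n]
    mask[:, :n] = 1  # 1 = real sample, 0 = padding
    # Row-major flatten concatenates the leads end-to-end
    return padded.reshape(-1), mask.reshape(-1)
```

Stacked along a batch dimension, this yields the (batch_size, 6000) tensor described above.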

Extra tip: if your ECGs last more than 5 seconds, we suggest using a time-aligned random crop as data augmentation (see dataset.py). This simple augmentation helped us reach much better downstream performance.

PTB is one of the datasets on which we evaluated HuBERT-ECG, so you might want to take a look at the preprint ;)

Should you have any suggestions, please let us know.

Happy coding!