AI4HealthUOL / SSSD

Repository for the paper: 'Diffusion-based Time Series Imputation and Forecasting with Structured State Space Models'
MIT License

How to train ptb-xl dataset? #13

Closed: HarperHao closed this issue 1 year ago

HarperHao commented 1 year ago

Thanks for your great work! I want to train on the PTB-XL dataset, but when I run train.py I encounter a bug. [screenshot of the error]

The reason for the bug is that the size of train_ptbxl_1000.npy is 17441, which is not divisible by 160. How can I modify the code? I'm looking forward to your response. Thanks a lot!
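A quick way to see the arithmetic behind the error (the split-by-count behaviour shown here is generic NumPy, not the repo's exact code):

```python
import numpy as np

n_samples = 17441  # first-axis size of train_ptbxl_1000.npy
print(n_samples % 160)  # -> 1, so 17441 cannot be split evenly into chunks of 160

# np.split with an integer count requires an even division, otherwise it raises:
data = np.arange(10)
try:
    np.split(data, 3)  # 10 is not divisible by 3
except ValueError as err:
    print("ValueError:", err)
```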

juanlopezcode commented 1 year ago

Hello, I believe it is because the data we provided for the sample run has the shape (L, K), i.e. length and channels, while PTB-XL already has the shape (B, K, L). So you shouldn't apply the extra split, since you already have batches.

HarperHao commented 1 year ago

Thank you very much for your reply! I debugged train.py and found that the shape of the loaded data is not (B, K, L); it is 17441 x 12 x 1000. So, according to the paper, it still needs to be split. [screenshot]

Let me elaborate on the process of running the code.

  1. I ran get_data.py to obtain train_ptbxl_1000.npy, test_ptbxl_1000.npy, and val_ptbxl_1000.npy.
  2. I modified the config_SSSDS4.json file, changing in_channels and out_channels to 12.
  3. I ran train.py and encountered the bug mentioned above.
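For reference, the change in step 2 might look like this in config_SSSDS4.json (only the two fields mentioned are shown; the surrounding structure of the config file is omitted):

```json
{
  "in_channels": 12,
  "out_channels": 12
}
```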

Looking forward to your reply again.

juanlopezcode commented 1 year ago

The splitting line is not required for PTB-XL: 17441 is the number of samples, 12 the channels, and 1000 the length. Depending on your GPU you may be able to pass smaller or larger batches into the model, for example (4, 12, 1000).

For PTB-XL I would recommend using PyTorch DataLoaders to split the data into the desired batches. Hope this helps!
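The DataLoader approach suggested above might be sketched like this. The batch size of 4 follows the example shape (4, 12, 1000) given earlier; a small zero-filled array stands in for the real train_ptbxl_1000.npy, whose actual shape is (17441, 12, 1000):

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for np.load('train_ptbxl_1000.npy'); real shape is (17441, 12, 1000)
train = np.zeros((100, 12, 1000), dtype=np.float32)

dataset = TensorDataset(torch.from_numpy(train))
# drop_last=True discards the final partial batch, so no divisibility issue arises
loader = DataLoader(dataset, batch_size=4, shuffle=True, drop_last=True)

for (batch,) in loader:
    # each batch has shape (4, 12, 1000): B samples, K channels, L length
    assert batch.shape == (4, 12, 1000)
    break
```

With drop_last=True, the 17441st sample that breaks the even split is simply left out of the last epoch's batches instead of raising an error.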

HarperHao commented 1 year ago

Thanks for your suggestion! I modified the code as you suggested, and it now runs successfully; the model is training. Thank you again!