AI4HealthUOL / SSSD

Repository for the paper: 'Diffusion-based Time Series Imputation and Forecasting with Structured State Space Models'
MIT License

Electricity data #19

Closed: HaoLyou closed this issue 11 months ago

HaoLyou commented 11 months ago

Hello, thanks for your work!

Could you share the train.py used to run the electricity dataset? As your paper says, you split the 370 channels into 10 batches. How should train.py be modified for this?

I tried modifying it, but the electricity dataset takes three times as long to run as the PTB-XL dataset, and I'm not sure my changes are correct. Here is what I modified. Can you confirm that it is correct?

image

juanlopezcode commented 11 months ago

Hello,

Have a look at this file: https://github.com/AI4HealthUOL/SSSD/blob/main/docs/instructions/Solar/updated_solar.ipynb

It contains clear instructions on how to split and pre-process a dataset for the channel-splitting approach, for both training and inference (which might include scaling).
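As a rough illustration of the channel-splitting idea (not the code from the linked notebook), the 370 electricity channels could be divided into 10 groups of 37 with NumPy. The array layout `(samples, time steps, channels)`, the random placeholder data, and the helper name `split_channels` are all assumptions made for this sketch.

```python
import numpy as np


def split_channels(data, num_groups, channel_axis=-1):
    """Split an array into equally sized channel groups (hypothetical helper)."""
    num_channels = data.shape[channel_axis]
    if num_channels % num_groups != 0:
        raise ValueError(
            f"{num_channels} channels cannot be split evenly into {num_groups} groups"
        )
    # np.split returns a list of sub-arays of equal size along the given axis.
    return np.split(data, num_groups, axis=channel_axis)


# Placeholder data standing in for the electricity set (assumed layout):
# 8 samples, 100 time steps, 370 channels.
rng = np.random.default_rng(0)
data = rng.standard_normal((8, 100, 370)).astype(np.float32)

groups = split_channels(data, num_groups=10)
print(len(groups), groups[0].shape)  # 10 (8, 100, 37)
```

Each group could then be trained on or imputed separately. Any scaling would have to be applied consistently to the same channels at training and inference time.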

I hope it helps

HaoLyou commented 11 months ago

Thank you for your reply! I have looked at updated_solar.ipynb, but there is something I didn't understand. Can you tell me what "path_to_real1, 2" and "path_to_imputation1, 2" are? And is this code used in inference.py?

image

juanlopezcode commented 11 months ago

Hello,

If you run inference (generation) in batches and save each batch, you might need to concatenate the data before using it. That is what those paths are supposed to load (the generated data). Specifically, everything under these lines of code is for inference; see the MSE below.
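A minimal sketch of that concatenation step, assuming each batch was saved as a separate `.npy` file. The file names, array shapes, and the helper `load_and_concatenate` are hypothetical, and the correct `axis` depends on how the data was split (sample axis or channel axis).

```python
import os
import tempfile

import numpy as np


def load_and_concatenate(paths, axis):
    """Load saved batches and join them along the given axis (hypothetical helper)."""
    return np.concatenate([np.load(p) for p in paths], axis=axis)


rng = np.random.default_rng(0)
# Two placeholder "generated" batches: 4 samples, 37 channels, 100 time steps each.
parts = [rng.standard_normal((4, 37, 100)) for _ in range(2)]

with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i, part in enumerate(parts):
        path = os.path.join(tmp, f"imputation{i}.npy")  # assumed file naming
        np.save(path, part)
        paths.append(path)
    # Join along the channel axis (axis=1 in this assumed layout).
    merged = load_and_concatenate(paths, axis=1)

print(merged.shape)  # (4, 74, 100)
```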

I hope it helps

HaoLyou commented 11 months ago

Thank you!

HaoLyou commented 11 months ago

I want to confirm: do we need to concatenate the split channels back together when calculating the MSE during inference?

juanlopezcode commented 11 months ago

It would be ideal to have a twin (generated) batch for each original (real) batch to compare against, although you could also compute the MSE over diverse batches...
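One way to read this: as long as every generated group is paired with its matching real group, the MSE can be computed per group or on the concatenated arrays. The sketch below uses placeholder data and a plain, unmasked MSE, both of which are assumptions. It checks that the two routes agree when all groups have the same size.

```python
import numpy as np


def mse(real, generated):
    """Plain mean squared error over all elements (no masking assumed)."""
    return float(np.mean((real - generated) ** 2))


rng = np.random.default_rng(0)
# Placeholder real data: 10 channel groups of shape (4 samples, 100 steps, 37 channels).
real_groups = [rng.standard_normal((4, 100, 37)) for _ in range(10)]
# Placeholder "generated" twins: the real data plus a small amount of noise.
gen_groups = [r + 0.1 * rng.standard_normal(r.shape) for r in real_groups]

# Route 1: MSE per matched (real, generated) pair, then averaged.
per_group = [mse(r, g) for r, g in zip(real_groups, gen_groups)]

# Route 2: concatenate the channels back together, then a single MSE.
overall = mse(
    np.concatenate(real_groups, axis=-1),
    np.concatenate(gen_groups, axis=-1),
)

print(np.isclose(np.mean(per_group), overall))  # True
```

If the groups had different sizes, the per-group values would need to be weighted by the number of elements in each group to match the overall figure.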