IrinaStatsLab / GlucoBench

The official implementation of the paper "GlucoBench: Curated List of Continuous Glucose Monitoring Datasets with Prediction Benchmarks."

Error during evaluation #3

Closed alceubissoto closed 1 month ago

alceubissoto commented 1 month ago

Hi! First of all, congratulations on the paper and the very well-organized code. It is an excellent resource for people like me starting to work with glucose monitoring data. I am trying to reproduce your paper, specifically the code related to transformers.

I installed the package as recommended, creating a conda environment and installing the requirements.

When I run python lib/transformer.py --optuna False, I encounter the following error:

Traceback (most recent call last):
  File "/Users/user/Work/GlucoBench/lib/transformer.py", line 250, in <module>
    id_cal_errors_sample = rescale_and_test(trues,
  File "/Users/user/Work/GlucoBench/lib/../utils/darts_evaluation.py", line 264, in rescale_and_test
    cal_error += (est_p - p) ** 2
ValueError: operands could not be broadcast together with shapes (12,) (11,) (12,)

I believe this issue arises from the way cal_error is defined: cal_error = np.zeros(forecasts[0].n_timesteps). Its length depends on the desired output length rather than on the length of the values actually compared, which causes the shape mismatch in the ValueError.

Should it instead be: cal_error = np.zeros_like(cdf_vals[0]) ?
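A minimal sketch of how this error can arise, using NumPy only. The values of est_p and p below are hypothetical stand-ins (p is assumed to be a scalar quantile level); only the shapes matter. For an in-place add, NumPy reports the shapes of both inputs and of the output, which would explain why three shapes appear in the message.

```python
import numpy as np

n_timesteps = 12                    # 1 hour ahead at 5-minute frequency
cal_error = np.zeros(n_timesteps)   # accumulator sized by the forecast horizon

# Hypothetical stand-ins: est_p came out one point short, p is a scalar level.
est_p = np.full(11, 0.5)
p = 0.1

message = ""
try:
    # In-place add of a length-11 array into a length-12 accumulator.
    cal_error += (est_p - p) ** 2
except ValueError as exc:
    message = str(exc)
    print(message)
```

Sizing the accumulator with np.zeros_like(cdf_vals[0]) would make the shapes agree, but it would not explain why one point went missing in the first place.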

Thank you!

mrsergazinov commented 1 month ago

Hi @alceubissoto! We are glad you have found our repository useful.

The shape should be (12,), because we are predicting 1 hour ahead in all tests, which is 12 time points at a 5-minute frequency. So changing to np.zeros_like(cdf_vals[0]) would discard one forecasting point.

Now, it seems that est_p comes out one point short, with length 11. Most likely, the intersect function in _get_values_or_raise does not intersect the true and predicted values correctly in time. Another possibility is that the extracted true values are for some reason shifted in time or shorter.

I would suggest that you print true and predicted values together with the time points and examine them. The time points should coincide, and both should be of the same length (12).
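One way to run this check, as a library-agnostic sketch using only the standard library. The helper names time_index and report_alignment are hypothetical, and the timestamps are made up; with real data you would pass in the time indexes of the true and predicted series instead.

```python
from datetime import datetime, timedelta


def time_index(start, n_points, freq=timedelta(minutes=5)):
    """Build n_points timestamps spaced by freq, starting at start."""
    return [start + i * freq for i in range(n_points)]


def report_alignment(true_times, pred_times):
    """Print both time indexes side by side and return the shared timestamps."""
    for t, f in zip(true_times, pred_times):
        flag = "" if t == f else "  <-- mismatch"
        print(f"true={t:%H:%M}  pred={f:%H:%M}{flag}")
    return sorted(set(true_times) & set(pred_times))


start = datetime(2024, 1, 1, 8, 0)  # arbitrary example start time
true_times = time_index(start, 12)
# Hypothetical misalignment: the forecast index starts one step too early.
pred_times = time_index(start - timedelta(minutes=5), 12)

shared = report_alignment(true_times, pred_times)
print(len(true_times), len(pred_times), len(shared))
```

Both indexes have 12 points here, yet a one-step shift leaves fewer shared timestamps than the horizon, which is the kind of mismatch that would shorten est_p after an intersection.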

I could also take a look. What’s the dataset you’re looking at?

alceubissoto commented 1 month ago

Hi @mrsergazinov ! Thank you for your reply. I am examining "weinstock", with the default parameters. I appreciate your suggestions on potential directions to explore and will continue investigating this issue. If you find any solutions, please let me know :)

mrsergazinov commented 1 month ago

Fix: https://github.com/IrinaStatsLab/GlucoBench/commit/91dd4c7bef698b41d3ba51e61a50c5321ea24ee7

Patch: Following up on the issue, it seems the forecasts are misaligned in time. For now, I patched it by manually overriding the time index of the forecasts.

Root cause: I am attaching a screenshot from debugging the model: the last input time point is the first predicted time point. It's not clear whether the model is actually trying to predict that point or whether it is simply a time index issue. If it's the latter, then the error metrics should not be affected and the current patch works. If it's the former, then it's more serious and we need another follow-up.

Overall, this bug could originate from 1) Darts or 2) our custom Dataset classes. Regarding 1), it could be a bug in how Darts constructs the inputs or the time index of predictions. Regarding 2), we need to investigate what the Darts predict_from_dataset method expects from a dataset (__getitem__), since this may have changed since the time of our testing.

[Screenshot 2024-07-20 at 1 21 43 PM]

Livia-Zaharia commented 1 month ago

@mrsergazinov Your commit leads to crashes on linear models, in particular:

  File "C:\Users\liv\Desktop\IT_school\benchmarking\GlucoBench\lib\linreg.py", line 141, in <module>
    id_cal_errors_sample = rescale_and_backtest(series['test']['target'],
  File "C:\Users\liv\Desktop\IT_school\benchmarking\GlucoBench\lib\..\utils\darts_evaluation.py", line 141, in rescale_and_backtest
    forecasts[idx] = series[idx].with_values(forecasts[idx].values()[:, np.newaxis])
AttributeError: 'list' object has no attribute 'values'

antonkulaga commented 1 month ago

I have the same error as @Livia-Zaharia. I am blocked because it arises pretty much everywhere with linear regression.

mrsergazinov commented 1 month ago

Hi all! I just pushed an update (5c7e5ef04bac5ec0af5b82d4d1a20328ea27a761) that should be closing this issue.

As @antonkulaga and @Livia-Zaharia noticed, the previous update was crashing. I rolled it back and investigated the source of the issue that was causing the bug with the Transformer in the first place.

The bug was caused by a mismatch between darts and our custom SamplingInferenceDataset classes. To build the timestamps for the forecasts, darts models rely on the output of the __getitem__ method of SamplingInferenceDataset. Our datasets were providing the last time point within the input interval. However, it seems darts has switched and now requires this timestamp to be the first timestamp within the prediction interval. This caused the forecasts to be misaligned in time. With this fixed, all the models are unblocked.
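To illustrate the off-by-one, here is a standard-library sketch. It is not the repository's SamplingInferenceDataset code and does not call darts; it only assumes that the timestamp returned by the dataset is used as the start of the forecast index. The function forecast_times and the example window are hypothetical.

```python
from datetime import datetime, timedelta

FREQ = timedelta(minutes=5)


def forecast_times(anchor, horizon=12, freq=FREQ):
    """Build horizon forecast timestamps, treating anchor as the first one."""
    return [anchor + i * freq for i in range(horizon)]


# Hypothetical input window of 4 points ending at 08:55.
input_times = [datetime(2024, 1, 1, 8, 40) + i * FREQ for i in range(4)]
last_input = input_times[-1]
first_pred = last_input + FREQ

# Timestamps of the true values over the 1-hour prediction interval.
true_times = forecast_times(first_pred)

# Old behaviour: the dataset hands over the last input timestamp.
old_index = forecast_times(last_input)
# Fixed behaviour: the dataset hands over the first prediction timestamp.
new_index = forecast_times(first_pred)

overlap_old = len(set(old_index) & set(true_times))
overlap_new = len(set(new_index) & set(true_times))
print(overlap_old, overlap_new)
```

Under this assumption, the old index shares one point fewer with the true values than the horizon, while the fixed index matches them at every step.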