eladcn / coronavirus_prediction

This project aims to predict the daily published numbers of Coronavirus (COVID-19) cases and deaths.
GNU General Public License v3.0

IndexError: too many indices for array on some datasets #5

Closed shlima closed 4 years ago

shlima commented 4 years ago
Traceback (most recent call last):
  File "main.py", line 103, in <module>
    model_handler(model_config)
  File "main.py", line 90, in model_handler
    x = training_set[:, 0].reshape(-1, 1)
IndexError: too many indices for array
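For context, this is the error NumPy raises when 2-D indexing like `[:, 0]` is applied to a 1-D array. I don't know exactly how main.py loads the CSV, but one common cause is the dataset coming back as a flat array (for example, a one-row file, or a parse that collapsed the columns). A small illustrative sketch of the failure and a defensive fix (not the project's actual loading code):

```python
import numpy as np

# A flat array has shape (2,), not (1, 2), so 2-D slicing fails.
flat = np.array([0, 15733])
try:
    flat[:, 0]
except IndexError as exc:
    print(exc)  # "too many indices for array ..."

# Forcing at least two dimensions before slicing avoids the crash.
training_set = np.atleast_2d(flat)   # shape (1, 2)
x = training_set[:, 0].reshape(-1, 1)
y = training_set[:, 1]
print(x.shape, y.shape)
```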

CSV (Germany)

0,15733
1,15734
2,15735
3,15736
4,15737
5,15738
6,15739
7,15740
8,15741
9,15742
10,15743
11,15744
12,15745
13,15746
14,15747
15,15748
16,15749
17,15750
18,15751
19,15752
20,15753
21,15754
22,15755
23,15756
24,15757
25,15758
26,15759
27,15760
28,15761
29,15762
30,15763
31,15764
32,15765
33,15766
34,15767
35,15768
36,15769
37,15770
38,15771
39,15772
40,15773
41,15774
42,15775
43,15776
44,15777
45,15778
46,15779
47,15780
48,15781
49,15782
50,15783
51,15784
52,15785
53,15786
54,15787
55,15788
56,15789
57,15790
58,15791
59,15792
60,15793
61,15794
62,15795
63,15796
64,15797
65,15798
66,15799
67,15800
68,15801
69,15802
70,15803
71,15804
72,15805
73,15806
74,15807
75,15808
76,122159
77,128244
eladcn commented 4 years ago

I tried using the dataset you provided by replacing the contents of the file 'cases_dataset_2020-04-09.csv' and setting the 'grab_data_from_server' property to 'false' under the cases model, and I did not receive any errors.

Can you please describe the steps you took that resulted in this error?

shlima commented 4 years ago

Hm, same for me now, the error is gone. But I have started to get incorrect results for some countries.

CSV file for Germany:

0,0
1,0
2,0
3,0
4,0
5,1
6,4
7,4
8,4
9,5
10,8
11,10
12,12
13,12
14,12
15,12
16,13
17,13
18,14
19,14
20,16
21,16
22,16
23,16
24,16
25,16
26,16
27,16
28,16
29,16
30,16
31,16
32,16
33,16
34,17
35,27
36,46
37,48
38,79
39,130
40,159
41,196
42,262
43,482
44,670
45,799
46,1040
47,1176
48,1457
49,1908
50,2078
51,3675
52,4585
53,5795
54,7272
55,9257
56,12327
57,15320
58,19848
59,22213
60,24873
61,29056
62,32986
63,37323
64,43938
65,50871
66,57695
67,62095
68,66885
69,71808
70,77872
71,84794
72,91159
73,96092
74,100123
75,103374
76,107663
77,113296
78,118181

Forecast:

The forecast for Cases in the following 30 days is:
1: 116149
2: 115391
3: 112786
4: 108023
5: 100763
6: 90631
7: 77222
8: 60091
9: 38757
10: 12698
11: -18650
12: -55897
13: -99700
14: -150765
15: -209853
16: -277776
17: -355408
18: -443681
19: -543591
20: -656200
21: -782641
22: -924118
23: -1081912
24: -1257382
25: -1451972
26: -1667208
27: -1904711
28: -2166191
29: -2453457
30: -2768420
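(For anyone reading along: a forecast plunging negative like this is the classic failure mode of extrapolating a high-degree polynomial. Outside the fitted range, the highest-order term dominates and the curve shoots off. A sketch with synthetic logistic-shaped data — illustrative values, not the real German series and not the project's model code:)

```python
import numpy as np

# Synthetic cumulative counts with a logistic shape, standing in for
# a real cases series (illustrative values only).
days = np.arange(60.0)
cases = 120000.0 / (1.0 + np.exp(-0.25 * (days - 35.0)))

# Mirror the config above: degree 7, predict 30 days ahead.
coeffs = np.polyfit(days, cases, 7)
future = np.arange(60.0, 90.0)
forecast = np.polyval(coeffs, future)

# Inside the data range the fit looks fine; beyond it, the x**7 term
# takes over and the forecast leaves any plausible range.
print(forecast.min(), forecast.max())
```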

Config file:

{
    "models": [
        {
            "model_name": "Cases",
            "polynomial_degree": 7,
            "datagrabber_class": "CasesDataGrabber",
            "grab_data_from_server": false,
            "offline_dataset_date": "0000-00-00",
            "days_to_predict": 30
        },
        {
            "model_name": "Deaths",
            "polynomial_degree": 7,
            "datagrabber_class": "DeathsDataGrabber",
            "grab_data_from_server": false,
            "offline_dataset_date": "0000-00-00",
            "days_to_predict": 30
        }
    ]
}

Chart

photo_2020-04-10 14 39 06

eladcn commented 4 years ago

I see - you are getting these incorrect results because the polynomial degree of your model is too high for your data.

In order to get better results, you need to tweak the "polynomial_degree" hyper-parameter in the config file (this is a trial-and-error process). For starters, try a polynomial degree of 2, 3 or 4 instead of 7. According to the data visualization you provided, a polynomial degree of 2 or 3 should fit quite well.
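(To illustrate why a lower degree is preferable when it already fits: a hypothetical sketch using np.polyfit, which may not be the exact fitting API main.py uses. When a low degree matches the data, higher degrees buy nothing in-sample and only add wiggle room that destabilizes the forecast:)

```python
import numpy as np

# Illustrative series with quadratic-looking growth (not real data).
days = np.arange(20.0)
cases = 3.0 * days**2 + 5.0

# Degrees 2, 3 and 7 all fit this series essentially perfectly, so
# the extra coefficients of the higher degrees add no in-sample value.
for degree in (2, 3, 7):
    coeffs = np.polyfit(days, cases, degree)
    rmse = np.sqrt(np.mean((np.polyval(coeffs, days) - cases) ** 2))
    print(degree, rmse)
```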

shlima commented 4 years ago

@eladcn thank you, your suggestion works.

Now I have 2 cases:

Estonia with a polynomial_degree of 3:

photo_2020-04-10 15 47 21

Estonia with a polynomial_degree of 5:

photo_2020-04-10 15 47 23

It seems that the second chart for Estonia is more believable.

Can you suggest a pattern by which I can set the polynomial_degree to the correct value for each country?

Dataset for Estonia:

0,0
1,0
2,0
3,0
4,0
5,0
6,0
7,0
8,0
9,0
10,0
11,0
12,0
13,0
14,0
15,0
16,0
17,0
18,0
19,0
20,0
21,0
22,0
23,0
24,0
25,0
26,0
27,0
28,0
29,0
30,0
31,0
32,0
33,0
34,0
35,0
36,1
37,1
38,1
39,1
40,1
41,2
42,2
43,3
44,10
45,10
46,10
47,10
48,12
49,16
50,16
51,79
52,115
53,171
54,205
55,225
56,258
57,267
58,283
59,306
60,326
61,352
62,369
63,404
64,538
65,575
66,645
67,679
68,715
69,745
70,779
71,858
72,961
73,1039
74,1097
75,1108
76,1149
77,1185
78,1207
eladcn commented 4 years ago

Unfortunately there isn't really a pattern for this, but I can give you a few tips:

  1. Visualizing the data is helpful because you can see how many inflection points it has - the more inflection points there are, the more the model needs to flex to fit them, so a higher polynomial degree might be better.
  2. If the increase rate looks close to linear, a polynomial degree of 1 will probably do well; if it looks closer to quadratic, a polynomial degree of 2 will probably do well, and so on.
  3. You don't need to fit your model to all of your data - if there is a sudden, large change in the rate of increase, you can fit on, for example, only the last 10 days of the dataset.
  4. The more data you have, the more stable the model will be.
  5. If one polynomial degree (say 7) suddenly starts giving unusual values, tweaking the degree up or down by 1 or 2 may do the trick - it really depends on the data; if the data changes dramatically, you will probably need to re-tune it.
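(The trial-and-error in these tips can be made systematic. A sketch, under the assumption that the model is a plain least-squares polynomial fit as np.polyfit performs; `pick_degree` is a hypothetical helper, not part of this project: hold out the most recent days and keep the degree that predicts them best.)

```python
import numpy as np

def pick_degree(days, cases, candidates, holdout=5):
    """Fit each candidate degree on all but the last `holdout` days and
    return the degree with the lowest error on the held-out days."""
    train_x, test_x = days[:-holdout], days[-holdout:]
    train_y, test_y = cases[:-holdout], cases[-holdout:]

    def error(degree):
        coeffs = np.polyfit(train_x, train_y, degree)
        return np.mean(np.abs(np.polyval(coeffs, test_x) - test_y))

    return min(candidates, key=error)

# Illustrative cubic-like growth: degree 3 predicts the held-out days
# far better than degrees 1 or 2.
days = np.arange(30.0)
cases = 2.0 * days**3 - 5.0 * days + 40.0
print(pick_degree(days, cases, candidates=(1, 2, 3)))  # → 3
```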

I will consider adding neural network support to this project in the coming days - neural networks might be better suited for some scenarios.

shlima commented 4 years ago

Thank you very much for your support