ankane / prophet-ruby

Time series forecasting for Ruby
MIT License

Wrong forecast results #10

Closed blackrez closed 2 years ago

blackrez commented 2 years ago

Hello,

Prophet returns some strange values:

3.1.0 :200 > series
 =>
{#<Date: 2022-01-03 ((2459583j,0s,0n),+0s,2299161j)>=>1.639,
 #<Date: 2022-01-05 ((2459585j,0s,0n),+0s,2299161j)>=>1.649,
 #<Date: 2022-01-06 ((2459586j,0s,0n),+0s,2299161j)>=>1.659,
 #<Date: 2022-01-07 ((2459587j,0s,0n),+0s,2299161j)>=>1.669,
 #<Date: 2022-01-08 ((2459588j,0s,0n),+0s,2299161j)>=>1.659,
 #<Date: 2022-01-10 ((2459590j,0s,0n),+0s,2299161j)>=>1.669,
 #<Date: 2022-01-11 ((2459591j,0s,0n),+0s,2299161j)>=>1.689,
 #<Date: 2022-01-12 ((2459592j,0s,0n),+0s,2299161j)>=>1.679,
 #<Date: 2022-01-13 ((2459593j,0s,0n),+0s,2299161j)>=>1.689,
 #<Date: 2022-01-14 ((2459594j,0s,0n),+0s,2299161j)>=>1.699,
 #<Date: 2022-01-15 ((2459595j,0s,0n),+0s,2299161j)>=>1.699,
 #<Date: 2022-01-18 ((2459598j,0s,0n),+0s,2299161j)>=>1.709,
 #<Date: 2022-01-20 ((2459600j,0s,0n),+0s,2299161j)>=>1.719,
 #<Date: 2022-01-21 ((2459601j,0s,0n),+0s,2299161j)>=>1.729,
 #<Date: 2022-01-22 ((2459602j,0s,0n),+0s,2299161j)>=>1.719,
 #<Date: 2022-01-25 ((2459605j,0s,0n),+0s,2299161j)>=>1.729,
 #<Date: 2022-01-27 ((2459607j,0s,0n),+0s,2299161j)>=>1.739,
 #<Date: 2022-01-29 ((2459609j,0s,0n),+0s,2299161j)>=>1.729}
3.1.0 :201 > Prophet.forecast(series)
 =>
{#<Date: 2022-01-30 ((2459610j,0s,0n),+0s,2299161j)>=>5.547226954196092,
 #<Date: 2022-01-31 ((2459611j,0s,0n),+0s,2299161j)>=>1.7157371604726062,
 #<Date: 2022-02-01 ((2459612j,0s,0n),+0s,2299161j)>=>1.7390083969256263,
 #<Date: 2022-02-02 ((2459613j,0s,0n),+0s,2299161j)>=>1.7340095632954138,
 #<Date: 2022-02-03 ((2459614j,0s,0n),+0s,2299161j)>=>1.7490078239414186,
 #<Date: 2022-02-04 ((2459615j,0s,0n),+0s,2299161j)>=>1.749007500166091,
 #<Date: 2022-02-05 ((2459616j,0s,0n),+0s,2299161j)>=>1.7390047264672899,
 #<Date: 2022-02-06 ((2459617j,0s,0n),+0s,2299161j)>=>5.557226905370591,
 #<Date: 2022-02-07 ((2459618j,0s,0n),+0s,2299161j)>=>1.725737111645919,
 #<Date: 2022-02-08 ((2459619j,0s,0n),+0s,2299161j)>=>1.7490083480986351}

Clearly the first value and the value for 2022-02-06 are wrong. I suspect an arm64 bug, but I don't have an amd64 machine to test the code on.

ankane commented 2 years ago

Hey @blackrez, I'm seeing similar results on x86-64, so I don't think it's related to ARM.

It looks like the problem is that series doesn't include any Sundays, but Prophet is trying to predict them. If you need predictions for Sundays, make sure to include them in the input. Otherwise, you can filter them from the output.
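
As a rough illustration of filtering the output, here is a minimal sketch in plain Ruby. It assumes the forecast is a Hash of Date to Float, as in the output shown above. The helper name drop_unseen_weekdays is hypothetical and not part of the prophet gem. The sample values are copied from the series and forecast in this thread.

```ruby
require "date"

# Hypothetical helper (not part of the prophet gem): keep only forecast entries
# whose weekday appears at least once in the input series.
def drop_unseen_weekdays(series, forecast)
  seen = series.keys.map(&:wday).uniq
  forecast.select { |date, _value| seen.include?(date.wday) }
end

# A few points from the series above (a Monday, a Wednesday, and a Tuesday).
series = {
  Date.new(2022, 1, 3) => 1.639,
  Date.new(2022, 1, 5) => 1.649,
  Date.new(2022, 1, 11) => 1.689
}

# The first three forecast values reported above; 2022-01-30 is a Sunday.
forecast = {
  Date.new(2022, 1, 30) => 5.547226954196092,
  Date.new(2022, 1, 31) => 1.7157371604726062,
  Date.new(2022, 2, 1) => 1.7390083969256263
}

filtered = drop_unseen_weekdays(series, forecast)
puts filtered.keys.map(&:to_s).inspect
```

If only Sundays need to go, forecast.reject { |date, _| date.sunday? } would be a shorter alternative.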

ankane commented 2 years ago

It looks like the Python library has similar behavior.

import pandas as pd
from prophet import Prophet

df = pd.DataFrame({
  'ds': ["2022-01-03", "2022-01-05", "2022-01-06", "2022-01-07", "2022-01-08", "2022-01-10", "2022-01-11", "2022-01-12", "2022-01-13", "2022-01-14", "2022-01-15", "2022-01-18", "2022-01-20", "2022-01-21", "2022-01-22", "2022-01-25", "2022-01-27", "2022-01-29"],
  'y': [1.639, 1.649, 1.659, 1.669, 1.659, 1.669, 1.689, 1.679, 1.689, 1.699, 1.699, 1.709, 1.719, 1.729, 1.719, 1.729, 1.739, 1.729]
})

m = Prophet()
m.fit(df)

future = m.make_future_dataframe(periods=10, include_history=False)
forecast = m.predict(future)
print(forecast[['ds', 'yhat']])

Output

          ds      yhat
0 2022-01-30 -3.960552
1 2022-01-31  1.720297
2 2022-02-01  1.739000
3 2022-02-02  1.731007
4 2022-02-03  1.749000
5 2022-02-04  1.749000
6 2022-02-05  1.739000
7 2022-02-06 -3.950552
8 2022-02-07  1.730297
9 2022-02-08  1.749000

blackrez commented 2 years ago

Thanks for your response and your help. My dataset has a lot of issues, including a lot of missing days.
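
For a dataset with gaps like this, one way to check up front which weekdays never occur in the input is sketched below. The helper name missing_weekdays is hypothetical and only uses Ruby's standard Date class. The dates are the ones from the series at the top of this issue.

```ruby
require "date"

# Hypothetical helper: names of the weekdays that never occur among the given dates.
def missing_weekdays(dates)
  seen = dates.map(&:wday).uniq
  (0..6).reject { |wday| seen.include?(wday) }.map { |wday| Date::DAYNAMES[wday] }
end

# The input dates from the series in this issue.
dates = %w[
  2022-01-03 2022-01-05 2022-01-06 2022-01-07 2022-01-08 2022-01-10
  2022-01-11 2022-01-12 2022-01-13 2022-01-14 2022-01-15 2022-01-18
  2022-01-20 2022-01-21 2022-01-22 2022-01-25 2022-01-27 2022-01-29
].map { |s| Date.parse(s) }

missing = missing_weekdays(dates)
p missing
```

Any weekday listed here has no observations, so forecasts for it rely on a weekly seasonality term the data cannot constrain.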

ankane commented 2 years ago

Another option is to disable weekly seasonality with the advanced API:

require "prophet"

df = Rover::DataFrame.new({
  "ds" => ["2022-01-03", "2022-01-05", "2022-01-06", "2022-01-07", "2022-01-08", "2022-01-10", "2022-01-11", "2022-01-12", "2022-01-13", "2022-01-14", "2022-01-15", "2022-01-18", "2022-01-20", "2022-01-21", "2022-01-22", "2022-01-25", "2022-01-27", "2022-01-29"],
  "y" => [1.639, 1.649, 1.659, 1.669, 1.659, 1.669, 1.689, 1.679, 1.689, 1.699, 1.699, 1.709, 1.719, 1.729, 1.719, 1.729, 1.739, 1.729]
})

m = Prophet.new(weekly_seasonality: false)
m.fit(df)

future = m.make_future_dataframe(periods: 10, include_history: false)
forecast = m.predict(future)
p forecast[["ds", "yhat"]]

Output

                     ds                yhat
2022-01-30 00:00:00 UTC  1.7360273138060855
2022-01-31 00:00:00 UTC  1.7374329821020085
2022-02-01 00:00:00 UTC  1.7388386503979312
2022-02-02 00:00:00 UTC   1.740244318693854
2022-02-03 00:00:00 UTC  1.7416499869897768
2022-02-04 00:00:00 UTC  1.7430556552856997
2022-02-05 00:00:00 UTC  1.7444613235816226
2022-02-06 00:00:00 UTC  1.7458669918775456
2022-02-07 00:00:00 UTC   1.747272660173468
2022-02-08 00:00:00 UTC  1.7486783284693912

blackrez commented 2 years ago

Yeah, that could be the best solution for my very inconsistent dataset. Many thanks for your help.