awslabs / gluonts

Probabilistic time series modeling in Python
https://ts.gluon.ai
Apache License 2.0

Exception: Reached maximum number of idle transformation calls. #205

Closed X-I-O-N closed 5 years ago

X-I-O-N commented 5 years ago

Description

I tried to run the quick start script on data that is very similar to the tweet data, but I get the error posted below. It runs fine with the default tweet data but not with mine, even though I tried to mimic the format as closely as possible. Here is a sample of my CSV data:

timestamp | value
2018-07-18 21:05:00 | 1.1649
2018-07-18 21:10:00 | 1.16472
2018-07-18 21:15:00 | 1.16481
2018-07-18 21:20:00 | 1.16462
2018-07-18 21:25:00 | 1.16461
2018-07-18 21:30:00 | 1.16456
2018-07-18 21:35:00 | 1.1647
2018-07-18 21:40:00 | 1.16452
2018-07-18 21:45:00 | 1.16487

To Reproduce

from gluonts.dataset import common
from gluonts.model import deepar

import pandas as pd

df = pd.read_csv('C:\\Users\\Teert\\AppData\\Roaming\\MetaQuotes\\Terminal\\6BCF14A3917E5BC71FD48812B8ED0586\\MQL4\\Files\\EURUSD,5.csv', header=0, index_col=0)

data = common.ListDataset([{"start": df.index[0],
                            "target": df.value[:"2018-07-18 21:05:00"]}],
                          freq="5min")

estimator = deepar.DeepAREstimator(freq="5min", prediction_length=12)
predictor = estimator.train(training_data=data)

prediction = next(predictor.predict(data))
print(prediction.mean)
prediction.plot(output_file='graph.png')

Error Message

INFO:root:Start model training
INFO:root:Number of parameters in DeepARTrainingNetwork: 13463
INFO:root:Epoch[0] Learning rate is 0.001
  0%|                                                                                           | 0/50 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "F:\Downloads\RLTrader-0.0.1\gluontstest.py", line 30, in <module>
    predictor = estimator.train(training_data=data)
  File "C:\Users\Teert\Anaconda3\lib\site-packages\gluonts\model\estimator.py", line 200, in train
    return self.train_model(training_data).predictor
  File "C:\Users\Teert\Anaconda3\lib\site-packages\gluonts\model\estimator.py", line 186, in train_model
    train_iter=training_data_loader,
  File "C:\Users\Teert\Anaconda3\lib\site-packages\gluonts\trainer\_base.py", line 251, in __call__
    for batch_no, data_entry in enumerate(it, start=1):
  File "C:\Users\Teert\Anaconda3\lib\site-packages\tqdm\_tqdm.py", line 1022, in __iter__
    for obj in iterable:
  File "C:\Users\Teert\Anaconda3\lib\site-packages\gluonts\dataset\loader.py", line 188, in __iter__
    data_entry = next(self._cur_iter)
  File "C:\Users\Teert\Anaconda3\lib\site-packages\gluonts\transform.py", line 362, in __call__
    f'Reached maximum number of idle transformation calls.\n'
Exception: Reached maximum number of idle transformation calls.
This means the transformation looped over GLUONTS_MAX_IDLE_TRANSFORMS=100 inputs without returning any output.
This occurred in the following transformation:
gluonts.transform.InstanceSplitter(forecast_start_field="forecast_start", future_length=12, is_pad_field="is_pad", output_NTC=True, past_length=1165, pick_incomplete=True, start_field="start", target_field="target", time_series_fields=["time_feat", "observed_values"], train_sampler=gluonts.transform.ExpectedNumInstanceSampler(num_instances=1.0))

lostella commented 5 years ago

How long is the “target” field of the time series that you’re training on?

lostella commented 5 years ago

@X-I-O-N use triple ticks ``` to open and close code blocks

X-I-O-N commented 5 years ago

@lostella

My csv has 74255 rows

However, I tried to test with far fewer rows and still got the same error. Any idea what the problem could be? I'm stumped.

Thanks

lostella commented 5 years ago

This problem usually occurs when many time series in your training dataset are shorter than prediction_length. In your case there is only one time series, if I'm not wrong, so I'm wondering: are you slicing it to have length < 12 for training? If so, that's the problem, and you should try providing a longer portion of your time series for training (at least a few hundred data points, but since you have a very long time series you can provide several thousand).
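A minimal sketch of the check described above, in plain Python. The helper name `too_short_entries` is hypothetical and not part of GluonTS; it only compares the length of each "target" against prediction_length, as suggested in this comment, and does not reproduce the library's internal sampling logic.

```python
from typing import Dict, List, Sequence


def too_short_entries(entries: Sequence[Dict], prediction_length: int) -> List[int]:
    """Return the indices of entries whose "target" is shorter than prediction_length.

    Hypothetical helper for illustration only; not a GluonTS function.
    """
    return [
        i for i, entry in enumerate(entries)
        if len(entry["target"]) < prediction_length
    ]


# Two example entries: one with a single observation, one with 500.
entries = [
    {"start": "2018-07-18 21:05:00", "target": [1.1649]},
    {"start": "2018-07-18 21:05:00", "target": [1.0] * 500},
]

short = too_short_entries(entries, prediction_length=12)
print(short)  # indices of series that are too short to train on
```

Running a check like this on the list of dictionaries before passing it to the dataset makes it easier to spot a slice that accidentally kept only a handful of points.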

X-I-O-N commented 5 years ago

But I have 74255 rows that I am feeding it. How can I give it more? Yes, there is only one time series, as shown in my CSV snippet.

lostella commented 5 years ago

@X-I-O-N two things I noticed: you should indicate that | is the separator in your case, so load the csv as

df = pd.read_csv('~/file.csv', header=0, index_col=0, sep="|")

Furthermore, the "target" field should be a one-dimensional array, and you can extract it from the dataframe with

df.values[:, 0]

which returns

array([1.1649 , 1.16472, 1.16481, 1.16462, 1.16461, 1.16456, 1.1647, 1.16452, 1.16487])

which is what you need. If you need to slice the dataframe up to a given date you can do e.g.

df[:"2018-07-18 21:25:00"].values[:, 0]

X-I-O-N commented 5 years ago

using

df[:"2018-07-18 21:25:00"].values[:, 0]

I get the same error as before.

using

df = pd.read_csv('~/file.csv', header=0, index_col=0, sep="|")

I get

Traceback (most recent call last):
  File "F:\Downloads\RLTrader-0.0.1\gluontstest.py", line 13, in <module>
    "target": df.value[:"2018-07-18 21:05:00"]}],
  File "C:\Users\Teert\Anaconda3\lib\site-packages\pandas\core\generic.py", line 5067, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'value'

C:\Users\Teert>python F:\Downloads\RLTrader-0.0.1\gluontstest.py
Traceback (most recent call last):
  File "F:\Downloads\RLTrader-0.0.1\gluontstest.py", line 13, in <module>
    "target": df[:"2018-07-18 21:05:00"].values[:,0]}],
IndexError: index 0 is out of bounds for axis 1 with size 0

By the way, I am not sure why GitHub used | as a separator. My CSV uses commas, so I don't think that would be the issue.

lostella commented 5 years ago

> using
>
> df[:"2018-07-18 21:25:00"].values[:, 0]
>
> I get the same error as before.

That's probably because the time series is being sliced at a very early point in time there, which leaves too few data points; try with the full data

df.values[:, 0]

or with a much larger slice anyway

df[:"2018-07-19 21:25:00"].values[:, 0]  # 24 hours more of data
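To see how much data a given cut-off actually keeps, the slice can be inspected before training. The sketch below uses the first five sample rows from the issue, written with commas as in the real file. Two things are assumptions not taken from the thread: the index is parsed as datetimes with `parse_dates=True`, and `.loc` is used for the label slice.

```python
import io

import pandas as pd

# First rows of the sample data from the issue, comma-separated.
csv_text = """timestamp,value
2018-07-18 21:05:00,1.1649
2018-07-18 21:10:00,1.16472
2018-07-18 21:15:00,1.16481
2018-07-18 21:20:00,1.16462
2018-07-18 21:25:00,1.16461
"""

# Assumption: parse the first column as a DatetimeIndex.
df = pd.read_csv(io.StringIO(csv_text), header=0, index_col=0, parse_dates=True)

# Label slices include the end label, so cutting at the very first
# timestamp keeps exactly one observation.
first_only = df.loc[:"2018-07-18 21:05:00"].values[:, 0]

# A later cut-off keeps every row up to and including that timestamp.
up_to_2125 = df.loc[:"2018-07-18 21:25:00"].values[:, 0]

print(len(first_only), len(up_to_2125))
```

With prediction_length=12, both slices here are still too short; on the full file, a cut-off a day or more after the first timestamp keeps a few hundred points.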

> using
>
> df = pd.read_csv('~/file.csv', header=0, index_col=0, sep="|")
>
> I get
>
> Traceback (most recent call last):
>   File "F:\Downloads\RLTrader-0.0.1\gluontstest.py", line 13, in <module>
>     "target": df.value[:"2018-07-18 21:05:00"]}],
>   File "C:\Users\Teert\Anaconda3\lib\site-packages\pandas\core\generic.py", line 5067, in __getattr__
>     return object.__getattribute__(self, name)
> AttributeError: 'DataFrame' object has no attribute 'value'
>
> C:\Users\Teert>python F:\Downloads\RLTrader-0.0.1\gluontstest.py
> Traceback (most recent call last):
>   File "F:\Downloads\RLTrader-0.0.1\gluontstest.py", line 13, in <module>
>     "target": df[:"2018-07-18 21:05:00"].values[:,0]}],
> IndexError: index 0 is out of bounds for axis 1 with size 0
>
> by the way, I am not sure why GitHub used | as a separator, my csv uses commas so I dont think that would be the issue

Yes, if you use a comma then forget about the |.
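For reference, a small sketch of why sep="|" produced the two errors quoted above on a comma-separated file. It uses three of the sample rows from the issue as in-memory text; the variable names are illustrative only.

```python
import io

import pandas as pd

# Comma-separated sample rows, as in the actual file.
csv_text = """timestamp,value
2018-07-18 21:05:00,1.1649
2018-07-18 21:10:00,1.16472
2018-07-18 21:15:00,1.16481
"""

# Wrong separator: each line is read as one field, which index_col=0
# turns into the index, so no data columns remain.
wrong = pd.read_csv(io.StringIO(csv_text), header=0, index_col=0, sep="|")

# Matching separator: the "value" column is available as expected.
right = pd.read_csv(io.StringIO(csv_text), header=0, index_col=0, sep=",")

print(wrong.shape, list(wrong.columns))
print(right.shape, list(right.columns))
```

A frame with zero columns has no `value` attribute, and `.values[:, 0]` on it fails with the "axis 1 with size 0" IndexError, so printing the shape and column names right after loading is a quick sanity check.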

Just make sure your dataframe is loaded as expected. If it has a "value" column containing your data, then you can do

target=df.value.tolist()

X-I-O-N commented 5 years ago

This worked

Quick question: how do I reduce the number of epochs, say from 100 to 1, for testing purposes?

lostella commented 5 years ago

You must specify a Trainer when constructing the DeepAREstimator. E.g.:

from gluonts.model.deepar import DeepAREstimator
from gluonts.trainer import Trainer

estimator = DeepAREstimator(
    prediction_length=12,
    freq="5min",
    trainer=Trainer(epochs=1),
)