awslabs / gluonts

Probabilistic time series modeling in Python
https://ts.gluon.ai
Apache License 2.0
4.57k stars 750 forks

How to use predictor method when there are dynamic feature #729

Open islama-lh opened 4 years ago

islama-lh commented 4 years ago

Hi, thanks for this nice framework. I am trying to use DeepAR with dynamic features. This is what my train and test datasets look like:


def get_train_dataset(start, end, timeseries, wdf):
    training_data = [
        {
            "start": str(start),
            "target": ts[start:end].tolist(),
            "feat_dynamic_real": pd.concat([
                wdf[start:end],
                ts.shift(24, fill_value=ts.mean())[start:end],
                ts.shift(7 * 24, fill_value=ts.mean())[start:end],
                ts.shift(28 * 24, fill_value=ts.mean())[start:end],
                ts.shift(52 * 7 * 24, fill_value=ts.mean())[start:end],
            ], axis=1).T.values,
        }
        for ts in timeseries
    ]
    print(len(training_data))
    return training_data
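To sanity-check this construction, here is a toy version on synthetic data (the series, index, and column names are made up, and only three of the shifted features are shown); during training, each row of feat_dynamic_real must be exactly as long as the target:

```python
import numpy as np
import pandas as pd

# synthetic hourly series standing in for one entry of `timeseries`
idx = pd.date_range("2020-01-01", periods=30 * 24, freq="h")
ts = pd.Series(np.arange(len(idx), dtype=float), index=idx)
wdf = pd.DataFrame({"temp": np.random.default_rng(0).normal(size=len(idx))},
                   index=idx)

start, end = idx[0], idx[-1]
feats = pd.concat([
    wdf[start:end],
    ts.shift(24, fill_value=ts.mean())[start:end],
    ts.shift(7 * 24, fill_value=ts.mean())[start:end],
], axis=1).T.values

entry = {"start": str(start), "target": ts[start:end].tolist(),
         "feat_dynamic_real": feats}

# in a training entry, the feature rows and the target cover the same window
assert feats.shape == (3, len(entry["target"]))
```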

def get_test_dataset(start_dataset, end_training, timeseries, weather_df,
                     prediction_length=24, num_test_windows=1):
    test_data = [
        {
            "start": str(start_dataset),
            "target": ts[start_dataset:end_training + 2 * k * prediction_length].tolist(),
            "feat_dynamic_real": pd.concat([
                weather_df[start_dataset:end_training + 2 * k * prediction_length],
                ts.shift(24, fill_value=ts.mean())[start_dataset:end_training + 2 * k * prediction_length],
                ts.shift(7 * 24, fill_value=ts.mean())[start_dataset:end_training + 2 * k * prediction_length],
                ts.shift(28 * 24, fill_value=ts.mean())[start_dataset:end_training + 2 * k * prediction_length],
                ts.shift(52 * 7 * 24, fill_value=ts.mean())[start_dataset:end_training + 2 * k * prediction_length],
            ], axis=1).T.values,
        }
        for k in range(1, num_test_windows + 1)
        for ts in timeseries
    ]
    print(len(test_data))
    return test_data

I am using the DeepAR estimator, and training goes fine:

estimator = DeepAREstimator(
    prediction_length=24,
    context_length=2 * 24,
    use_feat_dynamic_real=True,
    freq="1H",
    trainer=Trainer(
        epochs=3, patience=2, num_batches_per_epoch=100, ctx=mx.gpu(0)
    ),
)
predictor = estimator.train(train_ds)

Evaluation works fine, and looking at the evaluation code I can see that the series are truncated there before prediction:

forecast_it, ts_it = make_evaluation_predictions(
    dataset=test_ds,
    predictor=predictor,  # predictor
    num_samples=100,  # number of sample paths we want for evaluation
)
forecast_list = list(forecast_it)
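My understanding of why evaluation succeeds (a paraphrase of what make_evaluation_predictions appears to do internally, not the actual library code): it drops the last prediction_length target values before calling the predictor, while the dynamic features keep their full length, so the features naturally extend past the truncated target:

```python
import numpy as np

prediction_length = 24

def truncate_for_prediction(entry, prediction_length):
    # Paraphrase of the truncation step: drop the trailing
    # prediction_length target values, leave the features untouched.
    out = dict(entry)
    out["target"] = entry["target"][:-prediction_length]
    return out

entry = {"target": list(range(100)), "feat_dynamic_real": np.zeros((2, 100))}
truncated = truncate_for_prediction(entry, prediction_length)

assert len(truncated["target"]) == 76
assert np.asarray(truncated["feat_dynamic_real"]).shape[1] == 100
```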

But when I try to use the predict method directly, it throws an error:

forecast_list= list(predictor.predict(test_ds))

The error looks like this:

ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 12404 and the array at index 2 has size 12380

I am not sure how to deal with this. I also tested forecast_list = list(predictor.predict(test_ds)) without dynamic features, and it works as expected. Is this expected, or am I doing something unusual?

Thanks

islama-lh commented 4 years ago

I figured it out. The target has to be a different length from the dynamic features, which is intuitive, since I am predicting the target for given future feature values: feat_dynamic_real must extend past the end of the target. I am closing this issue and posting the corrected code:

def get_test_dataset(start_dataset, end_training, timeseries, weather_df,
                     prediction_length=24, num_test_windows=1):
    test_data = [
        {
            "start": str(start_dataset),
            "target": ts[start_dataset:end_training + k * prediction_length].tolist(),
            "feat_dynamic_real": pd.concat([
                weather_df[start_dataset:end_training + 2 * k * prediction_length],
                ts.shift(24, fill_value=ts.mean())[start_dataset:end_training + 2 * k * prediction_length],
                ts.shift(7 * 24, fill_value=ts.mean())[start_dataset:end_training + 2 * k * prediction_length],
                ts.shift(28 * 24, fill_value=ts.mean())[start_dataset:end_training + 2 * k * prediction_length],
                ts.shift(52 * 7 * 24, fill_value=ts.mean())[start_dataset:end_training + 2 * k * prediction_length],
            ], axis=1).T.values,
        }
        for k in range(1, num_test_windows + 1)
        for ts in timeseries
    ]
    print(len(test_data))
    return test_data

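As a quick check of the fix, a small validation helper (my own, not part of GluonTS) that asserts what predict() expects, as far as I understand it: each entry's feat_dynamic_real must extend exactly prediction_length steps past the end of the target:

```python
import numpy as np

def check_entry(entry, prediction_length):
    """Return True if the dynamic features extend exactly
    prediction_length steps past the end of the target."""
    target_len = len(entry["target"])
    feat_len = np.asarray(entry["feat_dynamic_real"]).shape[1]
    return feat_len == target_len + prediction_length

# toy entry: 48 hours of target, 72 hours of features, prediction length 24
entry = {
    "start": "2020-01-01",
    "target": list(range(48)),
    "feat_dynamic_real": np.zeros((5, 72)),
}
assert check_entry(entry, prediction_length=24)
```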
islama-lh commented 4 years ago

I would like to use GluonTS in a production environment without SageMaker. My dataset and training code look like the above. I am using the last 3 years of data at hourly grain, and I am trying to predict the next month, also hourly (30*24 values), predicting 24 hours (the prediction length) at a time. Say on January 31 I want to predict all of February: I first predict February 1st, append the predicted values to the target, then predict February 2nd, and so on.

The problem is that it takes very long to produce a 28-day prediction this way, and since I am using dynamic features, the time series I send for prediction is very long. The output of the model is also not great: after a couple of days the errors add up and performance degrades.

If I use dynamic features for training, do I need to send them along with the target for prediction? Also, is there a way to minimize prediction time? If there is a practical example, would you please share it?
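The rolling scheme described above can be sketched as a plain loop. Here predict_next_day is a hypothetical stand-in for predictor.predict (it just repeats the last 24 values so the sketch is runnable on its own); the point is the structure, and why errors accumulate when each day is conditioned on the model's own earlier outputs:

```python
import numpy as np

prediction_length = 24
horizon_days = 28  # e.g. all of February

# hourly history standing in for the real target series
history = list(np.sin(np.arange(3 * 365 * 24) * 2 * np.pi / 24))

def predict_next_day(series):
    # Stand-in for predictor.predict(...): a real DeepAR predictor returns
    # sample paths; here we simply repeat the last 24 observed values.
    return series[-prediction_length:]

forecast = []
for _ in range(horizon_days):
    # condition on history plus everything predicted so far, then append
    # the new 24-hour block (this is where errors compound day by day)
    forecast.extend(predict_next_day(history + forecast))

assert len(forecast) == horizon_days * prediction_length
```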