awslabs / gluonts

Probabilistic time series modeling in Python
https://ts.gluon.ai
Apache License 2.0

GPVAREstimator - AssertionError: #3066

Open ArianKhorasani opened 9 months ago

ArianKhorasani commented 9 months ago

Dear @lostella or maybe @jaheba et al. - I need your help!

I'm training GPVAREstimator on my multivariate time series dataset (converted to a ListDataset), but I get the following AssertionError during training:

```
AssertionError                            Traceback (most recent call last)
Cell In[38], line 1
----> 1 predictor = estimator.train(
      2     training_data = train_ds_residuals,
      3     shuffle_buffer_length = 100,
      4     cache_data = True,
      5 )

File ~/Project/pytorch-transformer-ts/myenv/lib/python3.8/site-packages/gluonts/mx/model/estimator.py:237, in GluonEstimator.train(self, training_data, validation_data, shuffle_buffer_length, cache_data, **kwargs)
    229 def train(
    230     self,
    231     training_data: Dataset,
        (...)
    235     **kwargs,
    236 ) -> Predictor:
--> 237 return self.train_model(
    238     training_data=training_data,
    239     validation_data=validation_data,
    240     shuffle_buffer_length=shuffle_buffer_length,
    241     cache_data=cache_data,
    242 ).predictor

File ~/Project/pytorch-transformer-ts/myenv/lib/python3.8/site-packages/gluonts/mx/model/estimator.py:205, in GluonEstimator.train_model(self, training_data, validation_data, from_predictor, shuffle_buffer_length, cache_data)
    197 transformed_validation_data = Cached(
    ...
     35 input_dim=self.target_dim,
     36 output_dim=4 * self.distr_output.rank,
     37 )

AssertionError:
```

Please note that target_dim = 7, prediction_length = 1, and context_length = 5. Here is the full GPVAREstimator code I'm using:

```python
estimator = GPVAREstimator(
    prediction_length=1,
    target_dim=7,
    freq='1H',
    context_length=5,
    num_layers=4,
    num_cells=32,
    distr_output=MultivariateGaussianOutput(dim=7),
    trainer=Trainer(
        ctx="cpu",
        epochs=50,
        weight_decay=1e-8,
        num_batches_per_epoch=100,
    ),
)

predictor = estimator.train(
    training_data=train_ds_residuals,
    shuffle_buffer_length=100,
    cache_data=True,
)
```

I'd appreciate it if you could help me with this! Thank you!

lostella commented 9 months ago

@ArianKhorasani could you provide the entire error trace? It’s not clear which assertion is failing

ArianKhorasani commented 9 months ago

@lostella - the error trace I provided is the entire error I get. Please check the screenshot below too:

[Screenshot attached: Screen Shot 2023-11-28 at 4 02 20 PM]

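As an aside, the `(...)` and `...` lines in the trace above are Jupyter's abbreviation of long frames, so the shown trace may not be the whole story. A minimal sketch of capturing the untruncated traceback with the standard `traceback` module (the `train` function below is a failing stand-in for the thread's `estimator.train(...)`, not the real call):

```python
import io
import traceback

def train():
    # Stand-in for estimator.train(...): fails with a bare AssertionError,
    # like the one reported in this thread.
    assert False

buf = io.StringIO()
try:
    train()
except AssertionError:
    # Write the complete traceback, frame by frame, into the buffer
    traceback.print_exc(file=buf)

full_trace = buf.getvalue()
print(full_trace)  # includes every frame down to the failing `assert`
```

Pasting the output of something like this (around the real training call) would show exactly which assertion inside GluonTS is firing.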
ArianKhorasani commented 9 months ago

Dear @lostella - I have already checked the dimensions of my multivariate time series too. Here is my full dataset code:

```python
import pandas as pd
from gluonts.dataset.common import ListDataset
from gluonts.dataset.field_names import FieldName

variables = ['DBP', 'SBP', 'Resp', 'Temp', 'HR', 'O2Sat', 'MAP']

df_actual = pd.read_csv('merged_test.csv')
static_features = (
    df_actual[['patient_id', 'Age', 'Gender', 'HospAdmTime']]
    .drop_duplicates()
    .reset_index(drop=True)
)
df_residuals = pd.DataFrame()

for variable in variables:
    # First, let's load the forecasted values
    df_forecast = pd.read_csv(f'forecasts_{variable}.csv')

    # Ensure that the data are ordered in the same way
    df_actual = df_actual.sort_values(by=['patient_id', 'ICULOS'])
    df_forecast = df_forecast.sort_values(by=['patient_id', 'ICULOS'])

    # Calculate residuals
    residuals = df_actual[variable] - df_forecast[f'{variable}_forecast']

    # Add residuals to df_residuals
    df_residuals[variable] = residuals
    df_residuals['patient_id'] = df_actual['patient_id']
    df_residuals['ICULOS'] = df_actual['ICULOS']

# Convert df_residuals to a ListDataset
data_residuals = []
for patient_id, group in df_residuals.groupby('patient_id'):
    target = group[variables].values  # use the residuals as the target
    start = pd.Timestamp("1970-01-01 00:00") + pd.Timedelta(hours=group['ICULOS'].iloc[0])
    entry = {
        FieldName.TARGET: target,
        FieldName.START: start,
        FieldName.FEAT_STATIC_CAT: static_features[
            static_features['patient_id'] == patient_id
        ][['Age', 'Gender']].values[0],
    }
    data_residuals.append(entry)

dataset_residuals = ListDataset(data_residuals, freq='1H', one_dim_target=False)
```
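One thing that may be worth double-checking is the orientation of the target array. This is a hedged sketch, not a confirmed diagnosis: my assumption is that GluonTS lays out a multivariate target as (target_dim, num_timesteps) when `one_dim_target=False`, whereas `group[variables].values` yields (num_timesteps, target_dim). The toy frame below stands in for one patient's `df_residuals` group:

```python
import numpy as np
import pandas as pd

variables = ['DBP', 'SBP', 'Resp', 'Temp', 'HR', 'O2Sat', 'MAP']

# Toy stand-in for one patient's residuals: 5 hourly steps, 7 variables
group = pd.DataFrame(np.zeros((5, len(variables))), columns=variables)

target = group[variables].values
print(target.shape)    # (5, 7): time on axis 0, variables on axis 1
print(target.T.shape)  # (7, 5): target_dim on axis 0
```

If the (target_dim, num_timesteps) assumption holds, the entries above would need `target.T` rather than `target`; printing the shape of one entry's target is a quick way to confirm before training.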