problems with Temporal Classification example

tg2k commented 1 year ago

What happened + What you expected to happen

I ran into trouble attempting to use temporal classification. The example at https://nixtla.github.io/neuralforecast/examples/temporal_classifiers.html struck me as odd, in a few regards:

Rather than using real temporal data, it artificially constructs a sequence from a pixels data set.
It prepares a pixels column which is neither converted into long format (https://nixtla.github.io/neuralforecast/examples/data_format.html#long-format) nor set up as an exogenous variable (https://nixtla.github.io/neuralforecast/examples/exogenous_variables.html).
Removing the pixels column got me the same result, implying that it is not used at all. I think consumers of temporal classification would generally want their primary data to be used, as opposed to the resulting classification values.

Although the temporal classifiers code runs, I went looking for more real-world examples. In https://github.com/Nixtla/neuralforecast/issues/385 there was discussion about an example added to models.nhits.ipynb (in https://github.com/Nixtla/neuralforecast/commit/4821277708ea4584d61ae5c99e938efc34dc0bf5). This code has since been overwritten (in https://github.com/Nixtla/neuralforecast/commit/37b4b287373ff6009c3453198cf6f3ae862b0a9d) but I decided to extract it and give it a try.

In this case, the line

AirPassengersPanel['y'] = 1 * (AirPassengersPanel['trend'] % 12) < 2

leads to a validation.py validate_format() error: The target column ('y') should have a numeric data type, got 'bool')

Changing it to

AirPassengersPanel['y'] = np.where(1 * (AirPassengersPanel['trend'] % 12) < 2, 1, 0)

however, leads me instead to a failure during fit(), specifically a _base_windows.py _inv_normalization() error on y_hat.ndim == 2: 'tuple' object has no attribute 'ndim'. Perhaps this is also not a good example, as it has columns y, trend, y_[lag12] and defines stat_exog_list=['airline1'] (there is no such column, though unique_id in the data set contains both "Airline1" and "Airline2").
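On the 'bool' error specifically: a likely cause is operator precedence, since `*` binds tighter than `<`, so the original line compares `1 * (trend % 12)` against 2 and yields booleans. The sketch below uses a stand-in `trend` series (an assumption, not the real AirPassengersPanel column) to show the difference between the line as written and the parenthesised form.

```python
import pandas as pd

# Stand-in for AirPassengersPanel['trend']: a simple month counter (assumption).
trend = pd.Series(range(24))

# As written: `*` binds tighter than `<`, so this is (1 * (trend % 12)) < 2.
as_written = 1 * (trend % 12) < 2

# Parenthesised so the comparison happens first and the bool is then multiplied by 1.
intended = 1 * ((trend % 12) < 2)

# Same values, with the cast spelled out.
explicit = ((trend % 12) < 2).astype(int)

print(as_written.dtype)  # boolean dtype, which validate_format() rejects
print(intended.dtype)    # integer dtype
```

The np.where() version used in the issue produces the same 0/1 values; the parenthesised form just avoids the extra call.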

Is there a clean example demonstrating temporal classification, where it is possible to verify that the results are actually degraded by removing the input data and only having the classifier data present? It would be very helpful to see the data format and whether exogenous variables are used/required.

Versions / Dependencies

neuralforecast 1.64, Python 3.11.6, WSL Ubuntu

Reproduction script

In the first script, try setting remove_pixels to True or False to see that the results look the same either way.

# From https://nixtla.github.io/neuralforecast/examples/temporal_classifiers.html

import numpy as np
import pandas as pd
from sklearn import datasets

import matplotlib.pyplot as plt
from neuralforecast import NeuralForecast
from neuralforecast.models import MLP, NHITS, LSTM
from neuralforecast.losses.pytorch import DistributionLoss, Accuracy

# options
remove_pixels = True

digits = datasets.load_digits()
images = digits.images[:100]

plt.imshow(images[0,:,:], cmap=plt.cm.gray, 
           vmax=16, interpolation="nearest")

pixels = np.reshape(images, (len(images), 64))
ytarget = (pixels > 10) * 1

fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
ax1.plot(pixels[10])
ax2.plot(ytarget[10], color='purple')
ax1.set_xlabel('Pixel index')
ax1.set_ylabel('Pixel value')
ax2.set_ylabel('Pixel threshold', color='purple')
plt.grid()
plt.show(block=True)

# We flatten the images and create an input dataframe
# with 'unique_id' series identifier and 'ds' time stamp identifier.
Y_df = pd.DataFrame.from_dict({
            'unique_id': np.repeat(np.arange(100), 64),
            'ds': np.tile(np.arange(64)+1910, 100),
            'y': ytarget.flatten(), 'pixels': pixels.flatten()})
# Y_df
if remove_pixels:  # demonstrate that removing pixels results in the same accuracy as including them
    Y_df = Y_df.drop(columns=['pixels'])

print(Y_df.head())

horizon = 12

# Try different hyperparameters to improve accuracy.
models = [MLP(h=horizon,                           # Forecast horizon
              input_size=2 * horizon,              # Length of input sequence
              loss=DistributionLoss('Bernoulli'),  # Binary classification loss
              valid_loss=Accuracy(),               # Accuracy validation signal
              max_steps=500,                       # Number of steps to train
              scaler_type='standard',              # Type of scaler to normalize data
              hidden_size=64,                      # Number of units in each hidden layer
              #early_stop_patience_steps=2,         # Early stopping regularization patience
              val_check_steps=10,                  # Frequency of validation signal (affects early stopping)
              ),
          NHITS(h=horizon,                          # Forecast horizon
                input_size=2 * horizon,             # Length of input sequence
                loss=DistributionLoss('Bernoulli'), # Binary classification loss
                valid_loss=Accuracy(),              # Accuracy validation signal                
                max_steps=500,                      # Number of steps to train
                n_freq_downsample=[2, 1, 1],        # Downsampling factors for each stack output
                #early_stop_patience_steps=2,        # Early stopping regularization patience
                val_check_steps=10,                 # Frequency of validation signal (affects early stopping)
                )             
          ]
nf = NeuralForecast(models=models, freq='Y')
Y_hat_df = nf.cross_validation(df=Y_df, n_windows=1)

# By default NeuralForecast produces forecast intervals
# In this case the lo-x and hi-x levels represent the
# low and high bounds of the prediction accumulating x% probability
Y_hat_df = Y_hat_df.reset_index(drop=True)
print(Y_hat_df.head())

# Define classification threshold for final predictions
# If (prob > threshold) -> 1
Y_hat_df['NHITS'] = (Y_hat_df['NHITS'] > 0.5) * 1
Y_hat_df['MLP'] = (Y_hat_df['MLP'] > 0.5) * 1
print(Y_hat_df.head())

plot_df = Y_hat_df[Y_hat_df.unique_id==10]

fig, ax = plt.subplots(1, 1, figsize = (20, 7))
plt.plot(plot_df.ds, plot_df.y, label='target signal')
plt.plot(plot_df.ds, plot_df['MLP'] * 1.1, label='MLP prediction')
plt.plot(plot_df.ds, plot_df['NHITS'] * .9, label='NHITS prediction')
ax.set_title('Binary Sequence Forecast', fontsize=22)
ax.set_ylabel('Pixel Threshold and Prediction', fontsize=20)
ax.set_xlabel('Timestamp [t]', fontsize=20)
ax.legend(prop={'size': 15})
ax.grid()
plt.show(block=True)

def accuracy(y, y_hat):
    return np.mean(y==y_hat)

mlp_acc = accuracy(y=Y_hat_df['y'], y_hat=Y_hat_df['MLP'])
nhits_acc = accuracy(y=Y_hat_df['y'], y_hat=Y_hat_df['NHITS'])

print(f'MLP Accuracy: {mlp_acc:.1%}')
print(f'NHITS Accuracy: {nhits_acc:.1%}')

In the second script, it blows up on the y_hat.ndim == 2 check.

# from models.nhits.py 02-02-2023 [FEAT] Zero Inflated and Categorical Distributions (NBinomial, Tweedie, Bernoulli) (#427)
# (this was later wiped out by another change: 05-18-2023 Added HuberLoss + NHITS HuberLoss test (#577))
# see related https://github.com/Nixtla/neuralforecast/issues/385

import numpy as np
import pandas as pd
import pytorch_lightning as pl
import matplotlib.pyplot as plt

from neuralforecast import NeuralForecast
from neuralforecast.models import MLP
from neuralforecast.losses.pytorch import DistributionLoss, Accuracy
from neuralforecast.tsdataset import TimeSeriesDataset
from neuralforecast.utils import AirPassengers, AirPassengersPanel, AirPassengersStatic

print(AirPassengersPanel.head())
print(AirPassengersPanel.describe())
print(AirPassengersPanel['unique_id'].unique())

#AirPassengersPanel['y'] = 1 * (AirPassengersPanel['trend'] % 12) < 2
# above causes errors due to being boolean
AirPassengersPanel['y'] = np.where(1 * (AirPassengersPanel['trend'] % 12) < 2, 1, 0)
Y_train_df = AirPassengersPanel[AirPassengersPanel.ds<AirPassengersPanel['ds'].values[-12]].reset_index(drop=True) # 132 train
Y_test_df = AirPassengersPanel[AirPassengersPanel.ds>=AirPassengersPanel['ds'].values[-12]].reset_index(drop=True) # 12 test

print(Y_train_df.head())
print(Y_train_df.describe())

model = MLP(h=12,
            input_size=24,
            loss=DistributionLoss(distribution='Bernoulli', level=[80, 90], return_params=True),
            valid_loss=Accuracy(),
            stat_exog_list=['airline1'],
            scaler_type='robust',
            max_steps=200,
            early_stop_patience_steps=2,
            val_check_steps=10,
            learning_rate=1e-3)

fcst = NeuralForecast(models=[model], freq='M')
# this gets to _compute_valid_loss() -> _inv_normalization() and fails with: 'tuple' object has no attribute 'ndim'
fcst.fit(df=Y_train_df, static_df=AirPassengersStatic, val_size=12)
forecasts = fcst.predict(futr_df=Y_test_df)

# Plot quantile predictions
Y_hat_df = forecasts.reset_index(drop=False).drop(columns=['unique_id','ds'])
plot_df = pd.concat([Y_test_df, Y_hat_df], axis=1)
plot_df = pd.concat([Y_train_df, plot_df])

plot_df = plot_df[plot_df.unique_id=='Airline1'].drop('unique_id', axis=1)
plt.plot(plot_df['ds'], plot_df['y'], c='black', label='True')
plt.plot(plot_df['ds'], plot_df['MLP-median'], c='blue', label='median')
plt.fill_between(x=plot_df['ds'][-12:], 
                 y1=plot_df['MLP-lo-90'][-12:].values, 
                 y2=plot_df['MLP-hi-90'][-12:].values,
                 alpha=0.4, label='level 90')
plt.legend()
plt.grid()
plt.plot()
plt.show(block=True)

Issue Severity

High: It blocks me from completing my task.

jmoralez commented 1 year ago

Hey @tg2k, thanks for using neuralforecast. The data is converted to long format: there are 100 images of 64 pixels, which become 100 unique_ids with 64 timestamps each.
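As a quick check of that layout, here is a minimal sketch that rebuilds the same long-format frame from a random stand-in array (an assumption used in place of the sklearn digits data) and confirms there are 100 series of 64 rows each.

```python
import numpy as np
import pandas as pd

n_images, n_pixels = 100, 64
rng = np.random.default_rng(0)

# Stand-in for digits.images[:100] reshaped to (100, 64); values in 0..16.
pixels = rng.integers(0, 17, size=(n_images, n_pixels))
ytarget = (pixels > 10) * 1

# One row per (image, pixel position): the image index is the series id,
# the pixel position plays the role of the timestamp.
long_df = pd.DataFrame({
    'unique_id': np.repeat(np.arange(n_images), n_pixels),
    'ds': np.tile(np.arange(n_pixels) + 1910, n_images),
    'y': ytarget.flatten(),
    'pixels': pixels.flatten(),
})

print(long_df.shape)
print(long_df.groupby('unique_id').size().unique())
```

Each image occupies 64 consecutive rows, so the frame is already in long format; the open question is only whether the extra 'pixels' column is declared as an exogenous feature.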

I believe the problem here is that pixels isn't defined as an exogenous feature and thus isn't being used. By adding:

hist_exog_list=['pixels'],
futr_exog_list=['pixels'],

to both models I get:

MLP Accuracy: 94.3%
NHITS Accuracy: 97.3%

This shows that adding the pixel values helps the models. Please let us know if this helps.
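For reference, a sketch of how the two arguments might be threaded into both model definitions from the first script. The `shared_kwargs` dict and `build_models()` helper are illustrative names, not part of neuralforecast; the remaining arguments are copied from the original script.

```python
horizon = 12
exog = ['pixels']  # column name taken from the example's Y_df

# Keyword arguments shared by both models; only the two *_exog_list
# entries are new relative to the original script.
shared_kwargs = dict(
    h=horizon,
    input_size=2 * horizon,
    hist_exog_list=exog,
    futr_exog_list=exog,
    max_steps=500,
    val_check_steps=10,
)


def build_models():
    """Build MLP and NHITS with 'pixels' declared as an exogenous input."""
    # Imported lazily so the kwargs above can be inspected without the library.
    from neuralforecast.models import MLP, NHITS
    from neuralforecast.losses.pytorch import DistributionLoss, Accuracy

    return [
        MLP(loss=DistributionLoss('Bernoulli'), valid_loss=Accuracy(),
            scaler_type='standard', hidden_size=64, **shared_kwargs),
        NHITS(loss=DistributionLoss('Bernoulli'), valid_loss=Accuracy(),
              n_freq_downsample=[2, 1, 1], **shared_kwargs),
    ]
```

Note that futr_exog_list hands the model the pixel values over the forecast horizon, and y in this example is just a threshold on those same values, so a large accuracy gain is to be expected here.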

tg2k commented 1 year ago

Thanks @jmoralez. That's helpful, though there seems to be more to it. This technique doesn't help with the AirPassengers example. What I found is that if I call fit(), I get the 'tuple' object has no attribute 'ndim' error. If I change the code to use cross_validation(), it works. The issue seems to be triggered by specifying the val_size parameter (if I don't pass it, then fit() runs through). Then, when the code reaches _compute_valid_loss() it hits this block:

        # Validation Loss evaluation
        if self.valid_loss.is_distribution_output:
            valid_loss = self.valid_loss(
                y=outsample_y, distr_args=distr_args, mask=outsample_mask
            )
        else:
            output, _, _ = self._inv_normalization(
                y_hat=output, temporal_cols=temporal_cols
            )
            valid_loss = self.valid_loss(
                y=outsample_y, y_hat=output, mask=outsample_mask
            )
        return valid_loss

The Accuracy() loss function sets is_distribution_output = False, so the else branch runs and calls _inv_normalization() with y_hat=output. In this run output is a tuple, while _inv_normalization() expects an object with an ndim attribute, hence the error. I'm unclear on whether the call should be using outsample_y instead of output, but I tried this in some earlier debugging against my own data, ran into a downstream issue, and stopped there.
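The failure mode can be reproduced in isolation. The sketch below is not the library's code: `inv_normalization_stub` is a hypothetical stand-in that only mirrors the `y_hat.ndim == 2` check, and the tuple's contents and shapes are assumptions.

```python
import numpy as np


def inv_normalization_stub(y_hat):
    """Hypothetical stand-in: only mirrors the `y_hat.ndim == 2` check."""
    if y_hat.ndim == 2:
        y_hat = y_hat[..., np.newaxis]
    return y_hat


point_output = np.zeros((4, 12))        # a single array: has .ndim
distr_output = (np.zeros((4, 12, 1)),)  # a tuple of arrays (assumed shape)

inv_normalization_stub(point_output)    # works

try:
    inv_normalization_stub(distr_output)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```

This suggests the mismatch is between a training loss whose output is a tuple and a validation loss that takes the point-forecast branch, rather than anything specific to the AirPassengers data.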

tg2k commented 1 year ago

I'm also finding that with some of my own code, it seems that using the Auto models may trigger the val_size check even if not specified. This may be because BaseAuto has this code:

val_size = val_size if val_size > 0 else self.h

This complicates attempts to work around the error.
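To make the effect of that line concrete, here is the same expression wrapped in a hypothetical helper (`resolve_val_size` is an illustrative name, not a neuralforecast function): leaving val_size at 0 still yields a non-zero validation size equal to the horizon.

```python
def resolve_val_size(val_size: int, h: int) -> int:
    """Hypothetical helper mirroring the quoted BaseAuto line."""
    return val_size if val_size > 0 else h


print(resolve_val_size(0, 12))   # falls back to the horizon
print(resolve_val_size(24, 12))  # an explicit value is kept
```

So with an Auto model, omitting val_size would not avoid the validation step under this logic.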

It looks as though the code block I posted yesterday is not compatible with the valid_loss=Accuracy() code, but perhaps this goes undetected if not using an Auto model and not specifying val_size?
