Fix forecasting lead times and improve forecasting functionality

This PR improves forecasting functionality in DeepSensor:

Fixes a bug where, for models set up with a TaskLoader whose target_delta_t contains values >0, model.predict would only return the longest lead time (if there are multiple lead times).
Previously, the time dim in model.predict forecasts corresponded only to the initialisation time, which is confusing. This PR adds init_time and lead_time dims alongside a time coordinate holding the valid time of the forecast, which is more intuitive and enables comparison with non-forecast ground truth that has a single time dim.
Adds a convenient deepsensor.eval.metrics.compute_errors method, which takes the Prediction returned by model.predict and computes the differences between the "mean" entry of the model predictions and the corresponding values in a target dataset for each variable in the Prediction. It returns an xarray.Dataset of errors from which mean statistics can be computed.
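
To illustrate the relationship between init_time, lead_time and the valid time, and how errors against single-time-dim ground truth follow from it, here is a minimal NumPy sketch. It does not use the DeepSensor API: the array names, the toy values, and the sign convention (prediction minus target) are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical forecast setup: 3 initialisation times and 2 lead times.
init_time = np.array(
    ["2020-01-01", "2020-01-02", "2020-01-03"], dtype="datetime64[D]"
)
lead_time = np.array([1, 2], dtype="timedelta64[D]")

# Valid time of each forecast = init_time + lead_time.
# Broadcasting gives shape (n_init_time, n_lead_time).
valid_time = init_time[:, None] + lead_time[None, :]

# Toy ground truth indexed by a single (sorted) time dimension.
truth_time = np.arange("2020-01-01", "2020-01-06", dtype="datetime64[D]")
truth = np.array([10.0, 11.0, 12.0, 13.0, 14.0])

# Toy "mean" predictions with shape (n_init_time, n_lead_time).
pred_mean = np.array([[11.5, 12.0], [12.5, 13.0], [13.5, 14.0]])

# Look up the ground truth at each forecast's valid time.
# This assumes every valid time is present in truth_time.
idx = np.searchsorted(truth_time, valid_time)
errors = pred_mean - truth[idx]  # assumed convention: prediction - target

# Mean absolute error per lead time, averaged over initialisation times.
mae_per_lead = np.abs(errors).mean(axis=0)
```

Because the valid time is stored next to init_time and lead_time, the ground truth lookup needs only one time index, and statistics can be reduced over either forecast dimension.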

Google Colab demonstrator here.

Notes:

Forecasting with model.predict only works if the target variables are the same for each target set. If this isn't the case, model.predict raises an error. This restriction is primarily for implementation simplicity.
This PR is a bit hacky. It detects if any target_delta_t > 0, and then does some slightly ugly conditional logic to return predictions with forecasting dimensions. I prioritised getting unit tests working over doing something more elegant.
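
The two checks described in these notes could be sketched roughly as follows. This is not the code in the PR: the helper names is_forecasting and check_same_target_vars are hypothetical, and target_delta_t is assumed here to be a plain sequence of numeric offsets, one per target set.

```python
def is_forecasting(target_delta_t):
    """Return True if any target set has a positive time offset."""
    return any(dt > 0 for dt in target_delta_t)


def check_same_target_vars(target_var_ids):
    """Raise ValueError unless every target set has the same variables.

    target_var_ids is assumed to be a list with one sequence of
    variable names per target set.
    """
    first = list(target_var_ids[0])
    for i, var_ids in enumerate(target_var_ids[1:], start=1):
        if list(var_ids) != first:
            raise ValueError(
                f"Target set {i} has variables {list(var_ids)}, "
                f"expected {first} for forecasting predictions."
            )


# Example: three target sets at lead times 0, 1 and 2, all predicting "t2m".
delta_t = [0, 1, 2]
var_ids = [["t2m"], ["t2m"], ["t2m"]]
if is_forecasting(delta_t):
    check_same_target_vars(var_ids)  # passes silently
```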
