Open harshitv804 opened 8 months ago
Keeping this open for visibility, since others may have the same question
Can Chronos take multiple inputs (channels) but make predictions on a single one of them?
I passed data of shape (n_features, samples), and it makes predictions on one of them. However, it seems like I cannot choose the feature that it is making predictions on. Is there a way to choose it?
Thanks
@ozanbarism if I understand your question right, you want to provide covariates: this is not possible, see #22.
I passed data of shape (n_features, samples), and it makes predictions on one of them.
I'm not sure what you mean here: don't you get predictions for all of them? That's what should happen
I do not get predictions for all of them. It seems I only get predictions for one of them. Also, there is a number of samples term, is this the length of the context data we provide?
This is what it looks like for univariate data. And this is the case where I pass multivariate data. As you can see, it still returns a single prediction column.
this is my code
model = ChronosModel(name="amazon/chronos-t5-small", device="cpu")
duration = 20  # in hours
pred_hrz = 2
sampling_rates = [300]
for i, sr in enumerate(sampling_rates):
    Parameter = ParameterGenerator('OfficeSmall', 'Hot_Dry', 'Tucson', max_power=max_power, time_reso=control_rate)  # Description of ParameterGenerator in bldg_utils.py
    data, gt = building_simulate(Parameter, room_id, duration, pred_hrz, control_rate,
                                 sr, T_cool, T_heat, mode, hysteresis_margin, single_variate=False, make_plot=False, show_outdoor=False)
    pred_len = int(pred_hrz*3600/sr)
    low, forecast, high = model(data, prediction_length=pred_len, num_samples=1)
    plot_pred(data, forecast, gt, forecast_index=None)
    print('MSE {:.4f}'.format(np.mean((forecast-gt[:,0])**2)))
and this is how i defined the chronosmodel class
class ChronosModel:
    def __init__(self, name, device="cuda"):
        from chronos import ChronosPipeline
        self.model = ChronosPipeline.from_pretrained(
            name,
            device_map=device,  # use "cpu" for CPU inference and "mps" for Apple Silicon
            torch_dtype=torch.bfloat16,
        )

    def __call__(self, data, prediction_length, num_samples=1):
        if not torch.is_tensor(data):
            _data = torch.tensor(data)
        else:
            _data = data
        forecast = self.model.predict(
            context=_data,
            prediction_length=prediction_length,
            num_samples=num_samples,
        )
        low, median, high = np.quantile(forecast[0].numpy(), [0.1, 0.5, 0.9], axis=0)
        return low, median, high  # 80% interval
Also, there is a number of samples term, is this the length of the context data we provide?
From the documentation:
Once trained, probabilistic forecasts are obtained by sampling multiple future trajectories given the historical context
So you will get more than one output (multiple future trajectories), equal to num_samples. If you only want one prediction you can set it to 1, but most users will take the median of the default 20 samples, as your code does with np.quantile().
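To make the role of num_samples concrete, here is a minimal sketch that uses synthetic sample paths in place of a real Chronos forecast (the random array is a stand-in, not model output, and the `summarize` helper is hypothetical). It takes quantiles across the sample axis, and it shows that with a single sample the 10%, 50% and 90% quantiles coincide, so the "80% interval" has zero width.

```python
import numpy as np

rng = np.random.default_rng(0)
prediction_length = 24


def summarize(samples):
    """Return (low, median, high) across the sample axis.

    `samples` is assumed to have shape (num_samples, prediction_length).
    """
    low, median, high = np.quantile(samples, [0.1, 0.5, 0.9], axis=0)
    return low, median, high


# Synthetic stand-in for 20 sampled trajectories of one series
# (assumption: random numbers, not real model output).
many = rng.normal(size=(20, prediction_length))
low20, med20, high20 = summarize(many)

# With num_samples=1 there is only one trajectory, so every quantile equals it.
single = many[:1]
low1, med1, high1 = summarize(single)

print(med20.shape)               # one median value per forecast step
print(np.allclose(low1, high1))  # whether the interval collapsed
```

In other words, calling the model with num_samples=1 still works, but the low and high bands carry no uncertainty information.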
The code example is a bit difficult to follow (why add a model wrapper here?). I suspect that you're only getting one prediction because you always output the 0th forecast with forecast[0].
From the plots, though, each blue line can be thought of as an independent univariate time series, like a collection of independent weather stations collecting temperature data. This model can't take in traditional model "features", but it can predict each univariate series in parallel based only on that series' own history. So when you add in several "features" like that (T_cool, T_heat, etc.), it can predict the next values of each "feature" but not use them to inform a target variable, as @lostella mentioned.
To get each prediction you would need to loop through the number of univariate time series that you have with something like:
for i in range(num_of_series):
    median_i = forecast[i].median(dim=0).values  # median over the samples of series i
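As a rough sketch of the same idea without an explicit loop: assuming the forecast has the layout (num_series, num_samples, prediction_length), the quantiles for every series can be taken in one call over the sample axis. The array below is synthetic and only stands in for `forecast.numpy()`.

```python
import numpy as np

rng = np.random.default_rng(0)
num_series, num_samples, prediction_length = 3, 20, 24

# Stand-in for `forecast.numpy()`.
# Assumed layout: (num_series, num_samples, prediction_length).
forecast = rng.normal(size=(num_series, num_samples, prediction_length))

# Quantiles over the sample axis (axis=1) for every series at once.
# Each result has shape (num_series, prediction_length).
low, median, high = np.quantile(forecast, [0.1, 0.5, 0.9], axis=1)

# Equivalent explicit loop over the series, for comparison.
median_loop = np.stack(
    [np.quantile(forecast[i], 0.5, axis=0) for i in range(num_series)]
)

print(median.shape)
print(np.allclose(median, median_loop))
```

Row i of `median` is then the forecast for the i-th input series, so the series of interest can be picked by index instead of always taking index 0.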
Hope this helps.
@harshitv804 I am working on extending Chronos to add covariates using an LGBM regression head on top of univariate embeddings. If you want to help me make progress on this solution, I would really appreciate it: https://github.com/autogluon/autogluon/pull/4278
If you have specific multivariate use cases/datasets to share with us, please do. It will be helpful for us to understand the types of practical multivariate problems.
@abdulfatir The multivariate use case I have is forecasting the open, high, low, and close of an asset in the financial markets, aka candlestick charts. In this case, I don't think forecasting the individual dimensions independently is ideal, since within a given timestep there is a dependent relationship between the dimensions.
@hsm207 Do you want to forecast the future of all 3 variables based on the past values of all 3 variables? Did you try https://huggingface.co/Salesforce/moirai-1.0-R-large?
Do you want to forecast the future of all 3 variables based on the past values of all 3 variables?
@ikvision yes, I want to forecast all 4 variables (open, high, low, close) based on the past values of all 4 variables.
Did you try https://huggingface.co/Salesforce/moirai-1.0-R-large?
I have not. Thanks for sharing! I was not aware of this paper before. From the abstract, it looks like it will help.
Hi, guys. Thanks for the discussion, I got some useful info from it. In my case, I have medical data of different vital signs for multiple patients. These are multivariate time series. The multivariate part comes from the different measurement items, like pH, SpO2, urine output, etc., 12 items in total.
For example, for 5000 samples, the data (an ndarray) has shape (5000, 12, 200): 12 features over 200 time steps. For the dataset, please check the output of this notebook: https://www.kaggle.com/code/wangyuweikiwi/mimi-iii-time-series-data-preprocessing
@harshitv804 as we discussed in the paper, Chronos currently focuses on univariate forecasting. For multivariate time series, you might want to use Chronos on the individual dimensions independently. If you have specific multivariate use cases/datasets to share with us, please do. It will be helpful for us to understand the types of practical multivariate problems.
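One way to read "use Chronos on the individual dimensions independently" for an array shaped (samples, features, time steps), like the (5000, 12, 200) example in this thread, is to flatten the first two axes so that every (sample, feature) pair becomes its own univariate series, forecast each row, and reshape back. The sketch below illustrates only the reshaping; `naive_forecast` is a hypothetical stand-in that repeats the last value, not a Chronos call, and the sizes are reduced to keep it light.

```python
import numpy as np


def naive_forecast(context, prediction_length):
    """Hypothetical stand-in for a univariate forecaster.

    Repeats the last observed value of each row; `context` has shape
    (num_series, context_length).
    """
    return np.repeat(context[:, -1:], prediction_length, axis=1)


n_patients, n_features, n_steps = 5, 12, 200
prediction_length = 10
data = np.random.default_rng(0).normal(size=(n_patients, n_features, n_steps))

# Treat every (patient, feature) pair as its own univariate series.
flat = data.reshape(n_patients * n_features, n_steps)
flat_pred = naive_forecast(flat, prediction_length)

# Restore the (patient, feature, horizon) layout.
pred = flat_pred.reshape(n_patients, n_features, prediction_length)
print(pred.shape)
```

With a real model, the stand-in would be replaced by a prediction call on the flattened series, probably in batches. Note that this treats the dimensions as unrelated, so dependencies between them (for example between open, high, low and close) are not modeled.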