Closed CoCoNuTeK closed 1 week ago
@CoCoNuTeK The models were trained with `tf32` (a 19-bit CUDA floating-point format that serves as a drop-in replacement for `fp32`). We recommend `bf16` for inference, especially if your machine supports it: it requires less memory and is much faster than `fp32`. Note that we are talking about the model's parameters (`torch_dtype` in the pipeline) here. DO NOT cast your time series into `bf16`, as that may result in loss of information.
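To see why casting the raw series itself is risky, here is a small stdlib-only sketch that simulates a `bf16` cast by truncating an `fp32` bit pattern (real hardware rounds rather than truncates, so treat this as an approximation; the price value is hypothetical):

```python
import struct

def simulate_bf16(x: float) -> float:
    """Approximate casting an fp32 value to bf16.

    bf16 keeps fp32's sign bit and 8 exponent bits but only 7 of the
    23 mantissa bits, so zeroing the low 16 bits of the fp32 pattern
    mimics the cast (hardware rounds; this truncates, which is close
    enough to show the precision loss).
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF_0000))[0]

price = 4523.87                      # hypothetical stock-index level
print(simulate_bf16(price))          # 4512.0 -- an error of ~12 points
```

At this magnitude `bf16` can only resolve steps of 32, so every tick inside a ~32-point band collapses to the same value; keeping the series in `fp32` (or `float64`) avoids that while the model weights stay in `bf16`.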
Ah, okay, so I just keep my data points in the format they're in; if it's stock data, I feed it in as is. Thanks for the info. And for the fine-tuning part, should I use `bf16` as well?
For fine-tuning, the recommended settings are in the training script, which uses `tf32` for training. Of course, you're free to experiment with other dtypes and hyperparameters.
P.S.: I don't want to constrain your creativity but please be mindful when applying a univariate pretrained model such as Chronos to stock data, which is often heavily influenced by external factors. :)
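For reference, enabling `tf32` matmuls in PyTorch is a one-time configuration; the flags below are the standard PyTorch switches and only take effect on Ampere-or-newer GPUs. This is a sketch of the kind of setting a `tf32` training run relies on, not the training script itself:

```python
import torch

# Allow CUDA matmuls and cuDNN convolutions to use the tf32 format.
# tf32 keeps fp32's 8-bit exponent but only a 10-bit mantissa (19 bits
# total with the sign), trading a little precision for a large speedup
# on Ampere+ GPUs.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True
```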
I mean long-term predictions for sure, but some day-trading use cases could work; if I try 1 tick = 5 minutes, say, it could hopefully find interesting patterns. I will let you know if you want.
Hello there, what would you recommend as the best `torch_dtype` param, given the tradeoffs? Or was the model trained only using `bfloat16`? Thanks for the answer.