Open AlvinLok opened 4 months ago
It looks like you're using Windows. We haven't really tested this codebase on Windows. Could you try the following?

- Set `dataloader_num_workers` to 0.
- Don't use `adamw_torch_fused` (try `adamw_torch`).
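(For reference, a minimal sketch of what those two settings map to at the `transformers` level; the `TrainingArguments` names below are from the standard HF API, and in the Chronos training script they would presumably be set through its YAML config / CLI flags rather than constructed by hand:)

```python
from transformers import TrainingArguments

# Hypothetical illustration of the two suggested overrides.
args = TrainingArguments(
    output_dir="./output",
    dataloader_num_workers=0,  # avoid spawning DataLoader worker processes on Windows
    optim="adamw_torch",       # plain AdamW instead of adamw_torch_fused
)
```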
@AlvinLok any update on this?
Yes, I'm on Windows. I've made the changes, and now I get a new error:
```
Traceback (most recent call last):
  File "C:\Users\alvinlok\xxx\03 Code\chronos-forecasting\scripts\training\train.py", line 694, in <module>
```
For this `convert_to_arrow('data.arrow', df.VALUE, df.REF_DATE)` call, can you share what your dataframe looks like?
This is what my df looks like:
```
   REF_DATE    VALUE
0  2010-01-01   84.7
1  2010-02-01   85.3
2  2010-03-01   85.4
3  2010-04-01   85.8
4  2010-05-01   86.8
```
```python
convert_to_arrow(
    path="arrow_files/p32_df_train.arrow",
    time_series=p32_df_train.VALUE,
    start_times=p32_df_train.REF_DATE,
)
```
@AlvinLok could you check if the fix proposed in #156 makes it work for you?
No, adding `freeze_support()` did not have any effect. I am getting the same error: `Array 'target' has bad shape - expected 1 dimensions, got 0.`
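(For context, the kind of Windows entry-point guard being discussed looks roughly like this; the exact change proposed is in #156, and `app()` is the typer entry point that `train.py` already calls:)

```python
from multiprocessing import freeze_support

if __name__ == "__main__":
    freeze_support()  # a no-op unless the script is frozen, but safe with the Windows spawn start method
    app()
```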
@lostella this one is unrelated. @AlvinLok you're transforming the data incorrectly. Please check the type signature of the function that you're using to transform. `convert_to_arrow` expects:

```python
...
time_series: Union[List[np.ndarray], np.ndarray],
start_times: Optional[Union[List[np.datetime64], np.ndarray]] = None,
...
```

The first one is a list of 1-D numpy arrays (i.e., a list of time series). The second one is a list of `np.datetime64`, i.e., a list of start times, one for each time series in the first list. Since we're only using the `start_times`, time series are expected to be uniformly spaced.
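(To make the expected shapes concrete, a toy sketch of a valid call, using the `convert_to_arrow` helper shared later in this thread:)

```python
import numpy as np

# Two uniformly spaced series of different lengths, each with its own start time.
series = [np.random.randn(120), np.random.randn(240)]
starts = [np.datetime64("2010-01-01", "s"), np.datetime64("2015-06-01", "s")]

convert_to_arrow("example.arrow", time_series=series, start_times=starts)
```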
@AvisP: can you also check that you're transforming the data correctly?
@abdulfatir That is highly unlikely, as I am not using any custom data, only data generated with the provided script and the example code. Are there any likely issues that could happen when generating data with `python kernel-synth.py --num-series 20 --max-kernels 5` and with the following script? Here are the data files that I am using to download and verify:
```python
from pathlib import Path
from typing import List, Optional, Union

import numpy as np
from gluonts.dataset.arrow import ArrowWriter


def convert_to_arrow(
    path: Union[str, Path],
    time_series: Union[List[np.ndarray], np.ndarray],
    start_times: Optional[Union[List[np.datetime64], np.ndarray]] = None,
    compression: str = "lz4",
):
    if start_times is None:
        # Set an arbitrary start time
        start_times = [np.datetime64("2000-01-01 00:00", "s")] * len(time_series)

    assert len(time_series) == len(start_times)

    dataset = [
        {"start": start, "target": ts} for ts, start in zip(time_series, start_times)
    ]
    ArrowWriter(compression=compression).write_to_file(
        dataset,
        path=path,
    )


if __name__ == "__main__":
    # Generate 20 random time series of length 1024
    time_series = [np.random.randn(1024) for i in range(20)]

    # Convert to GluonTS arrow format
    convert_to_arrow("./noise-data.arrow", time_series=time_series)
```
> @lostella this one is unrelated. @AlvinLok you're transforming the data incorrectly. Please check the type signature of the function that you're using to transform. `convert_to_arrow` expects:
>
> ```python
> ...
> time_series: Union[List[np.ndarray], np.ndarray],
> start_times: Optional[Union[List[np.datetime64], np.ndarray]] = None,
> ...
> ```
>
> The first one is a list of 1-D numpy arrays (i.e., a list of time series). The second one is a list of `np.datetime64`, i.e., a list of start times, one for each time series in the first list. Since we're only using the `start_times`, time series are expected to be uniformly spaced.
Alright, well I converted it to a numpy array and removed the start times argument, but received the same error:

```python
time_series_data = p32_df_train.VALUE.to_numpy()
path = "arrow_files/p32_df_train.arrow"

convert_to_arrow(path=path, time_series=time_series_data)
```
Error:

```
  File "C:\Users\alvinlok\AppData\Roaming\Python\Python310\site-packages\gluonts\dataset\common.py", line 345, in __call__
    raise GluonTSDataError(
gluonts.exceptions.GluonTSDataError: Array 'target' has bad shape - expected 1 dimensions, got 0.
  0%|          | 0/1000 [00:00<?, ?it/s]
```
@AlvinLok It looks like you're passing a single series to the function. You need to pass a list of time series. If you only have a single series, pass it as `[time_series_data]`.
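(In other words, something along these lines; untested, and the `start_times` line assumes `REF_DATE` holds dates that `np.datetime64` can parse:)

```python
import numpy as np

time_series_data = p32_df_train.VALUE.to_numpy()

convert_to_arrow(
    path="arrow_files/p32_df_train.arrow",
    time_series=[time_series_data],  # a list with one 1-D series, not the bare array
    start_times=[np.datetime64(p32_df_train.REF_DATE.iloc[0], "s")],  # optional
)
```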
**Describe the bug**

`TypeError` caused by `EOFError` when loading a pickle file through `ForkingPickler`:
```
2024-07-10 10:46:29,886 - C:\Users\alvin\OneDrive\Coding\Python\chronos\chronos-forecasting\scripts\training\train.py - INFO - TF32 format is only available on devices with compute capability >= 8. Setting tf32 to False.
2024-07-10 10:46:29,893 - C:\Users\alvin\OneDrive\Coding\Python\chronos\chronos-forecasting\scripts\training\train.py - INFO - Using SEED: 1360904892
2024-07-10 10:46:29,958 - C:\Users\alvin\OneDrive\Coding\Python\chronos\chronos-forecasting\scripts\training\train.py - INFO - Logging dir: output\run-1
2024-07-10 10:46:29,961 - C:\Users\alvin\OneDrive\Coding\Python\chronos\chronos-forecasting\scripts\training\train.py - INFO - Loading and filtering 1 datasets for training: ['data.arrow']
2024-07-10 10:46:29,962 - C:\Users\alvin\OneDrive\Coding\Python\chronos\chronos-forecasting\scripts\training\train.py - INFO - Mixing probabilities: [1]
2024-07-10 10:46:30,642 - C:\Users\alvin\OneDrive\Coding\Python\chronos\chronos-forecasting\scripts\training\train.py - INFO - Initializing model
2024-07-10 10:46:30,642 - C:\Users\alvin\OneDrive\Coding\Python\chronos\chronos-forecasting\scripts\training\train.py - INFO - Using pretrained initialization from amazon/chronos-t5-small
The speedups for torchdynamo mostly come wih GPU Ampere or higher and which is not detected here.
max_steps is given, it will override any value given in num_train_epochs
2024-07-10 10:46:45,054 - C:\Users\alvin\OneDrive\Coding\Python\chronos\chronos-forecasting\scripts\training\train.py - INFO - Training
  0%|          | 0/1000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "C:\Users\alvin\OneDrive\Coding\Python\chronos\chronos-forecasting\scripts\training\train.py", line 692, in <module>
    app()
  File "C:\Users\alvin\anaconda3\envs\chronos\lib\site-packages\typer\main.py", line 326, in __call__
    raise e
  File "C:\Users\alvin\anaconda3\envs\chronos\lib\site-packages\typer\main.py", line 309, in __call__
    return get_command(self)(*args, **kwargs)
  File "C:\Users\alvin\anaconda3\envs\chronos\lib\site-packages\click\core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "C:\Users\alvin\anaconda3\envs\chronos\lib\site-packages\typer\core.py", line 661, in main
    return _main(
  File "C:\Users\alvin\anaconda3\envs\chronos\lib\site-packages\typer\core.py", line 193, in _main
    rv = self.invoke(ctx)
  File "C:\Users\alvin\anaconda3\envs\chronos\lib\site-packages\click\core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Users\alvin\anaconda3\envs\chronos\lib\site-packages\click\core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "C:\Users\alvin\anaconda3\envs\chronos\lib\site-packages\typer\main.py", line 692, in wrapper
    return callback(**use_params)
  File "C:\Users\alvin\anaconda3\envs\chronos\lib\site-packages\typer_config\decorators.py", line 92, in wrapped
    return cmd(*args, **kwargs)
  File "C:\Users\alvin\OneDrive\Coding\Python\chronos\chronos-forecasting\scripts\training\train.py", line 679, in main
    trainer.train()
  File "C:\Users\alvin\anaconda3\envs\chronos\lib\site-packages\transformers\trainer.py", line 1932, in train
    return inner_training_loop(
  File "C:\Users\alvin\anaconda3\envs\chronos\lib\site-packages\transformers\trainer.py", line 2230, in _inner_training_loop
    for step, inputs in enumerate(epoch_iterator):
  File "C:\Users\alvin\anaconda3\envs\chronos\lib\site-packages\accelerate\data_loader.py", line 671, in __iter__
    main_iterator = super().__iter__()
  File "C:\Users\alvin\anaconda3\envs\chronos\lib\site-packages\torch\utils\data\dataloader.py", line 439, in __iter__
    return self._get_iterator()
  File "C:\Users\alvin\anaconda3\envs\chronos\lib\site-packages\torch\utils\data\dataloader.py", line 387, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "C:\Users\alvin\anaconda3\envs\chronos\lib\site-packages\torch\utils\data\dataloader.py", line 1040, in __init__
    w.start()
  File "C:\Users\alvin\anaconda3\envs\chronos\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "C:\Users\alvin\anaconda3\envs\chronos\lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\alvin\anaconda3\envs\chronos\lib\multiprocessing\context.py", line 336, in _Popen
    return Popen(process_obj)
  File "C:\Users\alvin\anaconda3\envs\chronos\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\alvin\anaconda3\envs\chronos\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
  File "", line 2, in pyarrow.lib._RecordBatchFileReader.__reduce_cython__
TypeError: no default __reduce__ due to non-trivial __cinit__
  0%|
(chronos) C:\Users\alvin\OneDrive\Coding\Python\chronos>Traceback (most recent call last):
  File "", line 1, in <module>
  File "C:\Users\alvin\anaconda3\envs\chronos\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\Users\alvin\anaconda3\envs\chronos\lib\multiprocessing\spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
```
Occurs when attempting to fine-tune the model:

```
python chronos-forecasting/scripts/training/train.py --config chronos-forecasting/scripts/training/configs/chronos-t5-small.yaml --model-id amazon/chronos-t5-small --no-random-init --max-steps 1000 --learning-rate 0.001
```
Steps taken:

- `fused=True` requires all the params to be floating point Tensors of supported devices: `['cuda', 'xpu', 'privateuseone']`, so I did `pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121`

**Environment description**

- Operating system:
- Python version: 3.10.14
- CUDA version: 12.2
- PyTorch version: 2.3.1+cu121
- HuggingFace transformers version: 4.42.3
- HuggingFace accelerate version: 0.32.1
Any help is appreciated; I have tried this with multiple fresh conda environments on different machines.