amazon-science / chronos-forecasting

Chronos: Pretrained (Language) Models for Probabilistic Time Series Forecasting
https://arxiv.org/abs/2403.07815
Apache License 2.0

Fine-tuning Chronos with number of workers >= 1 #131

Closed by Hstellar 3 days ago

Hstellar commented 4 days ago

When I run chronos-forecasting/scripts/training/train.py on a dataset generated with chronos-forecasting/scripts/kernel-synth.py, setting dataloader_num_workers to 1 or greater causes TypeError: no default __reduce__ due to non-trivial __cinit__.

If this setting is commented out of training_args, training works fine.

I was using pyarrow version 8.0.0. Which pyarrow version was used for fine-tuning, and what else could cause this error? Thank you in advance!

Here is the full traceback:

Traceback (most recent call last):
  File "/lustre/orion/csc605/scratch/hstellar/chronos-forecasting/scripts/training/fine_tune.py", line 1240, in <module>
    app()
  File "/lustre/orion/csc605/scratch/hstellar/miniconda/envs/tn/lib/python3.10/site-packages/typer/main.py", line 328, in __call__
    raise e
  File "/lustre/orion/csc605/scratch/hstellar/miniconda/envs/tn/lib/python3.10/site-packages/typer/main.py", line 311, in __call__
    return get_command(self)(*args, **kwargs)
  File "/lustre/orion/csc605/scratch/hstellar/miniconda/envs/tn/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/lustre/orion/csc605/scratch/hstellar/miniconda/envs/tn/lib/python3.10/site-packages/typer/core.py", line 721, in main
    return _main(
  File "/lustre/orion/csc605/scratch/hstellar/miniconda/envs/tn/lib/python3.10/site-packages/typer/core.py", line 225, in _main
    rv = self.invoke(ctx)
  File "/lustre/orion/csc605/scratch/hstellar/miniconda/envs/tn/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/lustre/orion/csc605/scratch/hstellar/miniconda/envs/tn/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/lustre/orion/csc605/scratch/hstellar/miniconda/envs/tn/lib/python3.10/site-packages/typer/main.py", line 683, in wrapper
    return callback(**use_params)  # type: ignore
  File "/lustre/orion/csc605/scratch/hstellar/miniconda/envs/tn/lib/python3.10/site-packages/typer_config/decorators.py", line 92, in wrapped
    return cmd(*args, **kwargs)
  File "/lustre/orion/csc605/scratch/hstellar/chronos-forecasting/scripts/training/fine_tune.py", line 1212, in main
    for batch in train_dataloader:
  File "/lustre/orion/csc605/scratch/hstellar/miniconda/envs/tn/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 438, in __iter__
    return self._get_iterator()
  File "/lustre/orion/csc605/scratch/hstellar/miniconda/envs/tn/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 386, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "/lustre/orion/csc605/scratch/hstellar/miniconda/envs/tn/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1039, in __init__
    w.start()
  File "/lustre/orion/csc605/scratch/hstellar/miniconda/envs/tn/lib/python3.10/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/lustre/orion/csc605/scratch/hstellar/miniconda/envs/tn/lib/python3.10/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/lustre/orion/csc605/scratch/hstellar/miniconda/envs/tn/lib/python3.10/multiprocessing/context.py", line 288, in _Popen
    return Popen(process_obj)
  File "/lustre/orion/csc605/scratch/hstellar/miniconda/envs/tn/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/lustre/orion/csc605/scratch/hstellar/miniconda/envs/tn/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/lustre/orion/csc605/scratch/hstellar/miniconda/envs/tn/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/lustre/orion/csc605/scratch/hstellar/miniconda/envs/tn/lib/python3.10/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
  File "stringsource", line 2, in pyarrow.lib._RecordBatchFileReader.__reduce_cython__
TypeError: no default __reduce__ due to non-trivial __cinit__
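
For context, the last frames of the traceback are in multiprocessing's popen_spawn_posix.py and reduction.dump, which is where the worker process object is pickled before it is launched. The pickling reaches a pyarrow _RecordBatchFileReader, presumably through the dataset, and that object cannot be pickled. The sketch below uses only the standard library and is not Chronos or pyarrow code. The names EagerDataset, LazyDataset and is_picklable are hypothetical, and an open file handle stands in for the unpicklable reader. It shows the general failure mode and one common way around it: keep only the path in the pickled state and open the resource lazily inside the worker.

```python
import pickle
import tempfile


class EagerDataset:
    """Opens its file in __init__, so the handle becomes part of the pickled state."""

    def __init__(self, path):
        self.path = path
        # Stand-in for an unpicklable reader object (assumption for illustration).
        self.handle = open(path, "rb")


class LazyDataset:
    """Stores only the path and opens the file on first use."""

    def __init__(self, path):
        self.path = path
        self.handle = None

    def read(self):
        # Open on demand, so each worker process creates its own handle.
        if self.handle is None:
            self.handle = open(self.path, "rb")
        self.handle.seek(0)
        return self.handle.read()

    def __getstate__(self):
        # Drop the live handle before pickling; the copy reopens it lazily.
        state = self.__dict__.copy()
        state["handle"] = None
        return state


def is_picklable(obj):
    """Return True if pickle.dumps(obj) succeeds, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, AttributeError, pickle.PicklingError):
        return False
```

Calling is_picklable on an EagerDataset returns False because of the open handle, while a LazyDataset survives a pickle round trip even after it has been read from. Whether the Chronos training script needs such a change is not established in this thread; the fix reported here was a dependency upgrade.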

abdulfatir commented 3 days ago

It works for me. I have pyarrow==16.1.0 installed. Please make sure that you're installing the training dependencies correctly as described in the README, preferably in a fresh env.
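
A quick way to confirm which pyarrow version an environment actually has is sketched below. It uses only the standard library. The helper names (installed_version, version_tuple, at_least) are hypothetical, and 16.1.0 is used as the reference only because it is the version reported as working in this thread, not because it is a documented minimum.

```python
import re
from importlib import metadata

# Version reported as working in this thread (not an official requirement).
KNOWN_GOOD = "16.1.0"


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version):
    """Turn a string like '16.1.0' into (16, 1, 0), using leading digits only."""
    numbers = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def at_least(version, minimum=KNOWN_GOOD):
    """Return True if `version` is greater than or equal to `minimum`."""
    return version_tuple(version) >= version_tuple(minimum)
```

For example, installed_version("pyarrow") returns the version string for the active environment, or None if pyarrow is not installed, and at_least("8.0.0") is False.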

Note: With the way datasets are set up currently, I wouldn't recommend using dataloader-num-workers>1 if you only have a few datasets.

Hstellar commented 3 days ago

Thank you! I was able to run it with pyarrow==16.1.0 and the training dependencies installed in a fresh env.