Open khabalghoul opened 10 months ago
Hi @tomyjara!
You can use something like this to build a custom dataset.
Create a JSON Lines file with your time series data. Every line contains one time series as a JSON object with two keys: `start` (the start timestamp) and `target` (the actual time series values). I have attached an example file. Note that the time series are not required to have the same start or length.
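For illustration, such a file could be created like this (the file name, start dates, and values below are made up):

```python
import json

# Two illustrative series; note they have different starts and lengths.
series = [
    {"start": "2021-01-01 00:00:00", "target": [1.0, 2.0, 3.0, 4.0]},
    {"start": "2021-03-15 00:00:00", "target": [10.0, 9.5, 9.0]},
]

# JSON Lines: one complete JSON object per line, no enclosing array.
with open("dummy_custom_data.jsonl", "w") as f:
    for ts in series:
        f.write(json.dumps(ts) + "\n")
```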
Use this function to load the file as a GluonTS dataset.
```python
from pathlib import Path
from typing import Optional

from gluonts.dataset.split import split
from gluonts.dataset.common import (
    MetaData,
    TrainDatasets,
    FileDataset,
)


def get_custom_dataset(
    jsonl_path: Path,
    freq: str,
    prediction_length: int,
    split_offset: Optional[int] = None,
):
    """Creates a custom GluonTS dataset from a JSONLines file and
    the given parameters.

    Parameters
    ----------
    jsonl_path
        Path to a JSONLines file with time series
    freq
        Frequency in pandas format
        (e.g., `H` for hourly, `D` for daily)
    prediction_length
        Prediction length
    split_offset, optional
        Offset to split data into train and test sets, by default None

    Returns
    -------
    A GluonTS dataset
    """
    if split_offset is None:
        split_offset = -prediction_length

    metadata = MetaData(freq=freq, prediction_length=prediction_length)
    test_ts = FileDataset(jsonl_path, freq)
    train_ts, _ = split(test_ts, offset=split_offset)
    dataset = TrainDatasets(metadata=metadata, train=train_ts, test=test_ts)
    return dataset
```
`get_custom_dataset` can be used as a replacement for https://github.com/amazon-science/unconditional-time-series-diffusion/blob/50f52da1c583d2eece4da8e933f34b73dc249a75/bin/train_model.py#L135

Thanks @marcelkollovieh for helping with the response!
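To make the default `split_offset` concrete: with `split_offset = -prediction_length`, training uses each series up to the last `prediction_length` points, while the full series is kept for testing. A plain-Python sketch of that slicing (the series values are illustrative, no GluonTS required):

```python
prediction_length = 24
split_offset = -prediction_length

target = list(range(100))           # a dummy series with 100 observations
train_part = target[:split_offset]  # everything except the last 24 points
test_part = target                  # the full series, used for evaluation

print(len(train_part), len(test_part))  # → 76 100
```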
```
(tsdiff) rrr@rr:~/unconditional-time-series-diffusion$ python bin/train_model.py -c configs/train_fdr.yaml
DEBUG:root:Before importing pykeops...
DEBUG:root:After importing pykeops!
INFO:uncond_ts_diff.arch.s4:Pykeops installation found.
WARNING: Skipping key sampler_params!
WARNING:root:Cannot infer loader for /home/h/unconditional-time-series-diffusion/data/fdr/CAS/dummy_custom_data.json:Zone.Identifier.
WARNING:root:Cannot infer loader for /home/h/unconditional-time-series-diffusion/data/fdr/CAS/train - 副本.json:Zone.Identifier.
WARNING:root:Cannot infer loader for /home/h/unconditional-time-series-diffusion/data/fdr/CAS/dummy_custom_data.json:Zone.Identifier.
WARNING:root:Cannot infer loader for /home/h/unconditional-time-series-diffusion/data/fdr/CAS/train - 副本.json:Zone.Identifier.
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
INFO:bin/train_model.py:Logging to ./lightning_logs/version_44
/home/h/anaconda3/envs/tsdiff/lib/python3.8/site-packages/pytorch_lightning/trainer/configuration_validator.py:108: PossibleUserWarning: You defined a `validation_step` but have no `val_dataloader`. Skipping val loop.
  rank_zero_warn(
You are using a CUDA device ('NVIDIA GeForce RTX 4060 Laptop GPU') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
┏━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃   ┃ Name     ┃ Type            ┃ Params ┃
┡━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ 0 │ scaler   │ MeanScaler      │      0 │
│ 1 │ embedder │ FeatureEmbedder │      1 │
│ 2 │ backbone │ BackboneModel   │  193 K │
└───┴──────────┴─────────────────┴────────┘
Trainable params: 193 K
Non-trainable params: 0
Total params: 193 K
Total estimated model params size (MB): 0
DEBUG:fsspec.local:open file: /home/h/unconditional-time-series-diffusion/lightning_logs/version_44/hparams.yaml
DEBUG:fsspec.local:open file: /home/h/unconditional-time-series-diffusion/lightning_logs/version_44/hparams.yaml
Epoch 0/99 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/-- 0:00:00 • -:--:-- 0.00it/s
Traceback (most recent call last):
  File "/home/h/anaconda3/envs/tsdiff/lib/python3.8/site-packages/gluonts/dataset/jsonl.py", line 127, in __iter__
    yield json.loads(line)
orjson.JSONDecodeError: unexpected end of data: line 2 column 1 (char 3)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "bin/train_model.py", line 286, in
```
I used a dummy JSON file as `train.json`, but the error above comes up.
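One likely cause of this `orjson.JSONDecodeError` (a guess, since the file itself isn't shown): the file is a pretty-printed JSON document, with one object or array spread over several lines, while the JSON Lines loader expects every line to be a complete JSON object. A sketch of converting such a file, with illustrative file names and values:

```python
import json

# Illustrative: a pretty-printed JSON array fails when parsed line by line,
# because line 2 ('  {...') is not a complete JSON document on its own.
pretty = """[
  {"start": "2021-01-01 00:00:00", "target": [1.0, 2.0]}
]"""

series = json.loads(pretty)  # parse the whole document at once instead

# Rewrite as JSON Lines: one complete object per line.
with open("train_fixed.jsonl", "w") as f:
    for ts in series:
        f.write(json.dumps(ts) + "\n")
```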
Hi! How are you?
I found that tsdiff could be a great tool for generating EEG data. I have a dataset containing the channel measurements from an EEG obtained in an experiment, and I would like to train your model on this data. What should I do to train your model on a custom dataset?
Thanks!