Lightning-Universe / lightning-transformers

Flexible components pairing 🤗 Transformers with ⚡ PyTorch Lightning
https://lightning-transformers.readthedocs.io
Apache License 2.0
610 stars, 77 forks

Cannot find file when using custom data in language modeling #202

Closed BramVanroy closed 2 years ago

BramVanroy commented 2 years ago

🐛 Bug

I am trying to run a simple LM task with my own data. I use the following command as taken from here: https://lightning-transformers.readthedocs.io/en/latest/tasks/nlp/language_modeling.html#language-modeling-using-your-own-files

> cd /home/bram/pl-trf/data
> pl-transformers-train task=nlp/language_modeling dataset.cfg.train_file=train.csv dataset.cfg.validation_file=dev.csv

which leads to the following error:

Error executing job with overrides: ['task=nlp/language_modeling', 'dataset.cfg.train_file=train.csv', 'dataset.cfg.validation_file=dev.csv']
Traceback (most recent call last):
  File "/home/bram/.local/share/virtualenvs/pl-trf-aNvOnYab/lib/python3.9/site-packages/lightning_transformers/cli/train.py", line 84, in hydra_entry
    main(cfg)
  File "/home/bram/.local/share/virtualenvs/pl-trf-aNvOnYab/lib/python3.9/site-packages/lightning_transformers/cli/train.py", line 70, in main
    run(
  File "/home/bram/.local/share/virtualenvs/pl-trf-aNvOnYab/lib/python3.9/site-packages/lightning_transformers/cli/train.py", line 53, in run
    data_module.setup("fit")
  File "/home/bram/.local/share/virtualenvs/pl-trf-aNvOnYab/lib/python3.9/site-packages/pytorch_lightning/core/datamodule.py", line 428, in wrapped_fn
    fn(*args, **kwargs)
  File "/home/bram/.local/share/virtualenvs/pl-trf-aNvOnYab/lib/python3.9/site-packages/lightning_transformers/core/nlp/data.py", line 31, in setup
    dataset = self.load_dataset()
  File "/home/bram/.local/share/virtualenvs/pl-trf-aNvOnYab/lib/python3.9/site-packages/lightning_transformers/core/nlp/data.py", line 67, in load_dataset
    return load_dataset(extension, data_files=data_files)
  File "/home/bram/.local/share/virtualenvs/pl-trf-aNvOnYab/lib/python3.9/site-packages/datasets/load.py", line 1084, in load_dataset
    builder_instance = load_dataset_builder(
  File "/home/bram/.local/share/virtualenvs/pl-trf-aNvOnYab/lib/python3.9/site-packages/datasets/load.py", line 948, in load_dataset_builder
    data_files = _resolve_data_files_locally_or_by_urls(".", data_files)
  File "/home/bram/.local/share/virtualenvs/pl-trf-aNvOnYab/lib/python3.9/site-packages/datasets/load.py", line 269, in _resolve_data_files_locally_or_by_urls
    return {
  File "/home/bram/.local/share/virtualenvs/pl-trf-aNvOnYab/lib/python3.9/site-packages/datasets/load.py", line 270, in <dictcomp>
    k: _resolve_data_files_locally_or_by_urls(base_path, v, allowed_extensions=allowed_extensions)
  File "/home/bram/.local/share/virtualenvs/pl-trf-aNvOnYab/lib/python3.9/site-packages/datasets/load.py", line 266, in _resolve_data_files_locally_or_by_urls
    raise FileNotFoundError(error_msg)
FileNotFoundError: Unable to resolve any data file that matches 'train.csv' at /home/bram/pl-trf/data/outputs/2021-10-13/20-53-59

Clearly the library is looking for the dataset file in a different directory (/home/bram/pl-trf/data/outputs/2021-10-13/20-53-59), but I don't quite understand why, or how I can change that behavior.

Environment

SeanNaren commented 2 years ago

Hmm this is probably due to the way hydra is interpolating the arguments. Could you try passing in absolute paths?

DrMatters commented 2 years ago

> Hmm this is probably due to the way hydra is interpolating the arguments. Could you try passing in absolute paths?

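That's because Hydra changes the working directory to "outputs/2021-10-13/20-53-59" (a per-run output directory for experiments), so the relative path 'train.csv' is resolved from there instead of from the directory you launched the command in. Passing an absolute path should help.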
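
To make the mechanism concrete, here is a minimal standard-library sketch of what a working-directory change does to a relative path. It does not use Hydra or lightning-transformers; the temporary directory and the `demonstrate_cwd_change` helper are hypothetical stand-ins, and only the "outputs/2021-10-13/20-53-59" layout is taken from the error message above.

```python
import os
import tempfile
from pathlib import Path


def demonstrate_cwd_change():
    """Return (found_before, relative_found_after, absolute_found_after)."""
    original_cwd = Path.cwd()
    with tempfile.TemporaryDirectory() as tmp:
        data_dir = Path(tmp).resolve()
        (data_dir / "train.csv").write_text("text\nhello world\n")

        # Stand-in for the per-run output directory (assumed layout).
        run_dir = data_dir / "outputs" / "2021-10-13" / "20-53-59"
        run_dir.mkdir(parents=True)

        try:
            os.chdir(data_dir)
            relative = Path("train.csv")
            # Resolve to an absolute path while still in the data directory.
            absolute = relative.resolve()
            found_before = relative.exists()

            # Simulate the launcher switching into the run directory.
            os.chdir(run_dir)
            relative_found_after = relative.exists()
            absolute_found_after = absolute.exists()
        finally:
            # Restore the cwd so the temporary directory can be cleaned up.
            os.chdir(original_cwd)

    return found_before, relative_found_after, absolute_found_after


print(demonstrate_cwd_change())  # prints (True, False, True)
```

The relative name stops resolving once the working directory moves, while the absolute path keeps working. Inside code that runs under Hydra, `hydra.utils.get_original_cwd()` and `hydra.utils.to_absolute_path()` exist for the same purpose, though whether lightning-transformers calls them is not something this thread confirms.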

BramVanroy commented 2 years ago

That does seem to work indeed. Perhaps this can be documented somewhere? Perhaps here? https://lightning-transformers.readthedocs.io/en/latest/datasets/custom_data.html
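
As a rough sketch of the workaround, the overrides from the original command can be built with absolute paths before the CLI is launched. The `build_overrides` helper is hypothetical and only assembles the argument list; the command name and override keys are the ones used in the report above, and paths containing spaces or `=` may need extra quoting for Hydra.

```python
import shlex
from pathlib import Path


def build_overrides(data_dir, train_name="train.csv", val_name="dev.csv"):
    """Build language-modeling overrides that point at absolute file paths."""
    # Resolve once, up front, so a later cwd change cannot affect the paths.
    data_dir = Path(data_dir).resolve()
    return [
        "task=nlp/language_modeling",
        f"dataset.cfg.train_file={data_dir / train_name}",
        f"dataset.cfg.validation_file={data_dir / val_name}",
    ]


# Run from the data directory, "." resolves to its absolute location.
command = ["pl-transformers-train", *build_overrides(".")]
print(shlex.join(command))
```

In a shell, the same effect can usually be had by prefixing each file name with the absolute directory, for example `dataset.cfg.train_file=/home/bram/pl-trf/data/train.csv`.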

mathemusician commented 2 years ago

@Borda I can work on this if @SeanNaren is busy. It's just adding documentation

Borda commented 2 years ago

> @Borda I can work on this if @SeanNaren is busy. It's just adding documentation

that would be cool, thx!