ZGC-LLM-Safety / TrafficLLM

The repository of TrafficLLM, a universal LLM adaptation framework that learns robust traffic representations for all open-source LLMs in real-world scenarios and enhances generalization across diverse traffic analysis tasks.

NotImplementedError: Loading a dataset cached in a LocalFileSystem is not supported #5

Open ReamonYim opened 4 days ago

ReamonYim commented 4 days ago

Problem Description:

While running the dual-stage-tuning training script trafficllm_stage1.sh, I encountered the following error during the dataset loading phase:

NotImplementedError: Loading a dataset cached in a LocalFileSystem is not supported.

The error is raised by the load_dataset function in the datasets library, which reports that it cannot load a dataset cached on the local file system.

Attempts and Outcomes:

I have tried several approaches to resolve this issue, but unfortunately, none of them have worked. Below is a summary of my efforts and their corresponding results:

  1. Clearing the cache Directory:

    • I tried clearing the dual-stage-tuning/cache directory to eliminate any potential issues caused by corrupted or incompatible cache files.
    • Outcome: The problem persisted, and I continued getting the NotImplementedError.
  2. Changing the cache_dir to Different Directories (e.g., /root/.cache/huggingface/datasets):

    • I modified the main.py script to set the cache_dir parameter to ~/.cache/huggingface and /cache, hoping that changing the cache directory might resolve the file system compatibility issue.
    • Outcome: No change—the same error still appeared.
  3. Forcing LocalFileSystem Usage:

    • I added fs=LocalFileSystem() in main.py and passed the fs parameter to the load_dataset call to explicitly enforce the use of the local file system.
    • Outcome: This resulted in a new error: TypeError: __init__() got an unexpected keyword argument 'fs', preventing the program from continuing.
  4. Removing the fs Parameter:

    • Since adding the fs parameter led to another issue, I removed it to restore the original code.
    • Outcome: The original NotImplementedError reappeared, indicating that the core issue remained unresolved.
  5. Using a Temporary Cache Directory:

    • I tried pointing the cache_dir to a temporary directory (/tmp/hf_cache), manually created this directory, and specified it in the script.
    • Outcome: Although the cache directory was successfully created, the same LocalFileSystem error persisted.
  6. Misconfiguration Leading to FileExistsError:

    • At one point, I mistakenly pointed certain paths (e.g., /tmp) at /dev/null, resulting in FileExistsError: [Errno 17] File exists: '/dev/null'.
    • Outcome: After correcting this configuration, the error was resolved, but it did not help address the main issue.
  7. Attempting to Modify the datasets Source Code:

    • I considered manually editing the datasets library source code to remove the fs parameter in builder.py, but this could break other dependencies within datasets.
    • Outcome: Due to the high risk, I did not proceed with this method.

Request for Help:

I would greatly appreciate any guidance or suggestions on how to resolve this LocalFileSystem issue. It is currently preventing me from loading the dataset and proceeding with the model training. Your assistance would mean a lot, and I’m eager to hear any potential solutions or workarounds.

Thank you so much in advance for your help and support!

CuiTianyu961030 commented 3 days ago

This may be because your script trafficllm_stage1.sh does not have permission to access the /cache path set in its cache_dir option. Try changing cache_dir in trafficllm_stage1.sh (not in main.py). For instance, change the path /cache to ../cache.

Alternatively, delete the cache_dir field in trafficllm_stage1.sh so that the cache is stored in the default path. Refer to ChatGLM2's config for an example.
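
Both suggestions can be checked before launching training. The sketch below is a minimal illustration, assuming the problem really is a permissions issue on the cache path. The helper names is_writable_dir and pick_cache_dir are hypothetical and are not part of TrafficLLM or datasets. pick_cache_dir returns the first candidate directory that can be created and written to, or None to signal that cache_dir should be left out so the default path is used.

```python
import os
import tempfile
from typing import Iterable, Optional


def is_writable_dir(path: str) -> bool:
    """Return True if `path` exists (or can be created) and accepts new files.

    Note: this creates the directory as a side effect if it is missing.
    """
    try:
        os.makedirs(path, exist_ok=True)
        # Creating a real temporary file is a stricter probe than os.access.
        with tempfile.TemporaryFile(dir=path):
            pass
        return True
    except OSError:
        # Covers PermissionError, FileExistsError, NotADirectoryError, etc.
        return False


def pick_cache_dir(candidates: Iterable[str]) -> Optional[str]:
    """Hypothetical helper: first writable candidate, else None (use the default)."""
    for candidate in candidates:
        expanded = os.path.abspath(os.path.expanduser(candidate))
        if is_writable_dir(expanded):
            return expanded
    return None


if __name__ == "__main__":
    # "/cache" and "../cache" are the two paths mentioned in this thread.
    chosen = pick_cache_dir(["/cache", "../cache"])
    print(chosen if chosen else "no writable candidate; omit cache_dir")
```

Relative paths such as ../cache are resolved against the current working directory, so the check should be run from the same directory as trafficllm_stage1.sh.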

I hope the above solutions work for you.
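
If neither change helps, it may also be worth recording which versions of datasets and fsspec are installed. This NotImplementedError has been reported elsewhere when the two packages are out of step with each other. Whether that is the cause here is an assumption, not something confirmed in this thread. The sketch below only reports versions, and installed_versions is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Sequence


def installed_versions(
    packages: Sequence[str] = ("datasets", "fsspec"),
) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if it is missing."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

Including this output in a follow-up comment would make it easier to tell whether a package upgrade or downgrade is worth trying.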