Hi @azuryl, thanks for trying our code.
The C4 dataset is not always reliably accessible in our region (China), so I downloaded it to my local disk and loaded it from there with the load_dataset function. You can either load it directly from Hugging Face with:
traindata = load_dataset('allenai/c4', 'allenai--c4', data_files={'train': 'en/c4-train.00000-of-01024.json.gz'}, split='train')
valdata = load_dataset('allenai/c4', 'allenai--c4', data_files={'validation': 'en/c4-validation.00000-of-00008.json.gz'}, split='validation')
or download the dataset to your local disk and change the path in this line of lib/data.py so that it points to the location of your C4 copy: traindata = load_dataset('../../c4', data_files={'train': 'c4-train.00000-of-01024.json'}, split='train')
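For the local-disk option, here is a minimal sketch of how the path could be checked before loading. Note that a relative path such as '../../c4' is resolved against the directory you launch the script from, and the file name in data_files has to match what is actually on disk (for example .json versus .json.gz). The helper names find_c4_shard and load_local_c4_train are hypothetical and not part of the repository, and using the generic "json" builder instead of a directory path is my assumption about what should work with your datasets version, not the exact code we use:

```python
from pathlib import Path


# Hypothetical helper (not in the repository): locate a C4 shard on local disk.
def find_c4_shard(c4_dir, stem="c4-train.00000-of-01024"):
    """Return the absolute path of the shard, trying .json first, then .json.gz."""
    root = Path(c4_dir).expanduser().resolve()
    for suffix in (".json", ".json.gz"):
        candidate = root / (stem + suffix)
        if candidate.is_file():
            return candidate
    # Report the fully resolved directory so a wrong relative path is easy to spot.
    raise FileNotFoundError(f"no {stem}.json or {stem}.json.gz under {root}")


# Hypothetical helper: load the training shard through the generic JSON builder.
def load_local_c4_train(c4_dir):
    # Imported lazily so the path check above works without `datasets` installed.
    from datasets import load_dataset

    shard = find_c4_shard(c4_dir)
    # Assumption: the "json" builder reads the JSON Lines shard, gzipped or not.
    return load_dataset("json", data_files={"train": str(shard)}, split="train")
```

Calling load_local_c4_train with an absolute path to your download directory avoids any dependence on the working directory, and the error message prints the resolved directory if the shard is missing.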
Dear @biomedical-cybernetics @silence1024, thanks for your great work!
I want to run your code, but I ran into an issue:
Traceback (most recent call last):
File "/home/delight-gpu/Workspace2/azuryl/Relative-importance-and-activation-pruning/main.py", line 151, in <module>
main()
File "/home/delight-gpu/Workspace2/azuryl/Relative-importance-and-activation-pruning/main.py", line 92, in main
prune_ria(args, model, tokenizer, device, prune_n=prune_n, prune_m=prune_m)
File "/home/delight-gpu/Workspace2/azuryl/Relative-importance-and-activation-pruning/lib/prune.py", line 179, in prune_ria
dataloader, _ = get_loaders(args.calib_dataset,nsamples=args.nsamples,seed=args.seed,seqlen=args.seqlen,tokenizer=tokenizer)
File "/home/delight-gpu/Workspace2/azuryl/Relative-importance-and-activation-pruning/lib/data.py", line 120, in get_loaders
return get_c4(nsamples, seed, seqlen, tokenizer)
File "/home/delight-gpu/Workspace2/azuryl/Relative-importance-and-activation-pruning/lib/data.py", line 87, in get_c4
traindata = load_dataset('../../c4', data_files={'train': 'c4-train.00000-of-01024.json'}, split='train')
File "/home/azuryl/anaconda3/envs/prune_ria/lib/python3.10/site-packages/datasets/load.py", line 2523, in load_dataset
builder_instance = load_dataset_builder(
File "/home/azuryl/anaconda3/envs/prune_ria/lib/python3.10/site-packages/datasets/load.py", line 2195, in load_dataset_builder
dataset_module = dataset_module_factory(
File "/home/azuryl/anaconda3/envs/prune_ria/lib/python3.10/site-packages/datasets/load.py", line 1848, in dataset_module_factory
raise FileNotFoundError(
FileNotFoundError: Couldn't find a dataset script at /home/delight-gpu/Workspace2/c4/c4.py or any data file in the same directory.