Shark-NLP / DiffuSeq

[ICLR'23] DiffuSeq: Sequence to Sequence Text Generation with Diffusion Models
MIT License

Where is CommonsenseConversation/test.jsonl? After running train.sh, both run_decode_solver.sh and run_decode.sh fail because test.jsonl cannot be found #81

Closed. Humble2967738843 closed this issue 4 months ago.

Humble2967738843 commented 4 months ago

github_projects/DiffuSeq_v2/DiffuSeq/diffusion_models/diffuseq_cc_h128_lr0.0001_t2000_sqrt_lossaware_seed102_learned_mask_fp16_denoise_0.5_reproduce20240419-12:55:19/training_args.json

Creating model and diffusion...

########## 0

The parameter count is 91225402

reload the random embeddings Embedding(30522, 128)

Sampling...on test

##############################
Loading text data...
##############################
Loading dataset cc from github_projects/DiffuSeq_v2/DiffuSeq/datasets/CommonsenseConversation...

Loading form the TEST set...

Traceback (most recent call last):
  File "sample_seq2seq_dpmSolver.py", line 223, in <module>
    main()
  File "sample_seq2seq_dpmSolver.py", line 87, in main
    data_valid = load_data_text(
  File "github_projects/DiffuSeq_v2/DiffuSeq/diffuseq/text_datasets.py", line 45, in load_data_text
    training_data = get_corpus(data_args, seq_len, split=split, loaded_vocab=loaded_vocab)
  File "github_projects/DiffuSeq_v2/DiffuSeq/diffuseq/text_datasets.py", line 238, in get_corpus
    with open(path, 'r') as f_reader:
FileNotFoundError: [Errno 2] No such file or directory: 'github_projects/DiffuSeq_v2/DiffuSeq/datasets/CommonsenseConversation/test.jsonl'

summmeer commented 4 months ago

The files in that folder are only the toy example. You need to download the full datasets; the links are provided in the README.
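The fix here is simply to download the full data, but a small pre-flight check can catch a missing split before a long decoding run starts. The sketch below is a hypothetical helper, not part of DiffuSeq. It assumes the full dataset folder holds one JSON-lines file per split named `train.jsonl`, `valid.jsonl` and `test.jsonl`; only `test.jsonl` is confirmed by the traceback in this issue, and the helper names `missing_splits` and `count_jsonl_records` are invented for illustration.

```python
import json
from pathlib import Path

# Assumption: one JSON-lines file per split. Only test.jsonl is confirmed by
# the FileNotFoundError in this issue; train.jsonl and valid.jsonl are assumed.
EXPECTED_SPLITS = ("train.jsonl", "valid.jsonl", "test.jsonl")


def missing_splits(data_dir):
    """Return the expected split files that do not exist in data_dir (hypothetical helper)."""
    root = Path(data_dir)
    return [name for name in EXPECTED_SPLITS if not (root / name).is_file()]


def count_jsonl_records(path):
    """Count non-blank lines in a .jsonl file, parsing each one as JSON."""
    count = 0
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                json.loads(line)  # raises json.JSONDecodeError on a malformed line
                count += 1
    return count
```

For example, calling `missing_splits("datasets/CommonsenseConversation")` on a checkout that contains only the toy example would be expected to include `test.jsonl` in its result, which is the same condition that triggers the `FileNotFoundError` reported in this issue.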

Humble2967738843 commented 4 months ago

Oh, sorry, and thank you. I thought this dataset was already a combination of the validation and test sets...