Closed (dasemiao closed 11 months ago)
Can you provide more details to let me reproduce the error?
I think I have solved your problem. I generated a custom dataset following the format the author gave. However, when I tried to train a LongMem model, I hit the error "Is a directory: 'XXX/longmem/valid'". I think the reason is that the author wrote the code against an older version of fairseq, which did not require the valid and test binaries in order to run. I am currently running this code with fairseq version 0.12. To work around the error, find the file under the fairseq subfolder, e.g. "xxx/longmem/fairseq/fairseq_cli/train.py", and comment out the following code:
# Load valid dataset (we load training data below, based on the latest checkpoint)
# We load the valid dataset AFTER building the model
data_utils.raise_if_valid_subsets_unintentionally_ignored(cfg)
if cfg.dataset.combine_valid_subsets:
    task.load_dataset("valid", combine=True, epoch=1)
else:
    for valid_sub_split in cfg.dataset.valid_subset.split(","):
        task.load_dataset(valid_sub_split, combine=False, epoch=1)
In the official code, it is on lines 128 through 133.
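Before commenting anything out, it may help to check what actually exists on disk for each split. Below is a minimal sketch, not part of LongMem or fairseq: `inspect_split` is a hypothetical helper name, and it assumes the usual fairseq-preprocess layout where a split is stored as `<split>.bin` and `<split>.idx` directly inside the data directory. If `valid` turns out to be a directory with no matching `.bin`/`.idx` files, that would be consistent with the "Is a directory" error above.

```python
import os


def inspect_split(data_dir, split):
    """Report what exists on disk for one split name, e.g. "valid".

    Assumption: a preprocessed split is expected as <split>.bin and
    <split>.idx inside data_dir. This helper is hypothetical and only
    looks at the filesystem; it does not call into fairseq.
    """
    prefix = os.path.join(data_dir, split)
    return {
        "is_directory": os.path.isdir(prefix),
        "has_bin": os.path.isfile(prefix + ".bin"),
        "has_idx": os.path.isfile(prefix + ".idx"),
    }


def report(data_dir, splits=("train", "valid", "test")):
    """Print one line per split so missing binaries are easy to spot."""
    for split in splits:
        info = inspect_split(data_dir, split)
        print(split, info)
```

For example, calling `report("XXX/longmem")` with your own data path prints one line per split. The path here is only the placeholder from the error message, so substitute your real directory.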
I made a Pile dataset, but how should I split off the validation set? With my self-made validation set, I always get the error "Is a directory: '/home/mdz/pywork/LongMem/pile_preprocessed_binary/valid'".