Open HeYiyang2 opened 12 months ago
Hi @HeYiyang2, apologies for the delayed response. How did you download the local dataset? I think `load_from_disk` should only be used when the directory was created by a call to `save_to_disk`. See e.g. this comment.
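A minimal sketch of one way to follow that suggestion, assuming the images sit in a class-per-subfolder tree: read the folder with the generic `imagefolder` builder of `load_dataset`, write it out once with `save_to_disk`, and only then point `load_from_disk` at the result. The function names and directory arguments are hypothetical, and the sketch assumes `image_dir` is a single split folder (class subfolders directly inside), in which case the builder should place every image in the `train` split.

```python
def image_folder_to_disk(image_dir: str, output_dir: str) -> None:
    """Convert a class-per-subfolder image tree into the save_to_disk format.

    Hypothetical helper: `image_dir` is assumed to contain one subfolder per
    class, each holding image files.
    """
    # Imported lazily so the module can be imported without `datasets` installed.
    from datasets import load_dataset

    # The `imagefolder` builder infers the `label` column from subfolder names.
    dataset = load_dataset("imagefolder", data_dir=image_dir, split="train")
    dataset.save_to_disk(output_dir)


def load_saved(output_dir: str):
    """Reload a directory that was previously written by save_to_disk."""
    from datasets import load_from_disk

    return load_from_disk(output_dir)
```

With this approach the `data_dir` in the YAML would point at the converted output directory rather than at the raw image folder. Alternatively, calling `load_dataset("imagefolder", ...)` directly inside `build_datasets_from_info` would avoid the conversion step.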
Due to the large size of the ImageNet dataset, I am using the MiniImageNet dataset instead. I modified the YAML file accordingly:

    datasets:
      _target_: flava.definitions.TrainingDatasetsInfo
      selected:
        - text
      image:
        _target_: flava.definitions.TrainingSingleDatasetInfo
        train:
          - _target_: flava.definitions.HFDatasetInfo
            key: mini_val
            subset: default
            data_dir: >-
              /home/liumaofu/hyy/multimodal/examples/flava/mini/ok/val/

At the same time, I modified the `examples/flava/data/utils.py` file:

    def build_datasets_from_info(dataset_infos: List[HFDatasetInfo], split: str = "train"):
        dataset_list = []
        for dataset_info in dataset_infos:
            print(f"Loading dataset from {dataset_info.data_dir}")
            current_dataset = load_from_disk(dataset_info.data_dir)
            if dataset_info.remove_columns is not None:
                current_dataset = current_dataset.remove_columns(dataset_info.remove_columns)
            if dataset_info.rename_columns is not None:
                for rename in dataset_info.rename_columns:
                    current_dataset = current_dataset.rename_column(rename[0], rename[1])
            dataset_list.append(current_dataset)
        return concatenate_datasets(dataset_list)

However, when I run `python -m flava.train config=flava/configs/pretraining/debug.yaml`, the following error is reported:

    Directory /home/liumaofu/hyy/multimodal/examples/flava/mini/ok/train/ is neither a dataset directory nor a dataset dict directory.

The structure of my MiniImageNet dataset is as follows:

    miniImagenet
    |-- train
    |   |-- class1
    |   |   |-- image1.jpg
    |   |   |-- image2.jpg
    |   |   |-- ...
    |   |-- class2
    |   |   |-- image1.jpg
    |   |   |-- image2.jpg
    |   |   |-- ...
    |   |-- ...
    |-- val
    |   |-- class1
    |   |   |-- image1.jpg
    |   |   |-- image2.jpg
    |   |   |-- ...
    |   |-- class2
    |   |   |-- image1.jpg
    |   |   |-- image2.jpg
    |   |   |-- ...
    |   |-- ...
    |-- test
    |   |-- class1
    |   |   |-- image1.jpg
    |   |   |-- image2.jpg
    |   |   |-- ...
    |   |-- class2
    |   |   |-- image1.jpg
    |   |   |-- image2.jpg
    |   |   |-- ...
    |   |-- ...

I have verified that the storage path itself is correct. Why is this error reported, and what should I do?
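For anyone hitting the same message, a small diagnostic sketch may help show why a raw image tree is rejected. As far as I can tell, recent versions of `datasets` decide what a directory is by looking for marker files that `save_to_disk` writes: `dataset_info.json` plus `state.json` for a single dataset, or `dataset_dict.json` for a dataset dict. The helper below is hypothetical and only mirrors that check with the standard library; it does not call the library itself.

```python
import os

# Marker file names assumed from the `datasets` library's on-disk format.
DATASET_MARKERS = ("dataset_info.json", "state.json")
DATASET_DICT_MARKER = "dataset_dict.json"


def describe_directory(path: str) -> str:
    """Classify a directory roughly the way load_from_disk would see it."""
    if all(os.path.isfile(os.path.join(path, name)) for name in DATASET_MARKERS):
        return "dataset"
    if os.path.isfile(os.path.join(path, DATASET_DICT_MARKER)):
        return "dataset_dict"
    # A folder of class subfolders with .jpg files ends up here.
    return "neither"
```

A folder such as `miniImagenet/train` contains only class subfolders and images, so a check like this returns `"neither"`, which matches the reported error.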