facebookresearch / DomainBed

DomainBed is a suite to test domain generalization algorithms
MIT License
1.4k stars 298 forks

How to set up the camelyon17 dataset #151

Open yuu-Wang opened 3 months ago

yuu-Wang commented 3 months ago

Hello, if I want to use Camelyon17, how should the dataset directory be laid out? Is DomainBed/domainbed/data/camelyon17/ correct? It keeps reporting that the dataset cannot be found:

Traceback (most recent call last):
  File "/root/wangxy/AlignClip/main.py", line 155, in <module>
    main(args)
  File "/root/wangxy/AlignClip/main.py", line 44, in main
    train_iter, val_loader, test_loaders, train_class_names, template = get_dataset(args)
  File "/root/wangxy/AlignClip/engine.py", line 59, in get_dataset
    converter_domainbed.get_domainbed_datasets(dataset_name=args.data, root=args.root, targets=args.targets,
  File "/root/wangxy/AlignClip/converter_domainbed.py", line 21, in get_domainbed_datasets
    datasets = vars(dbdatasets)[dataset_name](root, targets, hparams)
  File "/root/wangxy/AlignClip/DomainBed/domainbed/datasets.py", line 347, in __init__
    dataset = Camelyon17Dataset(root_dir=root)
  File "/root/anaconda3/envs/pytorch_2.0.1/lib/python3.8/site-packages/wilds/datasets/camelyon17_dataset.py", line 64, in __init__
    self._data_dir = self.initialize_data_dir(root_dir, download)
  File "/root/anaconda3/envs/pytorch_2.0.1/lib/python3.8/site-packages/wilds/datasets/wilds_dataset.py", line 341, in initialize_data_dir
    self.download_dataset(data_dir, download)
  File "/root/anaconda3/envs/pytorch_2.0.1/lib/python3.8/site-packages/wilds/datasets/wilds_dataset.py", line 368, in download_dataset
    raise FileNotFoundError(
FileNotFoundError: The camelyon17 dataset could not be found in DomainBed/domainbed/data/camelyon17_v1.0. Initialize the dataset with download=True to download the dataset. If you are using the example script, run with --download. This might take some time for large datasets.

piotr-teterwak commented 3 months ago

Hi, unfortunately I'm an English speaker, but it looks like you're having issues using Camelyon17 because it is not downloaded?

I would run this script, with line 304 uncommented, to download the dataset: https://github.com/facebookresearch/DomainBed/blob/main/domainbed/scripts/download.py#L304

yuu-Wang commented 3 months ago

Hello, I have already downloaded the dataset through this link, and the path is /root/wangxy/AlignClip/DomainBed/domainbed/data/camelyon17_v1.0/. However, I keep getting an error that says camelyon17 cannot be found. I'm not sure if my file naming is correct, but I successfully ran other datasets like PACS and Officehome. Do I need to handle the camelyon dataset separately?

piotr-teterwak commented 3 months ago

Could you post here the command you use to run main.py, and the directory you run it from? To me it looks like args.root is set to DomainBed/domainbed/data/ instead of '/root/wangxy/AlignClip/DomainBed/domainbed/data/'.
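A quick way to see this failure mode (hypothetical paths, standard library only): a relative root like DomainBed/domainbed/data/ is resolved against the current working directory, not against the repository checkout, so launching main.py from anywhere else makes the wilds loader look in the wrong place:

```python
import os

# Hypothetical paths for illustration: the dataset actually lives under
# /root/wangxy/AlignClip/DomainBed/domainbed/data/camelyon17_v1.0, but a
# relative root is joined onto whatever directory main.py is run from.
relative_root = "DomainBed/domainbed/data"
cwd = "/root/wangxy"  # assumed launch directory

resolved = os.path.normpath(os.path.join(cwd, relative_root))
print(resolved)  # /root/wangxy/DomainBed/domainbed/data  (wrong location)
```

Passing an absolute root sidesteps the working-directory dependence entirely.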

yuu-Wang commented 3 months ago

main.zip This is the main file, and these are the parameters I need to run: DomainBed/domainbed/data/ -d WILDSCamelyon --task domain_shift --targets 0 -b 36 --lr 5e-6 --epochs 10 --beta 0.5. This is the location where I placed my dataset (screenshot attached).

piotr-teterwak commented 3 months ago

Can you run with /root/wangxy/AlignClip/DomainBed/domainbed/data/ -d WILDSCamelyon --task domain_shift --targets 0 -b 36 --lr 5e-6 --epochs 10 --beta 0.5 instead? See the data path in the first parameter.

yuu-Wang commented 3 months ago

Hello, it's running now, but there's a problem. The downloaded camelyon17_v1.0 dataset contains the raw patches organized as patient_00X_node_X, which is the layout the code expects. However, I had already divided the dataset into hospital0, hospital1, hospital2, hospital3, and hospital4, and now it's giving an error. Could you please explain why this is happening?

piotr-teterwak commented 3 months ago

Could you post the stack trace so I can have more information?

yuu-Wang commented 3 months ago

Hello, why is the test dataset empty here (screenshots attached)? I reclassified the camelyon17 dataset and regenerated the metadata.csv file. Do I need to create separate CSV files for the test, validation, and training sets?

piotr-teterwak commented 3 months ago

Hi @yuu-Wang ,

It's pretty hard to understand what exactly is going on here without more details. In order to help you, I will need a minimal reproducible example including:

  1. How you download the data.
  2. How you generate the new metadata.csv file.
  3. A few lines of code, of how you load the data.

However, taking a quick look, I don't think the images need to be split into directories based on their hospital source.
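To illustrate why no hospital0/…/hospital4 directories are needed: WILDS-style datasets record the source hospital in a metadata column (Camelyon17 uses a center field), and the loader groups patches by that column rather than by directory layout. A minimal sketch with synthetic rows (the real metadata.csv has more columns):

```python
import csv
import io

# Synthetic stand-in for a few rows of metadata.csv; the "center" column
# identifies the source hospital, so the images themselves can stay in
# the original patient_XXX_node_X layout on disk.
metadata = io.StringIO(
    "patient,node,center\n"
    "patient_004,node_4,0\n"
    "patient_009,node_1,1\n"
    "patient_010,node_4,1\n"
)
by_center = {}
for row in csv.DictReader(metadata):
    by_center.setdefault(row["center"], []).append(row)

print({center: len(rows) for center, rows in by_center.items()})
# {'0': 1, '1': 2}
```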

yuu-Wang commented 3 months ago

Hi,

  1. The dataset I downloaded is from line 304 of download.py: https://github.com/facebookresearch/DomainBed/blob/dad3ca34803aa6dc62dfebe9ccfb57452f0bb821/domainbed/scripts/download.py#L304
  2. Since I noticed that the WILDS environment in https://github.com/facebookresearch/DomainBed/blob/dad3ca34803aa6dc62dfebe9ccfb57452f0bb821/domainbed/datasets.py#L348 requires four domains (hospital0, hospital1, hospital2, hospital3), I used this code (https://github.com/jameszhou-gl/gpt-4v-distribution-shift/blob/ccfcf00851ccd8867de7c6d92591eaedd8a66d0d/data/process_wilds.py#L21) to divide the downloaded dataset into these four domains.
  3. I used https://github.com/thuml/CLIPood/blob/bc0d8745e8b0d97b0873bd8ed8589793abd1c1a7/engine.py#L53 and https://github.com/thuml/CLIPood/blob/bc0d8745e8b0d97b0873bd8ed8589793abd1c1a7/converter_domainbed.py#L18 to divide the dataset into training, validation, and test sets.

piotr-teterwak commented 3 months ago

Steps 2 and 3 are not needed; the code takes care of this internally. Could you re-download with step 1, skip steps 2 and 3, and try again? If this does not work, could you please send a few lines of code of how exactly you are loading the dataset in your training code?

yuu-Wang commented 3 months ago

Sure, I downloaded it directly according to step one and then ran main.py (https://github.com/thuml/CLIPood/blob/bc0d8745e8b0d97b0873bd8ed8589793abd1c1a7/main.py#L104). Is it only this dataset that exceeds the length? (screenshot attached)

piotr-teterwak commented 3 months ago

I see that you're running main.py from another repository, CLIPood, not the DomainBed repository. I'm not very familiar with the CLIPood code. Could you run it from an unmodified DomainBed codebase?

yuu-Wang commented 3 months ago

Thank you very much for your patience. It works now.