yuu-Wang opened 4 months ago
Hi, unfortunately I only speak English, but it looks like you're having trouble using Camelyon17 because it hasn't been downloaded?
I would run this script, with line 304 uncommented, to download the dataset: https://github.com/facebookresearch/DomainBed/blob/main/domainbed/scripts/download.py#L304
Hello, I have already downloaded the dataset through this link, and the path is /root/wangxy/AlignClip/DomainBed/domainbed/data/camelyon17_v1.0/. However, I keep getting an error that says camelyon17 cannot be found. I'm not sure if my file naming is correct, but I successfully ran other datasets like PACS and Officehome. Do I need to handle the camelyon dataset separately?
Could you post here the command you use to run main.py, and the directory you run it from? To me it looks like args.root is set to DomainBed/domainbed/data/
instead of '/root/wangxy/AlignClip/DomainBed/domainbed/data/'.
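The distinction matters because a relative root such as `DomainBed/domainbed/data/` resolves against the current working directory, so the same command only finds the dataset when launched from the right place. A minimal sketch (the paths are the illustrative ones from this thread):

```python
import os

# Illustrative paths taken from the discussion above.
relative_root = "DomainBed/domainbed/data/"
absolute_root = "/root/wangxy/AlignClip/DomainBed/domainbed/data/"

# A relative root resolves against the current working directory, so it
# only points at the real dataset when the script is launched from
# /root/wangxy/AlignClip/ in this case.
resolved = os.path.abspath(relative_root)
print(resolved)  # depends on os.getcwd()

# An absolute root is unambiguous regardless of the launch directory.
assert os.path.isabs(absolute_root)
```

Passing the absolute path as the first argument removes the dependency on where the script happens to be run from.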
main.zip is the main file, and these are the parameters I run it with: DomainBed/domainbed/data/ -d WILDSCamelyon --task domain_shift --targets 0 -b 36 --lr 5e-6 --epochs 10 --beta 0.5. The first argument is the location where I placed my dataset.
Can you run with /root/wangxy/AlignClip/DomainBed/domainbed/data/ -d WILDSCamelyon --task domain_shift --targets 0 -b 36 --lr 5e-6 --epochs 10 --beta 0.5
instead? See the data path in the first parameter.
Hello, it's running now, but there's a problem. The camelyon17_v1.0 dataset contains the raw patches, organized in directories of the form patient_00X_node_X, which is the layout the loader expects. However, I had already divided the dataset into hospital0, hospital1, hospital2, hospital3, hospital4, and now it's giving an error. Could you please explain why this is happening?
Could you post the stack trace so I can have more information?
Hello, why is the test dataset empty here? I reclassified the camelyon17 dataset and regenerated the metadata.csv file. Do I need to create separate CSV files for test, validation, and training sets?
Hi @yuu-Wang,
It's pretty hard to understand what exactly is going on here without more details. In order to help you, I will need a minimal reproducible example including:
However, taking a quick look, I don't think the images need to be split into directories based on their hospital source.
Hi,
1. The dataset I downloaded is from line 304 of download.py (https://github.com/facebookresearch/DomainBed/blob/dad3ca34803aa6dc62dfebe9ccfb57452f0bb821/domainbed/scripts/download.py#L304).
2. Since I noticed that the WILDS environment in this file (https://github.com/facebookresearch/DomainBed/blob/dad3ca34803aa6dc62dfebe9ccfb57452f0bb821/domainbed/datasets.py#L348) requires four domains (hospital0, hospital1, hospital2, hospital3), I used this code (https://github.com/jameszhou-gl/gpt-4v-distribution-shift/blob/ccfcf00851ccd8867de7c6d92591eaedd8a66d0d/data/process_wilds.py#L21) to divide the downloaded dataset into these four domains.
3. I used this code (https://github.com/thuml/CLIPood/blob/bc0d8745e8b0d97b0873bd8ed8589793abd1c1a7/engine.py#L53 and https://github.com/thuml/CLIPood/blob/bc0d8745e8b0d97b0873bd8ed8589793abd1c1a7/converter_domainbed.py#L18) to divide the dataset into training, validation, and test sets.
Steps 2 and 3 are not needed; the code takes care of this internally. Could you re-download with step 1, skip steps 2 and 3, and try again? If this does not work, could you please send a few lines of code of how exactly you are loading the dataset in your training code?
Sure, I downloaded it directly according to step one and then ran main.py (https://github.com/thuml/CLIPood/blob/bc0d8745e8b0d97b0873bd8ed8589793abd1c1a7/main.py#L104). Is it only this dataset that exceeds the length?
I see that you're running main.py from another repository, CLIPood, and not the DomainBed repository. I'm not very familiar with the CLIPood code. Could you run from an unmodified DomainBed codebase?
Thank you very much for your patience. It works now.
Hello, if I want to use Camelyon17, how should the dataset directory be structured? Is DomainBed/domainbed/data/camelyon17/ correct? It keeps saying the dataset can't be found.

```
Traceback (most recent call last):
  File "/root/wangxy/AlignClip/main.py", line 155, in <module>
    main(args)
  File "/root/wangxy/AlignClip/main.py", line 44, in main
    train_iter, val_loader, test_loaders, train_class_names, template = get_dataset(args)
  File "/root/wangxy/AlignClip/engine.py", line 59, in get_dataset
    converter_domainbed.get_domainbed_datasets(dataset_name=args.data, root=args.root, targets=args.targets,
  File "/root/wangxy/AlignClip/converter_domainbed.py", line 21, in get_domainbed_datasets
    datasets = vars(dbdatasets)[dataset_name](root, targets, hparams)
  File "/root/wangxy/AlignClip/DomainBed/domainbed/datasets.py", line 347, in __init__
    dataset = Camelyon17Dataset(root_dir=root)
  File "/root/anaconda3/envs/pytorch_2.0.1/lib/python3.8/site-packages/wilds/datasets/camelyon17_dataset.py", line 64, in __init__
    self._data_dir = self.initialize_data_dir(root_dir, download)
  File "/root/anaconda3/envs/pytorch_2.0.1/lib/python3.8/site-packages/wilds/datasets/wilds_dataset.py", line 341, in initialize_data_dir
    self.download_dataset(data_dir, download)
  File "/root/anaconda3/envs/pytorch_2.0.1/lib/python3.8/site-packages/wilds/datasets/wilds_dataset.py", line 368, in download_dataset
    raise FileNotFoundError(
FileNotFoundError: The camelyon17 dataset could not be found in DomainBed/domainbed/data/camelyon17_v1.0. Initialize the dataset with download=True to download the dataset. If you are using the example script, run with --download. This might take some time for large datasets.
```
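The traceback again points at a path problem: WILDS looks for a camelyon17_v1.0 directory under the root it is given, and a relative root resolved from the wrong working directory produces exactly this FileNotFoundError. A small pre-flight check one could run before training (the expected file names `metadata.csv` and `patches/` are assumptions based on the v1.0 layout discussed in this thread):

```python
import os

def check_camelyon17_root(root: str) -> list:
    """Return a list of problems found with a Camelyon17 root directory."""
    problems = []
    data_dir = os.path.join(root, "camelyon17_v1.0")
    if not os.path.isdir(data_dir):
        # This is the case the FileNotFoundError above corresponds to.
        problems.append(f"missing directory: {data_dir}")
        return problems
    # Assumed v1.0 contents: a metadata.csv file plus a patches/ tree.
    for name in ("metadata.csv", "patches"):
        if not os.path.exists(os.path.join(data_dir, name)):
            problems.append(f"missing {name} under {data_dir}")
    return problems
```

Running this on the root actually passed to the script (e.g. `check_camelyon17_root("/root/wangxy/AlignClip/DomainBed/domainbed/data")`) would show immediately whether the loader can see the dataset.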