IshiKura-a opened this issue 1 year ago
I tried to change the test dataset with some naive changes to the code. In the function `prepare_data_Anchor()` in `prepare_data.py`, I simply append the following before `return dataloaders`:
```python
import torch
from torch.utils.data import DataLoader, Subset

if test_domain == target:
    print('!!!!!! Change dataset of target domain and test')
    # Randomly split the target-domain dataset: 10% for test, 90% for training.
    dataset = dataloaders['clustering_' + target].dataset
    n = len(dataset)
    idx_perm = torch.randperm(n)
    test_idx = idx_perm[:int(0.1 * n)]
    train_idx = idx_perm[int(0.1 * n):]

    train_data = Subset(dataset, train_idx)
    setattr(train_data.dataset, 'l', len(train_idx))
    test_data = Subset(dataloaders['test'].dataset, test_idx)
    setattr(test_data.dataset, 'l', len(test_idx))

    # Rebuild the clustering dataloader on the 90% training split.
    dataloaders['clustering_' + target].dataset = train_data
    loader = dataloaders['clustering_' + target].dataloader
    dataloaders['clustering_' + target].dataloader = DataLoader(
        train_data, batch_sampler=loader.batch_sampler,
        num_workers=loader.num_workers)

    # Rebuild the test dataloader on the held-out 10% split.
    dataloaders['test'].dataset = test_data
    loader = dataloaders['test'].dataloader
    dataloaders['test'].dataloader = DataLoader(
        test_data, batch_sampler=loader.batch_sampler,
        num_workers=loader.num_workers)
```
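For reference, PyTorch's built-in `torch.utils.data.random_split` produces the same kind of 90/10 split without any `__len__` patching, since the returned `Subset` objects report their own length. This is just a minimal, self-contained sketch with a toy `TensorDataset` standing in for the repo's dataset class:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Toy stand-in for the target-domain dataset (hypothetical, not the repo's class).
dataset = TensorDataset(torch.randn(100, 3), torch.randint(0, 10, (100,)))

n_test = int(0.1 * len(dataset))
train_data, test_data = random_split(dataset, [len(dataset) - n_test, n_test])

# Each Subset reports its own length, so no `l` attribute is needed.
assert len(train_data) == 90 and len(test_data) == 10

train_loader = DataLoader(train_data, batch_size=32, shuffle=True)
test_loader = DataLoader(test_data, batch_size=32)
```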
And in `single_dataset.py`, I rewrote `__len__()` of `BaseDataset` and `BaseDatasetWithoutLabel` (otherwise there would be an `IndexError`):
```python
def __len__(self):
    try:
        # Use the length override set by the split code above, if present.
        return getattr(self, 'l')
    except AttributeError:
        return len(self.data_paths)
```
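As a minor simplification, `getattr` accepts a default value, so the same fallback can be written without the try/except; the behavior is identical:

```python
def __len__(self):
    # Return the override `l` if it was set, else the true dataset length.
    return getattr(self, 'l', len(self.data_paths))
```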
On the Office-31 dataset, I set the loop count to 50 rather than 1000 and got the following results:

[my results screenshot]

And your results would be:

[original results screenshot]

There's a 3% performance degradation.
In `prepare_data.py`, we have a train dataloader and a test dataloader. In the clustering step, this work uses `dataloaders['clustering_' + target]` directly. Therefore, if `cfg.TEST.DOMAIN` is unset, these two dataloaders will have the same dataset, which means the test is applied to the training data. Is this a bug, or have I made a mistake?
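For what it's worth, one quick way to check this would be an object-identity probe run right after the loaders are built; this is a hypothetical snippet reusing the `dataloaders` and `target` names from above, not code from the repo:

```python
# Do the clustering and test loaders wrap the same dataset object?
clustering_ds = dataloaders['clustering_' + target].dataset
test_ds = dataloaders['test'].dataset

# True here when cfg.TEST.DOMAIN is unset would confirm that the model
# is evaluated on exactly the data it clusters on, i.e. test-on-train.
print(clustering_ds is test_ds)
```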