jeonga0303 opened this issue 5 months ago
How to convert Nc1 (the pretrained head's number of classes)?
I tried changing the configuration files in the following order. Training is in progress, but the dataset is large, so I will share the results later.
Download the pretrained `.pth` file (`internimage_b_1k_224.pth`).
`config.py`:

```python
_C.DATA.IMG_SIZE = 224
_C.MODEL.PRETRAINED = 'internimage_b_1k_224.pth'
_C.MODEL.NUM_CLASSES = 4
```
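As a side note, the `_C` keys above come from a yacs-style config (as used in Swin-style repos). The sketch below only illustrates how the same overrides live in a `CfgNode` and how a `--cfg` YAML or command-line options get merged over the defaults; the node layout here is an assumption for illustration, not copied from the repository.

```python
# Illustrative only: a yacs CfgNode with the overrides above (layout assumed).
from yacs.config import CfgNode as CN

_C = CN()
_C.DATA = CN()
_C.DATA.IMG_SIZE = 224
_C.MODEL = CN()
_C.MODEL.PRETRAINED = 'internimage_b_1k_224.pth'
_C.MODEL.NUM_CLASSES = 4  # custom dataset with 4 classes

# A --cfg YAML / command-line options are merged over these defaults, e.g.:
cfg = _C.clone()
cfg.merge_from_list(['MODEL.NUM_CLASSES', 4, 'DATA.IMG_SIZE', 224])
cfg.freeze()
print(cfg.MODEL.NUM_CLASSES)  # -> 4
```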
Handling the head-size mismatch when loading the pretrained weights:

```python
if 'head.bias' in state_dict:
    head_bias_pretrained = state_dict['head.bias']
    Nc1 = head_bias_pretrained.shape[0]   # number of classes in the checkpoint
    Nc2 = model.head.bias.shape[0]        # number of classes in the current model
    logger.info(f'{Nc1}, {Nc2}')
    if Nc1 != Nc2:
        # class counts differ: zero-init the model's head and drop the pretrained head weights
        model.head.weight = torch.nn.Parameter(torch.zeros_like(model.head.weight))
        model.head.bias = torch.nn.Parameter(torch.zeros_like(model.head.bias))
        state_dict.pop('head.weight', None)
        state_dict.pop('head.bias', None)
```
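One caveat about the snippet above (an assumption about ordering, not something confirmed from logs): if this code runs after the optimizer and the DDP wrapper have already been built, assigning brand-new `torch.nn.Parameter` objects to `model.head` leaves the optimizer pointing at the old tensors, so the new head is never updated. A minimal sketch of an in-place alternative follows; `state_dict`, `model`, and `logger` are assumed to come from the surrounding pretrained-weight loading function.

```python
import torch

# Sketch only: re-initialize the existing head Parameters in place instead of
# replacing them (assumes state_dict, model, logger from the surrounding code).
if 'head.bias' in state_dict:
    Nc1 = state_dict['head.bias'].shape[0]   # classes in the checkpoint
    Nc2 = model.head.bias.shape[0]           # classes in the current model (e.g. 4)
    logger.info(f'{Nc1}, {Nc2}')
    if Nc1 != Nc2:
        # keep the same Parameter objects; only overwrite their values
        torch.nn.init.constant_(model.head.weight, 0.0)
        torch.nn.init.constant_(model.head.bias, 0.0)
        # drop the mismatched head so load_state_dict(..., strict=False) skips it
        state_dict.pop('head.weight', None)
        state_dict.pop('head.bias', None)
```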
The sampler's `__iter__` (customized):

```python
def __iter__(self):
    # deterministically shuffle based on epoch
    g = torch.Generator()
    g.manual_seed(self.epoch)
    t = torch.Generator()
    t.manual_seed(0)

    # base permutation from a fixed seed, then keep only the indices assigned to this part
    indices = torch.randperm(len(self.dataset), generator=t).tolist()
    indices = [i for i in indices if i % self.num_parts == self.rank]

    # add extra samples to make it evenly divisible
    while len(indices) < self.total_size_parts:
        indices += indices[:(self.total_size_parts - len(indices))]
    indices = indices[:self.total_size_parts]
    assert len(indices) == self.total_size_parts, \
        f'Length of indices ({len(indices)}) does not match total_size_parts ({self.total_size_parts})'

    # subsample for this rank, then shuffle with the epoch-seeded generator
    indices = indices[self.rank // self.num_parts:self.total_size_parts:self.num_replicas // self.num_parts]
    index = torch.randperm(len(indices), generator=g).tolist()
    indices = list(np.array(indices)[index])
    assert len(indices) == self.num_samples, \
        f'Length of indices ({len(indices)}) does not match num_samples ({self.num_samples})'

    return iter(indices)
```
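One property of the epoch-seeded generator `g` above: unless the training loop calls `set_epoch()` before each epoch, `self.epoch` never changes and every epoch iterates the data in the same order. The toy below illustrates that contract using PyTorch's built-in `DistributedSampler` as a stand-in (not the repository's sampler).

```python
import torch
from torch.utils.data import TensorDataset, DistributedSampler

# Toy illustration: the shuffle only changes when set_epoch() is called.
dataset = TensorDataset(torch.arange(16))
sampler = DistributedSampler(dataset, num_replicas=2, rank=0, shuffle=True)

for epoch in range(3):
    sampler.set_epoch(epoch)           # updates the epoch used to seed the shuffle
    print(epoch, list(iter(sampler)))  # a different index order each epoch
```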
```bash
python -m torch.distributed.launch --nproc_per_node 2 --master_port 12345 main.py --cfg configs/without_lr_decay/internimage_b_1k_224_custom.yaml --data-path [data-path] --pretrained internimage_b_1k_224.pth --batch-size 120
```
My GPUs are 2× A100.
```bash
python -m torch.distributed.launch --nproc_per_node 1 --master_port 12345 main.py --cfg configs/without_lr_decay/internimage_b_1k_224_custom.yaml --batch-size 256 --accumulation-steps 4 --pretrained internimage_b_1k_224.pth --data-path [data-path] --local-rank 1 --output work_dirs
```
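For reference, the two launches imply very different effective batch sizes, assuming `--batch-size` is per GPU and `--accumulation-steps` multiplies it (the usual convention in Swin-style training scripts; worth checking, since the learning rate is typically scaled from the total batch size).

```python
# Rough arithmetic sketch; the flag semantics are the assumptions stated above.
def effective_batch(nproc_per_node, batch_size, accumulation_steps=1):
    return nproc_per_node * batch_size * accumulation_steps

print(effective_batch(2, 120))      # first launch:  2 * 120     = 240
print(effective_batch(1, 256, 4))   # second launch: 1 * 256 * 4 = 1024
```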
2024.06.11: the training script runs successfully (image classification fine-tuning).
[bug]
Training does not seem to make progress: the loss stays the same at every step. May I know the reason?
I customized `config.py` as shown above. How should I fine-tune the classification model?
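One thing worth ruling out (a hypothesis, not a confirmed diagnosis): if the head-replacement snippet above runs after the optimizer has been built, the zero-initialized head is never part of the optimizer and the loss stays flat at roughly ln(num_classes). The toy below reproduces that pattern with a plain `nn.Linear` standing in for `model.head`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(8, 4)                                   # stand-in for model.head
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)   # built *before* the swap

# Mimic the head replacement from the snippet earlier in this issue.
model.weight = nn.Parameter(torch.zeros_like(model.weight))
model.bias = nn.Parameter(torch.zeros_like(model.bias))

x, y = torch.randn(64, 8), torch.randint(0, 4, (64,))
for step in range(5):
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()                                      # updates only the old tensors
    print(f'step {step}: loss = {loss.item():.4f}')       # stays at ~1.3863 == ln(4)
```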