MingkunLishigure commented 11 months ago

Thanks for your response! However, when I train with the default parameters you suggested on 4 × 3090 GPUs, the network performs poorly on the University-1652 dataset after one epoch (epochs=1).

The problem we noticed is that the loss does not decrease at any point during training. The only difference from the default training code is that we downloaded the ConvNeXt-T model fine-tuned on the ImageNet-1k dataset from https://github.com/facebookresearch/ConvNeXt and load it locally via:

    model_state_dict = torch.load('./pretrained/university/{}.pth'.format(config.model))
    model.load_state_dict(model_state_dict, strict=False)

Originally posted by @MingkunLishigure in https://github.com/Skyy93/Sample4Geo/issues/1#issuecomment-1761810227

Skyy93 commented 11 months ago

This is an interesting behaviour, I tested this configuration: convnext_tiny.in12k_ft_in1k_384 as model_name and loaded the: "convnext_tiny_1k_224_ema.pth" as a checkpoint.

With this result: Recall@1: 90.3579 - Recall@5: 97.2131 - Recall@10: 98.0214 - Recall@top1: 98.1508 - AP: 91.9182

Can you send me the dataclass configuration (class TrainingConfiguration) you used?

Mine was:

@dataclass
class TrainingConfiguration:

    # Model
    model: str = 'convnext_tiny.in12k_ft_in1k_384'

    # Override model image size
    img_size: int = 384

    # Training
    mixed_precision: bool = True
    custom_sampling: bool = True         # use custom sampling instead of random
    seed = 1
    epochs: int = 1
    batch_size: int = 128                # keep in mind real_batch_size = 2 * batch_size
    verbose: bool = True
    gpu_ids: tuple = (0,1,2,3,4,5,6,7)   # GPU ids for training

    # Eval
    batch_size_eval: int = 128
    eval_every_n_epoch: int = 1          # eval every n Epoch
    normalize_features: bool = True
    eval_gallery_n: int = -1             # -1 for all or int

    # Optimizer
    clip_grad = 100.                     # None | float
    decay_exclue_bias: bool = False
    grad_checkpointing: bool = False     # Gradient Checkpointing

    # Loss
    label_smoothing: float = 0.1

    # Learning Rate
    lr: float = 0.001                    # 1 * 10^-4 for ViT | 1 * 10^-1 for CNN
    scheduler: str = "cosine"            # "polynomial" | "cosine" | "constant" | None
    warmup_epochs: int = 0.1
    lr_end: float = 0.0001               # only for "polynomial"
    gradient_accumulation: int = 1

    # Dataset
    dataset: str = 'U1652-D2S'           # 'U1652-D2S' | 'U1652-S2D'
    data_folder: str = "./data/U1652"
    single_sample: bool = False

    # Augment Images
    prob_flip: float = 0.5               # flipping the sat image and drone image simultaneously

    # Savepath for model checkpoints
    model_path: str = "./university_e40_eval4_384_aug_final"

    # Eval before training
    zero_shot: bool = False

    # Checkpoint to start from
    checkpoint_start = "convnext_tiny_1k_224_ema.pth"

    # set num_workers to 0 if on Windows
    num_workers: int = 0 if os.name == 'nt' else 4

    # train on GPU if available
    device: str = 'cuda' if torch.cuda.is_available() else 'cpu'

    # for better performance
    cudnn_benchmark: bool = True

    # make cudnn deterministic
    cudnn_deterministic: bool = False

MingkunLishigure commented 11 months ago

OK, this is the dataclass configuration we used in training stage:

class Configuration:

    # Model
    # model: str = 'convnext_base.fb_in22k_ft_in1k_384'
    model: str = 'convnext_tiny.fb_in22k_ft_in1k_384'

    # Override model image size
    img_size: int = 384

    # Training 
    mixed_precision: bool = True
    custom_sampling: bool = True         # use custom sampling instead of random
    seed = 1
    epochs: int = 1
    batch_size: int = 128                # keep in mind real_batch_size = 2 * batch_size
    verbose: bool = True
    gpu_ids: tuple = (0,1,2,3)           # GPU ids for training

    # Eval
    batch_size_eval: int = 128
    eval_every_n_epoch: int = 1          # eval every n Epoch
    normalize_features: bool = True
    eval_gallery_n: int = -1             # -1 for all or int

    # Optimizer 
    clip_grad = 100.                     # None | float
    decay_exclue_bias: bool = False
    grad_checkpointing: bool = False     # Gradient Checkpointing

    # Loss
    label_smoothing: float = 0.1

    # Learning Rate
    lr: float = 0.001                    # 1 * 10^-4 for ViT | 1 * 10^-1 for CNN
    scheduler: str = "cosine"           # "polynomial" | "cosine" | "constant" | None
    warmup_epochs: int = 0.1
    lr_end: float = 0.0001               #  only for "polynomial"

    # Dataset
    dataset: str = 'U1652-D2S'           # 'U1652-D2S' | 'U1652-S2D'
    data_folder: str = "/root/datasets/University1652"

    # Augment Images
    prob_flip: float = 0.5              # flipping the sat image and drone image simultaneously

    # Savepath for model checkpoints
    model_path: str = "./university"

    # Eval before training
    zero_shot: bool = False

    # Checkpoint to start from
    checkpoint_start = None

    # set num_workers to 0 if on Windows
    num_workers: int = 0 if os.name == 'nt' else 4 

    # train on GPU if available
    device: str = 'cuda' if torch.cuda.is_available() else 'cpu' 

    # for better performance
    cudnn_benchmark: bool = True

    # make cudnn deterministic
    cudnn_deterministic: bool = False
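
When two configurations this long are compared by eye, small differences are easy to miss. Below is a minimal sketch of a field-by-field diff. `MiniConfig` and `diff_configs` are hypothetical names introduced only for illustration, and the class is a trimmed-down stand-in holding a handful of the values quoted in this thread, not the real configuration class.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class MiniConfig:
    # Hypothetical, reduced stand-in for the configuration classes quoted above.
    model: str
    gpu_ids: tuple
    data_folder: str
    checkpoint_start: Optional[str]
    lr: float = 0.001
    batch_size: int = 128


def diff_configs(a, b):
    """Return {field_name: (value_in_a, value_in_b)} for every field that differs."""
    return {
        f.name: (getattr(a, f.name), getattr(b, f.name))
        for f in fields(a)
        if getattr(a, f.name) != getattr(b, f.name)
    }


# Values copied from the two configurations posted in this thread.
maintainer = MiniConfig(
    model="convnext_tiny.in12k_ft_in1k_384",
    gpu_ids=(0, 1, 2, 3, 4, 5, 6, 7),
    data_folder="./data/U1652",
    checkpoint_start="convnext_tiny_1k_224_ema.pth",
)
reporter = MiniConfig(
    model="convnext_tiny.fb_in22k_ft_in1k_384",
    gpu_ids=(0, 1, 2, 3),
    data_folder="/root/datasets/University1652",
    checkpoint_start=None,
)

for name, (left, right) in diff_configs(maintainer, reporter).items():
    print(f"{name}: {left!r} -> {right!r}")
```

Only annotated attributes are dataclass fields, so unannotated entries such as `seed = 1` would need a type annotation before a diff like this could see them.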

Skyy93 commented 11 months ago

This looks good. What does the output of the training script look like?

MingkunLishigure commented 11 months ago

The output of training script:

Model: convnext_tiny.fb_in22k_ft_in1k_384
{'input_size': (3, 384, 384), 'interpolation': 'bicubic', 'mean': (0.485, 0.456, 0.406), 'std': (0.229, 0.224, 0.225), 'crop_pct': 1.0, 'crop_mode': 'squash'}
GPUs available: 4

Image Size Query: (384, 384)
Image Size Ground: (384, 384)
Mean: (0.485, 0.456, 0.406)
Std:  (0.229, 0.224, 0.225)

Query Images Test: 37855
Gallery Images Test: 951

Scheduler: cosine - max LR: 0.001
Warmup Epochs: 0.1 - Warmup Steps: 29.6
Train Epochs:  1 - Train Steps:  296

Shuffle Dataset:
Original Length: 37854 - Length after Shuffle: 37760
Break Counter: 512
Pairs left out of last batch to avoid creating noise: 94
First Element ID: 1094 - Last Element ID: 1508

------------------------------[Epoch: 1]------------------------------
Epoch: 1, Train Loss = 4.487, Lr = 0.000000

------------------------------[Evaluate]------------------------------
Extract Features:
Compute Scores:
Recall@1: 1.7726 - Recall@5: 5.7192 - Recall@10: 9.4439 - Recall@top1: 10.0991 - AP: 3.3876

Shuffle Dataset:
Original Length: 37854 - Length after Shuffle: 37760
Break Counter: 512
Pairs left out of last batch to avoid creating noise: 94
First Element ID: 0945 - Last Element ID: 1646
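
For scale, the reported training loss can be compared with a chance-level reference. The sketch below assumes the loss is a cross-entropy over the in-batch candidates (an InfoNCE-style objective) and that all 128 pairs of a batch act as candidates. Both are assumptions, not something the log states. Under them, a model that scores every candidate equally has a loss of ln(128), and label smoothing does not change that value for uniform predictions.

```python
import math

batch_size = 128        # batch_size from the configuration
reported_loss = 4.487   # "Train Loss" from the log above

# Cross-entropy of a uniform prediction over `batch_size` candidates.
chance_loss = math.log(batch_size)

print(f"chance-level loss: {chance_loss:.3f}")   # 4.852
print(f"reported loss:     {reported_loss:.3f}")
print(f"gap below chance:  {chance_loss - reported_loss:.3f}")
```

A loss this close to the chance level is consistent with a network that has learned very little after one epoch, which matches the Recall@1 of 1.77.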

Skyy93 commented 11 months ago

I tried it on another machine: I cloned the repo and downloaded the U1652 data. Unfortunately, I cannot reproduce the issue.

Model: convnext_tiny.fb_in22k_ft_in1k_384
{'input_size': (3, 384, 384), 'interpolation': 'bicubic', 'mean': (0.485, 0.456, 0.406), 'std': (0.229, 0.224, 0.225), 'crop_pct': 1.0, 'crop_mode': 'squash'}
GPUs available: 1

Image Size Query: (384, 384)
Image Size Ground: (384, 384)
Mean: (0.485, 0.456, 0.406)
Std:  (0.229, 0.224, 0.225)

Query Images Test: 37855
Gallery Images Test: 951

Scheduler: cosine - max LR: 0.001
Warmup Epochs: 0.1 - Warmup Steps: 59.2
Train Epochs:  1 - Train Steps:  592

Shuffle Dataset:
40185it [00:00, 397875.71it/s]
Original Length: 37854 - Length after Shuffle: 37824
Break Counter: 512
Pairs left out of last batch to avoid creating noise: 30
First Element ID: 1094 - Last Element ID: 1144

------------------------------[Epoch: 1]------------------------------
100%|██████████| 591/591 [05:12<00:00,  1.89it/s, loss=0.7923, loss_avg=0.9036, lr=0.000000]
Epoch: 1, Train Loss = 0.904, Lr = 0.000000

------------------------------[Evaluate]------------------------------
Extract Features:
100%|██████████| 296/296 [01:15<00:00,  3.92it/s]
100%|██████████| 8/8 [00:02<00:00,  3.02it/s]
Compute Scores:
100%|██████████| 37855/37855 [00:13<00:00, 2869.80it/s]
Recall@1: 92.4950 - Recall@5: 98.0584 - Recall@10: 98.7822 - Recall@top1: 98.8799 - AP: 93.7657

Shuffle Dataset:
40271it [00:00, 390630.03it/s]
Original Length: 37854 - Length after Shuffle: 37824
Break Counter: 512
Pairs left out of last batch to avoid creating noise: 30
First Element ID: 0945 - Last Element ID: 1119

My output seems fine. Did you change anything else in the code?
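
The step counts in the two logs can be checked by hand. The sketch below assumes that steps per epoch are the pair count divided by the batch size, rounded up, and that warmup steps are that count times `warmup_epochs`. Both formulas are assumptions that happen to reproduce the logged numbers. They are not read from the training script, and `steps_per_epoch` is a hypothetical helper.

```python
import math

ORIGINAL_PAIRS = 37854   # "Original Length" in both logs
WARMUP_EPOCHS = 0.1      # warmup_epochs from the configuration


def steps_per_epoch(num_pairs, batch_size):
    """Number of optimizer steps if the last partial batch still counts."""
    return math.ceil(num_pairs / batch_size)


for batch_size in (128, 64):
    steps = steps_per_epoch(ORIGINAL_PAIRS, batch_size)
    warmup = round(steps * WARMUP_EPOCHS, 1)
    print(f"batch_size={batch_size}: train steps={steps}, warmup steps={warmup}")

# "Length after Shuffle" should be a whole number of batches.
print(37760 % 128 == 0)   # 4-GPU log
print(37824 % 64 == 0)    # single-GPU log
print(37824 % 128 == 0)   # single-GPU log does not fit batches of 128
```

Under these assumptions, 296 steps match a batch size of 128, while 592 steps and a post-shuffle length of 37824 match a batch size of 64. This suggests that the single-GPU run used a smaller batch than the 4-GPU run, although the thread does not state it.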

MingkunLishigure commented 11 months ago

In the training script, we only changed this part:


config = Configuration() 

if config.dataset == 'U1652-D2S':
    config.query_folder_train = '/root/data1/datasets/University1652/train/satellite'
    config.gallery_folder_train = '/root/data1/datasets/University1652/train/drone'   
    config.query_folder_test = '/root/data1/datasets/University1652/test/query_drone' 
    config.gallery_folder_test = '/root/data1/datasets/University1652/test/gallery_satellite'    
elif config.dataset == 'U1652-S2D':
    config.query_folder_train = '/root/data1/datasets/University1652/train/satellite'
    config.gallery_folder_train = '/root/data1/datasets/University1652/train/drone'    
    config.query_folder_test = '/root/data1/datasets/University1652/test/query_satellite'
    config.gallery_folder_test = '/root/data1/datasets/University1652/test/gallery_drone'

    model = TimmModel(config.model,
                          pretrained=False,
                          img_size=config.img_size)
    model_state_dict = torch.load('{}.pth'.format(config.model))
    model.load_state_dict(model_state_dict, strict=False)

MingkunLishigure commented 11 months ago

And is loading the checkpoint required? We set checkpoint_start to None during training, whereas your configuration has:

# Checkpoint to start from
checkpoint_start = "convnext_tiny_1k_224_ema.pth"

Skyy93 commented 11 months ago

Thank you, I think I found the culprit. When you set pretrained=False, the weights are not downloaded from pytorch-image-models. The issue is that the pytorch-image-models implementation and the original ConvNeXt implementation differ slightly: the 1x1 convolutions can be implemented either with a linear layer or with a Conv2d from PyTorch, which causes problems when loading the weights.

So the solution here would be:

Use the pytorch-image-models weights by setting pretrained=True.
Alternative 1: Convert the 1x1 Conv2d blocks in the Facebook weights to the corresponding parts in pytorch-image-models.
Alternative 2: Instead of creating the model in model.py with timm.create_model(), create it with the code provided in the original repo; loading the original ConvNeXt weights should then work fine.
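
Whatever the exact cause of the mismatch, `strict=False` makes it easy to miss: PyTorch then skips keys that do not match by name instead of raising an error, so the model can stay randomly initialized without any warning. A minimal sketch of the check in plain Python follows. `compare_state_dict_keys` is a hypothetical helper, and the key names are purely illustrative, not taken from either implementation.

```python
def compare_state_dict_keys(model_keys, checkpoint_keys):
    """Split parameter names into matched, missing and unexpected groups."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    return {
        "matched": sorted(model_keys & checkpoint_keys),
        "missing": sorted(model_keys - checkpoint_keys),      # model has it, checkpoint does not
        "unexpected": sorted(checkpoint_keys - model_keys),   # checkpoint has it, model does not
    }


# Illustrative names only; the real names depend on the two implementations.
model_keys = ["model.stem.0.weight", "model.head.fc.weight"]
checkpoint_keys = ["downsample_layers.0.0.weight", "head.weight"]

report = compare_state_dict_keys(model_keys, checkpoint_keys)
for group, names in report.items():
    print(f"{group}: {names}")

if not report["matched"]:
    print("No parameter names match: nothing would be loaded with strict=False.")
```

With real objects, the inputs would be `model.state_dict().keys()` and the keys of the loaded checkpoint. The value returned by `model.load_state_dict(..., strict=False)` also exposes `missing_keys` and `unexpected_keys`, so printing it gives the same information.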

Skyy93 / Sample4Geo

Model performance with ConvNext-tiny #2
