JDAI-CV / fast-reid

SOTA Re-identification Methods and Toolbox
Apache License 2.0
3.39k stars 830 forks

Console freezes/halts right before training begins. #719

Closed Cippppy closed 8 months ago

Cippppy commented 8 months ago

I am currently trying to train FastReID on a custom dataset. I am able to run the training script and it starts fine. However, after the model architecture is printed to the console it stops there: no error message, the process just keeps running without ever advancing.

Instructions To Reproduce the Issue:

  1. Inside the path fast-reid/fastreid/data/datasets I created a file named "fastreid_prototype_1.py". Its contents are below.
fastreid_prototype_1.py:

```python
import glob
import os
import os.path as osp
import re
import warnings

from .bases import ImageDataset
from ..datasets import DATASET_REGISTRY


@DATASET_REGISTRY.register()
class FastREID_Prototype_1(ImageDataset):
    dataset_dir = ''
    dataset_name = "FastREID_Prototype_1"

    def __init__(self, root='datasets', **kwargs):
        self.root = root
        self.dataset_dir = osp.join(self.root, self.dataset_dir)

        # allow alternative directory structure
        self.data_dir = self.dataset_dir
        data_dir = osp.join(self.data_dir, 'FastREID_Prototype_1')
        if osp.isdir(data_dir):
            self.data_dir = data_dir
        else:
            warnings.warn('The current data structure is deprecated. Please '
                          'put data folders such as "train" under '
                          '"FastREID_Prototype_1".')

        self.train_dir = osp.join(self.data_dir, 'train')
        self.query_dir = osp.join(self.data_dir, 'test')
        self.gallery_dir = osp.join(self.data_dir, 'test')
        self.extra_gallery_dir = osp.join(self.data_dir, 'train')
        self.extra_gallery = False

        self.convert_labels = {
            'Brahmos_Missile': 1,
            'brahmos_missile': 1,
            'BrahmosII': 2,
            'brahmosII': 2,
            'Brahmosii': 2,
            'brahmosii': 2
        }

        required_files = [
            self.data_dir,
            self.train_dir,
            self.query_dir,
            self.gallery_dir,
        ]
        self.check_before_run(required_files)
        if self.extra_gallery:
            required_files.append(self.extra_gallery_dir)
        self.check_before_run(required_files)

        train = lambda: self.process_dir(self.train_dir)
        query = lambda: self.process_dir(self.query_dir, is_train=False)
        gallery = lambda: self.process_dir(self.gallery_dir, is_train=False) + \
                          (self.process_dir(self.extra_gallery_dir, is_train=False) if self.extra_gallery else [])
        super(FastREID_Prototype_1, self).__init__(train, query, gallery, **kwargs)

    def process_dir(self, dir_path, is_train=True):
        data = []
        absolute_path = os.path.join(dir_path)
        sub_1_dirs = os.listdir(absolute_path)
        for sub_1_dir in sub_1_dirs:
            sub_1_path = os.path.join(absolute_path, sub_1_dir)
            if sub_1_dir == '.DS_Store':
                continue
            filenames = os.listdir(sub_1_path)
            for filename in filenames:
                if filename == '.DS_Store':
                    continue
                filepath = os.path.join(sub_1_path, filename)
                data.append((filepath, self.convert_labels[sub_1_dir], 1))
        return data
```
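
For reference, the dataset class can also be exercised on its own before any training is launched. This is a minimal sketch, assuming it is run from the fast-reid repository root with the dataset folders (described further down) in place:

```python
# Hypothetical sanity check: instantiate the custom dataset directly and look
# at what process_dir() returns (a list of (filepath, pid, camid) tuples),
# without going through the trainer at all.
from fastreid.data.datasets.fastreid_prototype_1 import FastREID_Prototype_1

dataset = FastREID_Prototype_1(root='datasets')
train_items = dataset.process_dir(dataset.train_dir)
print(f"train images: {len(train_items)}")
print("first item:", train_items[0] if train_items else None)
```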

Then, inside tools/train_net.py I added the line `from fastreid.data.datasets.fastreid_prototype_1 import FastREID_Prototype_1`. Then, inside configs I made a new folder named "FastREID_Prototype_1" and put copies of the config files there, with the relevant dataset strings changed to "FastREID_Prototype_1".
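
That import is what makes the registration take effect: importing the module runs the @DATASET_REGISTRY.register() decorator, so the name used in the config becomes resolvable. A minimal way to confirm the dataset is visible to the registry (a hypothetical check, assuming it is run from the fast-reid repository root):

```python
# Hypothetical check: importing the module registers the class, after which
# the registry can resolve the dataset name referenced by the config.
from fastreid.data.datasets import DATASET_REGISTRY
import fastreid.data.datasets.fastreid_prototype_1  # noqa: F401

dataset_cls = DATASET_REGISTRY.get("FastREID_Prototype_1")  # raises KeyError if not registered
print(dataset_cls)
```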

bagtricks_R50.yml:

```yaml
_BASE_: ../Base-bagtricks.yml

DATASETS:
  NAMES: ("FastREID_Prototype_1",)
  TESTS: ("FastREID_Prototype_1",)

OUTPUT_DIR: logs/FastREID_Prototype_1/bagtricks_R50
```

Lastly, I put my dataset into "datasets". It has the following structure:

```
datasets/FastREID_Prototype_1
├── train
│   ├── Class1
│   ├── Class2
└── test
    ├── Class1
    ├── Class2
```

Each class folder contains some images.
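
One detail worth spelling out: process_dir() uses each class folder name as a key into convert_labels, so the actual folder names under train/ and test/ have to match those keys exactly (Class1/Class2 above are placeholders). A quick way to spot unmapped folders (a hypothetical snippet, assuming the layout above):

```python
import os

# Hypothetical check: any class folder whose name is not a convert_labels key
# would make process_dir() raise a KeyError instead of producing samples.
convert_labels = {'Brahmos_Missile': 1, 'brahmos_missile': 1,
                  'BrahmosII': 2, 'brahmosII': 2, 'Brahmosii': 2, 'brahmosii': 2}
root = os.path.join('datasets', 'FastREID_Prototype_1')
for split in ('train', 'test'):
    for name in os.listdir(os.path.join(root, split)):
        if name != '.DS_Store' and name not in convert_labels:
            print(f"unmapped class folder: {split}/{name}")
```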

  2. I run the command:

    python3 tools/train_net.py --config-file ./configs/FastREID_Prototype_1/bagtricks_R50.yml MODEL.DEVICE "cuda:0"
  3. The full console output I observed: log.txt

    Full Console Logs Command Line Args: Namespace(config_file='./configs/FastREID_Prototype_1/bagtricks_R50.yml', dist_url='tcp://127.0.0.1:50184', eval_only=False, machine_rank=0, num_gpus=1, num_machines=1, opts=['MODEL.DEVICE', 'cuda:2'], resume=False) [01/12 10:52:41 fastreid]: Rank of current process: 0. World size: 1 [01/12 10:52:41 fastreid]: Environment info: ---------------------- -------------------------------------------------------------------------------------------- sys.platform linux Python 3.8.10 (default, May 26 2023, 14:05:08) [GCC 9.4.0] numpy 1.24.1 fastreid 1.3 @/home/cipoll17/fast-reid/./fastreid FASTREID_ENV_MODULE PyTorch 2.1.2+cu118 @/home/cipoll17/fast-reid/envs/fastreid/lib/python3.8/site-packages/torch PyTorch debug build False GPU available True GPU 0,1,2,3,4,5,6 Quadro RTX 8000 CUDA_HOME /usr Pillow 9.3.0 torchvision 0.16.2+cu118 @/home/cipoll17/fast-reid/envs/fastreid/lib/python3.8/site-packages/torchvision torchvision arch flags sm_35, sm_50, sm_60, sm_70, sm_75, sm_80, sm_86 cv2 4.9.0 ---------------------- -------------------------------------------------------------------------------------------- PyTorch built with: - GCC 9.3 - C++ Version: 201703 - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications - Intel(R) MKL-DNN v3.1.1 (Git Hash 64f6bcbcbab628e96f33a62c3e975f8535a7bde4) - OpenMP 201511 (a.k.a. OpenMP 4.5) - LAPACK is enabled (usually provided by MKL) - NNPACK is enabled - CPU capability usage: AVX512 - CUDA Runtime 11.8 - NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_90,code=sm_90 - CuDNN 8.7 - Magma 2.6.1 - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.8, CUDNN_VERSION=8.7.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-invalid-partial-specialization -Wno-unused-private-field -Wno-aligned-allocation-unavailable -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.1.2, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,

[01/12 10:52:41 fastreid]: Command line arguments: Namespace(config_file='./configs/FastREID_Prototype_1/bagtricks_R50.yml', dist_url='tcp://127.0.0.1:50184', eval_only=False, machine_rank=0, num_gpus=1, num_machines=1, opts=['MODEL.DEVICE', 'cuda:2'], resume=False) [01/12 10:52:41 fastreid]: Contents of args.config_file=./configs/FastREID_Prototype_1/bagtricks_R50.yml: BASE: ../Base-bagtricks.yml

DATASETS: NAMES: ("FastREID_Prototype_1",) TESTS: ("FastREID_Prototype_1",)

OUTPUT_DIR: logs/FastREID_Prototype_1/bagtricks_R50

[01/12 10:52:41 fastreid]: Running with full config: CUDNN_BENCHMARK: True DATALOADER: NUM_INSTANCE: 4 NUM_WORKERS: 8 SAMPLER_TRAIN: NaiveIdentitySampler SET_WEIGHT: [] DATASETS: COMBINEALL: False NAMES: ('FastREID_Prototype_1',) TESTS: ('FastREID_Prototype_1',) INPUT: AFFINE: ENABLED: False AUGMIX: ENABLED: False PROB: 0.0 AUTOAUG: ENABLED: False PROB: 0.0 CJ: BRIGHTNESS: 0.15 CONTRAST: 0.15 ENABLED: False HUE: 0.1 PROB: 0.5 SATURATION: 0.1 CROP: ENABLED: False RATIO: [0.75, 1.3333333333333333] SCALE: [0.16, 1] SIZE: [224, 224] FLIP: ENABLED: True PROB: 0.5 PADDING: ENABLED: True MODE: constant SIZE: 10 REA: ENABLED: True PROB: 0.5 VALUE: [123.675, 116.28, 103.53] RPT: ENABLED: False PROB: 0.5 SIZE_TEST: [256, 128] SIZE_TRAIN: [256, 128] KD: EMA: ENABLED: False MOMENTUM: 0.999 MODEL_CONFIG: [] MODEL_WEIGHTS: [] MODEL: BACKBONE: ATT_DROP_RATE: 0.0 DEPTH: 50x DROP_PATH_RATIO: 0.1 DROP_RATIO: 0.0 FEAT_DIM: 2048 LAST_STRIDE: 1 NAME: build_resnet_backbone NORM: BN PRETRAIN: True PRETRAIN_PATH: SIE_COE: 3.0 STRIDE_SIZE: (16, 16) WITH_IBN: False WITH_NL: False WITH_SE: False DEVICE: cuda:2 FREEZE_LAYERS: [] HEADS: CLS_LAYER: Linear EMBEDDING_DIM: 0 MARGIN: 0.0 NAME: EmbeddingHead NECK_FEAT: before NORM: BN NUM_CLASSES: 0 POOL_LAYER: GlobalAvgPool SCALE: 1 WITH_BNNECK: True LOSSES: CE: ALPHA: 0.2 EPSILON: 0.1 SCALE: 1.0 CIRCLE: GAMMA: 128 MARGIN: 0.25 SCALE: 1.0 COSFACE: GAMMA: 128 MARGIN: 0.25 SCALE: 1.0 FL: ALPHA: 0.25 GAMMA: 2 SCALE: 1.0 NAME: ('CrossEntropyLoss', 'TripletLoss') TRI: HARD_MINING: True MARGIN: 0.3 NORM_FEAT: False SCALE: 1.0 META_ARCHITECTURE: Baseline PIXEL_MEAN: [123.675, 116.28, 103.53] PIXEL_STD: [58.395, 57.120000000000005, 57.375] QUEUE_SIZE: 8192 WEIGHTS: OUTPUT_DIR: logs/FastREID_Prototype_1/bagtricks_R50 SOLVER: AMP: ENABLED: True BASE_LR: 0.00035 BIAS_LR_FACTOR: 1.0 CHECKPOINT_PERIOD: 30 CLIP_GRADIENTS: CLIP_TYPE: norm CLIP_VALUE: 5.0 ENABLED: False NORM_TYPE: 2.0 DELAY_EPOCHS: 0 ETA_MIN_LR: 1e-07 FREEZE_ITERS: 0 GAMMA: 0.1 HEADS_LR_FACTOR: 1.0 IMS_PER_BATCH: 64 MAX_EPOCH: 120 MOMENTUM: 0.9 NESTEROV: False OPT: Adam SCHED: MultiStepLR STEPS: [40, 90] WARMUP_FACTOR: 0.1 WARMUP_ITERS: 2000 WARMUP_METHOD: linear WEIGHT_DECAY: 0.0005 WEIGHT_DECAY_BIAS: 0.0005 WEIGHT_DECAY_NORM: 0.0005 TEST: AQE: ALPHA: 3.0 ENABLED: False QE_K: 5 QE_TIME: 1 EVAL_PERIOD: 30 FLIP: ENABLED: False IMS_PER_BATCH: 128 METRIC: cosine PRECISE_BN: DATASET: Market1501 ENABLED: False NUM_ITER: 300 RERANK: ENABLED: False K1: 20 K2: 6 LAMBDA: 0.3 ROC: ENABLED: False [01/12 10:52:41 fastreid]: Full config saved to /home/cipoll17/fast-reid/logs/FastREID_Prototype_1/bagtricks_R50/config.yaml [01/12 10:52:41 fastreid.utils.env]: Using a generated random seed 43438209 [01/12 10:52:41 fastreid.engine.defaults]: Prepare training set [01/12 10:52:41 fastreid.data.datasets.bases]: => Loaded FastREID_Prototype_1 in csv format: subset # ids # images # cameras
train 2 17 1

[01/12 10:52:41 fastreid.data.build]: Using training sampler NaiveIdentitySampler [01/12 10:52:41 fastreid.engine.defaults]: Auto-scaling the num_classes=2 [01/12 10:52:42 fastreid.modeling.backbones.resnet]: Loading pretrained model from /home/cipoll17/.cache/torch/checkpoints/resnet50-19c8e357.pth [01/12 10:52:42 fastreid.modeling.backbones.resnet]: The checkpoint state_dict contains keys that are not used by the model: fc.{weight, bias}

Baseline( (backbone): ResNet( (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False) (bn1): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=True) (layer1): Sequential( (0): Bottleneck( (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() (downsample): Sequential( (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (1): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (1): Bottleneck( (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() ) (2): Bottleneck( (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() ) ) (layer2): Sequential( (0): Bottleneck( (conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False) (bn2): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() (downsample): Sequential( (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False) (1): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (1): Bottleneck( (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() ) (2): Bottleneck( (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 
1), bias=False) (bn1): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() ) (3): Bottleneck( (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() ) ) (layer3): Sequential( (0): Bottleneck( (conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False) (bn2): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() (downsample): Sequential( (0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False) (1): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (1): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() ) (2): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() ) (3): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() ) (4): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 
1), bias=False) (bn1): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() ) (5): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() ) ) (layer4): Sequential( (0): Bottleneck( (conv1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() (downsample): Sequential( (0): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False) (1): BatchNorm(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (1): Bottleneck( (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() ) (2): Bottleneck( (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (se): Identity() ) ) ) (heads): EmbeddingHead( (pool_layer): GlobalAvgPool(output_size=1) (bottleneck): Sequential( (0): BatchNorm(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (cls_layer): Linear(num_classes=2, scale=1, margin=0.0) ) )

Expected behavior:

The expected behavior is for training to continue after the model summary is printed, but it gets stuck somewhere and I cannot figure out why. Any feedback, suggestions, or solutions are appreciated!
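
One way to narrow this down is to build just the training dataloader and try to pull a single batch; if that call blocks, the freeze is in the sampler/dataloader rather than in the model or the GPU. This is a sketch under the assumption that build_reid_train_loader accepts the config object directly, as it does in recent fast-reid versions:

```python
# Hypothetical probe: construct only the training dataloader and fetch one batch.
from fastreid.config import get_cfg
from fastreid.data import build_reid_train_loader
import fastreid.data.datasets.fastreid_prototype_1  # noqa: F401  (registers the dataset)

cfg = get_cfg()
cfg.merge_from_file("./configs/FastREID_Prototype_1/bagtricks_R50.yml")
train_loader = build_reid_train_loader(cfg)
batch = next(iter(train_loader))  # blocks here if the dataloader is the culprit
print({k: getattr(v, "shape", type(v)) for k, v in batch.items()})
```

If the probe does block, it may also be worth retrying the training command with smaller batch settings appended as overrides (for example `SOLVER.IMS_PER_BATCH 8 DATALOADER.NUM_WORKERS 0`), since the merged config above shows IMS_PER_BATCH: 64 with NUM_INSTANCE: 4 while this training set only has 2 ids and 17 images.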

Cippppy commented 8 months ago

Found the answer in #588.