I am currently trying to train FastREID on a custom dataset. I am able to run the training script, and it starts fine. However, after the model is printed to the console, it stops there. There is no error message; the process keeps running indefinitely but does not advance.
Instructions To Reproduce the Issue:
Inside fast-reid/fastreid/data/datasets, I created a file named "fastreid_prototype_1.py". The code for that file is below.
fastreid_prototype_1.py
```
import glob
import os
import os.path as osp
import re
import warnings

from .bases import ImageDataset
from ..datasets import DATASET_REGISTRY


@DATASET_REGISTRY.register()
class FastREID_Prototype_1(ImageDataset):
    dataset_dir = ''
    dataset_name = "FastREID_Prototype_1"

    def __init__(self, root='datasets', **kwargs):
        self.root = root
        self.dataset_dir = osp.join(self.root, self.dataset_dir)

        # allow alternative directory structure
        self.data_dir = self.dataset_dir
        data_dir = osp.join(self.data_dir, 'FastREID_Prototype_1')
        if osp.isdir(data_dir):
            self.data_dir = data_dir
        else:
            warnings.warn('The current data structure is deprecated. Please '
                          'put data folders such as "train" under '
                          '"FastREID_Prototype_1".')

        self.train_dir = osp.join(self.data_dir, 'train')
        self.query_dir = osp.join(self.data_dir, 'test')
        self.gallery_dir = osp.join(self.data_dir, 'test')
        self.extra_gallery_dir = osp.join(self.data_dir, 'train')
        self.extra_gallery = False

        self.convert_labels = {
            'Brahmos_Missile': 1,
            'brahmos_missile': 1,
            'BrahmosII': 2,
            'brahmosII': 2,
            'Brahmosii': 2,
            'brahmosii': 2
        }

        required_files = [
            self.data_dir,
            self.train_dir,
            self.query_dir,
            self.gallery_dir,
        ]
        self.check_before_run(required_files)
        if self.extra_gallery:
            required_files.append(self.extra_gallery_dir)
        self.check_before_run(required_files)

        train = lambda: self.process_dir(self.train_dir)
        query = lambda: self.process_dir(self.query_dir, is_train=False)
        gallery = lambda: self.process_dir(self.gallery_dir, is_train=False) + \
                          (self.process_dir(self.extra_gallery_dir, is_train=False) if self.extra_gallery else [])

        super(FastREID_Prototype_1, self).__init__(train, query, gallery, **kwargs)

    def process_dir(self, dir_path, is_train=True):
        data = []
        absolute_path = os.path.join(dir_path)
        sub_1_dirs = os.listdir(absolute_path)
        for sub_1_dir in sub_1_dirs:
            sub_1_path = os.path.join(absolute_path, sub_1_dir)
            if sub_1_dir == '.DS_Store':
                continue
            filenames = os.listdir(sub_1_path)
            for filename in filenames:
                if filename == '.DS_Store':
                    continue
                filepath = os.path.join(sub_1_path, filename)
                data.append((filepath, self.convert_labels[sub_1_dir], 1))
        return data
```
Then, inside tools/train_net.py, I added the line "from fastreid.data.datasets.fastreid_prototype_1 import FastREID_Prototype_1".
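As far as I understand, the import matters because a decorator-based registry only learns about a class when the module defining it is actually executed. The following is a minimal sketch of that pattern, not FastREID's implementation: `Registry` and `TOY_DATASET_REGISTRY` are hypothetical names made up for illustration, and the placeholder class only reuses my dataset's name.

```python
class Registry:
    """Toy name-to-class registry (illustrative only, not FastREID's code)."""

    def __init__(self, name):
        self._name = name
        self._obj_map = {}

    def register(self):
        # Used as @REGISTRY.register(); stores the class under its __name__.
        def decorator(cls):
            if cls.__name__ in self._obj_map:
                raise KeyError(f"{cls.__name__} already registered in {self._name}")
            self._obj_map[cls.__name__] = cls
            return cls
        return decorator

    def get(self, name):
        # Lookup by string, as a config entry would do.
        if name not in self._obj_map:
            raise KeyError(f"No object named {name!r} in {self._name} registry")
        return self._obj_map[name]


TOY_DATASET_REGISTRY = Registry("DATASET")


# The decorator runs when this module is imported, which is what fills the map.
@TOY_DATASET_REGISTRY.register()
class FastREID_Prototype_1:
    pass
```

In this sketch, looking up "FastREID_Prototype_1" succeeds only after the defining module has been imported; a name that was never registered raises `KeyError`.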
Then, inside configs, I made a new folder named "FastREID_Prototype_1", where I put all the config files with the relevant strings changed to "FastREID_Prototype_1".
bagtricks_R50.yml

    _BASE_: ../Base-bagtricks.yml
    DATASETS:
      NAMES: ("FastREID_Prototype_1",)
      TESTS: ("FastREID_Prototype_1",)
    OUTPUT_DIR: logs/FastREID_Prototype_1/bagtricks_R50

Lastly, I put my dataset into "datasets". It has a file structure of datasets/FastREID_Prototype_1
Each child folder contains some images.
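The layout can be sanity-checked without importing FastREID. The sketch below mirrors the directory walk in `process_dir` from fastreid_prototype_1.py and counts images per class folder. `summarize_split` and `KNOWN_LABELS` are hypothetical names introduced for illustration; the label names are copied from `convert_labels`, and the `train`/`test` split names come from the dataset class.

```python
import os
from collections import Counter

# Folder names accepted by convert_labels in fastreid_prototype_1.py.
KNOWN_LABELS = {
    "Brahmos_Missile", "brahmos_missile",
    "BrahmosII", "brahmosII", "Brahmosii", "brahmosii",
}


def summarize_split(split_dir, known_labels=KNOWN_LABELS):
    """Count images per class folder and list folders with unknown names."""
    counts = Counter()
    unknown = []
    for sub_dir in sorted(os.listdir(split_dir)):
        sub_path = os.path.join(split_dir, sub_dir)
        # Skip stray files such as .DS_Store; only class folders matter.
        if not os.path.isdir(sub_path):
            continue
        # A folder name missing from convert_labels would raise KeyError there.
        if sub_dir not in known_labels:
            unknown.append(sub_dir)
            continue
        for filename in os.listdir(sub_path):
            if filename != ".DS_Store":
                counts[sub_dir] += 1
    return dict(counts), unknown


if __name__ == "__main__":
    for split in ("train", "test"):
        path = os.path.join("datasets", "FastREID_Prototype_1", split)
        if os.path.isdir(path):
            print(split, summarize_split(path))
```

Run from the fast-reid root, this prints the per-folder image counts for each split, plus any folder names that `convert_labels` would not recognize.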
I run the command:
The full console output I observed: log.txt
Full Console Logs

    Command Line Args: Namespace(config_file='./configs/FastREID_Prototype_1/bagtricks_R50.yml', dist_url='tcp://127.0.0.1:50184', eval_only=False, machine_rank=0, num_gpus=1, num_machines=1, opts=['MODEL.DEVICE', 'cuda:2'], resume=False)
    [01/12 10:52:41 fastreid]: Rank of current process: 0. World size: 1
    [01/12 10:52:41 fastreid]: Environment info:
    ----------------------  --------------------------------------------------------------------------------------------
    sys.platform            linux
    Python                  3.8.10 (default, May 26 2023, 14:05:08) [GCC 9.4.0]
    numpy                   1.24.1
    fastreid                1.3 @/home/cipoll17/fast-reid/./fastreid
    FASTREID_ENV_MODULE
    [01/12 10:52:41 fastreid]: Command line arguments: Namespace(config_file='./configs/FastREID_Prototype_1/bagtricks_R50.yml', dist_url='tcp://127.0.0.1:50184', eval_only=False, machine_rank=0, num_gpus=1, num_machines=1, opts=['MODEL.DEVICE', 'cuda:2'], resume=False)
    [01/12 10:52:41 fastreid]: Contents of args.config_file=./configs/FastREID_Prototype_1/bagtricks_R50.yml:
    _BASE_: ../Base-bagtricks.yml
    DATASETS:
      NAMES: ("FastREID_Prototype_1",)
      TESTS: ("FastREID_Prototype_1",)
    OUTPUT_DIR: logs/FastREID_Prototype_1/bagtricks_R50
    [01/12 10:52:41 fastreid.data.build]: Using training sampler NaiveIdentitySampler
    [01/12 10:52:41 fastreid.engine.defaults]: Auto-scaling the num_classes=2
    [01/12 10:52:42 fastreid.modeling.backbones.resnet]: Loading pretrained model from /home/cipoll17/.cache/torch/checkpoints/resnet50-19c8e357.pth
    [01/12 10:52:42 fastreid.modeling.backbones.resnet]: The checkpoint state_dict contains keys that are not used by the model:
      fc.{weight, bias}

    Baseline(
      (backbone): ResNet(
        (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
        (bn1): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=True)
        (layer1): Sequential(
          (0): Bottleneck(
            (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
            (se): Identity()
            (downsample): Sequential(
              (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (1): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            )
          )
          (1): Bottleneck(
            (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
            (se): Identity()
          )
          (2): Bottleneck(
            (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
            (se): Identity()
          )
        )
        (layer2): Sequential(
          (0): Bottleneck(
            (conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
            (bn2): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
            (se): Identity()
            (downsample): Sequential(
              (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
              (1): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            )
          )
          (1): Bottleneck(
            (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
            (se): Identity()
          )
          (2): Bottleneck(
            (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
            (se): Identity()
          )
          (3): Bottleneck(
            (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
            (se): Identity()
          )
        )
        (layer3): Sequential(
          (0): Bottleneck(
            (conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
            (bn2): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
            (se): Identity()
            (downsample): Sequential(
              (0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False)
              (1): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            )
          )
          (1): Bottleneck(
            (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
            (se): Identity()
          )
          (2): Bottleneck(
            (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
            (se): Identity()
          )
          (3): Bottleneck(
            (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
            (se): Identity()
          )
          (4): Bottleneck(
            (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
            (se): Identity()
          )
          (5): Bottleneck(
            (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
            (se): Identity()
          )
        )
        (layer4): Sequential(
          (0): Bottleneck(
            (conv1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
            (se): Identity()
            (downsample): Sequential(
              (0): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (1): BatchNorm(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            )
          )
          (1): Bottleneck(
            (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
            (se): Identity()
          )
          (2): Bottleneck(
            (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
            (se): Identity()
          )
        )
      )
      (heads): EmbeddingHead(
        (pool_layer): GlobalAvgPool(output_size=1)
        (bottleneck): Sequential(
          (0): BatchNorm(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (cls_layer): Linear(num_classes=2, scale=1, margin=0.0)
      )
    )

Expected behavior:
The expected behavior is to continue training. This seems to get stuck somewhere and I cannot figure out why. Any feedback, suggestions, or solutions are appreciated!
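One way to find where a silent hang sits is Python's standard `faulthandler` module, which can dump the stack of every thread after a timeout. The following is a minimal sketch of that generic debugging technique, not anything FastREID provides; `hang_watchdog` is a hypothetical helper name, and the 60-second default is an arbitrary assumption.

```python
import faulthandler
import sys
from contextlib import contextmanager


@contextmanager
def hang_watchdog(timeout_s=60.0, file=sys.stderr):
    """Dump all thread tracebacks every `timeout_s` seconds while the block runs."""
    # repeat=True keeps dumping at each interval until cancelled.
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=file)
    try:
        yield
    finally:
        # Stop the periodic dumps once the wrapped code finishes.
        faulthandler.cancel_dump_traceback_later()
```

Wrapping the call that starts training in `with hang_watchdog(120):` would print stack traces to stderr every two minutes while the process is stuck, which may show whether it is waiting in the data loader, the sampler, or somewhere else.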