kendyChina closed this issue 3 years ago
I have never faced this issue. Can you post the complete log from running python tools/train_net.py, along with the exact command you used?
Yes, of course.
This is the trimmed log from running python tools/train_net.py --num-gpus 2 --config-file ./configs/CGSH_train.yaml
[03/18 11:45:32] detectron2 INFO: Rank of current process: 0. World size: 2
[03/18 11:45:32] detectron2 INFO: Environment info:
---------------------- -------------------------------------------------------------------------------------------------------------
sys.platform linux
Python 3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 18:10:19) [GCC 7.2.0]
numpy 1.19.5
detectron2 0.2.1 @/home/ma-user/anaconda3/lib/python3.6/site-packages/detectron2-0.2.1-py3.6-linux-x86_64.egg/detectron2
Compiler GCC 5.4
CUDA compiler CUDA 10.1
detectron2 arch flags 7.0
DETECTRON2_ENV_MODULE <not set>
PyTorch 1.8.0 @/home/ma-user/anaconda3/lib/python3.6/site-packages/torch
PyTorch debug build False
GPU available True
GPU 0,1 Tesla V100-PCIE-32GB (arch=7.0)
CUDA_HOME /usr/local/cuda
Pillow 8.1.2
torchvision 0.9.0 @/home/ma-user/anaconda3/lib/python3.6/site-packages/torchvision
torchvision arch flags 3.5, 5.0, 6.0, 7.0, 7.5
fvcore 0.1.3.post20210311
cv2 3.4.0
---------------------- -------------------------------------------------------------------------------------------------------------
PyTorch built with:
- GCC 7.3
- C++ Version: 201402
- Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v1.7.0 (Git Hash 7aed236906b1f7a05c0917e5257a1af05e9ff683)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 10.2
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70
- CuDNN 7.6.5
- Magma 2.5.2
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=10.2, CUDNN_VERSION=7.6.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.8.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,
[03/18 11:45:32] detectron2 INFO: Command line arguments: Namespace(config_file='./configs/CGSH_train.yaml', dist_url='tcp://127.0.0.1:50152', eval_only=False, machine_rank=0, num_gpus=2, num_machines=1, opts=[], resume=False)
[03/18 11:45:32] detectron2 INFO: Running with full config:
CUDNN_BENCHMARK: True
DATALOADER:
ASPECT_RATIO_GROUPING: True
FILTER_EMPTY_ANNOTATIONS: False
NUM_WORKERS: 16
REPEAT_THRESHOLD: 0.0
SAMPLER_TRAIN: TrainingSampler
DATASETS:
PRECOMPUTED_PROPOSAL_TOPK_TEST: 1000
PRECOMPUTED_PROPOSAL_TOPK_TRAIN: 2000
PROPOSAL_FILES_TEST: ()
PROPOSAL_FILES_TRAIN: ()
TEST: ('cgsh_test',)
TRAIN: ('cgsh_train',)
GLOBAL:
HACK: 1.0
INPUT:
CROP:
ENABLED: False
SIZE: [0.9, 0.9]
TYPE: relative_range
FORMAT: BGR
MASK_FORMAT: polygon
MAX_SIZE_TEST: 1333
MAX_SIZE_TRAIN: 1333
MIN_SIZE_TEST: 800
MIN_SIZE_TRAIN: (480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800)
MIN_SIZE_TRAIN_SAMPLING: choice
RANDOM_FLIP: horizontal
MODEL:
ANCHOR_GENERATOR:
ANGLES: [[-90, 0, 90]]
ASPECT_RATIOS: [[0.5, 1.0, 2.0]]
NAME: DefaultAnchorGenerator
OFFSET: 0.0
SIZES: [[32, 64, 128, 256, 512]]
BACKBONE:
FREEZE_AT: 2
NAME: build_resnet_backbone
DEVICE: cuda
FPN:
FUSE_TYPE: sum
IN_FEATURES: []
NORM:
OUT_CHANNELS: 256
KEYPOINT_ON: False
LOAD_PROPOSALS: False
MASK_ON: False
META_ARCHITECTURE: GeneralizedRCNN
PANOPTIC_FPN:
COMBINE:
ENABLED: True
INSTANCES_CONFIDENCE_THRESH: 0.5
OVERLAP_THRESH: 0.5
STUFF_AREA_LIMIT: 4096
INSTANCE_LOSS_WEIGHT: 1.0
PIXEL_MEAN: [0, 0, 0]
PIXEL_STD: [1, 1, 1]
PROPOSAL_GENERATOR:
MIN_SIZE: 0
NAME: RPN
RESNETS:
DEFORM_MODULATED: False
DEFORM_NUM_GROUPS: 1
DEFORM_ON_PER_STAGE: [False, False, False, False]
DEPTH: 50
NORM: FrozenBN
NUM_GROUPS: 1
OUT_FEATURES: ['res4']
RES2_OUT_CHANNELS: 256
RES5_DILATION: 1
STEM_OUT_CHANNELS: 64
STRIDE_IN_1X1: True
WIDTH_PER_GROUP: 64
RETINANET:
BBOX_REG_LOSS_TYPE: smooth_l1
BBOX_REG_WEIGHTS: (1.0, 1.0, 1.0, 1.0)
FOCAL_LOSS_ALPHA: 0.25
FOCAL_LOSS_GAMMA: 2.0
IN_FEATURES: ['p3', 'p4', 'p5', 'p6', 'p7']
IOU_LABELS: [0, -1, 1]
IOU_THRESHOLDS: [0.4, 0.5]
NMS_THRESH_TEST: 0.5
NORM:
NUM_CLASSES: 80
NUM_CONVS: 4
PRIOR_PROB: 0.01
SCORE_THRESH_TEST: 0.05
SMOOTH_L1_LOSS_BETA: 0.1
TOPK_CANDIDATES_TEST: 1000
ROI_BOX_CASCADE_HEAD:
BBOX_REG_WEIGHTS: ((10.0, 10.0, 5.0, 5.0), (20.0, 20.0, 10.0, 10.0), (30.0, 30.0, 15.0, 15.0))
IOUS: (0.5, 0.6, 0.7)
ROI_BOX_HEAD:
BBOX_REG_LOSS_TYPE: smooth_l1
BBOX_REG_LOSS_WEIGHT: 1.0
BBOX_REG_WEIGHTS: (10.0, 10.0, 5.0, 5.0)
CLS_AGNOSTIC_BBOX_REG: False
CONV_DIM: 256
FC_DIM: 1024
NAME:
NORM:
NUM_CONV: 0
NUM_FC: 0
POOLER_RESOLUTION: 14
POOLER_SAMPLING_RATIO: 0
POOLER_TYPE: ROIAlignV2
SMOOTH_L1_BETA: 0.0
TRAIN_ON_PRED_BOXES: False
ROI_HEADS:
BATCH_SIZE_PER_IMAGE: 512
IN_FEATURES: ['res4']
IOU_LABELS: [0, 1]
IOU_THRESHOLDS: [0.5]
NAME: Res5ROIHeads
NMS_THRESH_TEST: 0.5
NUM_CLASSES: 10
POSITIVE_FRACTION: 0.25
PROPOSAL_APPEND_GT: True
SCORE_THRESH_TEST: 0.05
ROI_KEYPOINT_HEAD:
CONV_DIMS: (512, 512, 512, 512, 512, 512, 512, 512)
LOSS_WEIGHT: 1.0
MIN_KEYPOINTS_PER_IMAGE: 1
NAME: KRCNNConvDeconvUpsampleHead
NORMALIZE_LOSS_BY_VISIBLE_KEYPOINTS: True
NUM_KEYPOINTS: 17
POOLER_RESOLUTION: 14
POOLER_SAMPLING_RATIO: 0
POOLER_TYPE: ROIAlignV2
ROI_MASK_HEAD:
CLS_AGNOSTIC_MASK: False
CONV_DIM: 256
NAME: MaskRCNNConvUpsampleHead
NORM:
NUM_CONV: 0
POOLER_RESOLUTION: 14
POOLER_SAMPLING_RATIO: 0
POOLER_TYPE: ROIAlignV2
RPN:
BATCH_SIZE_PER_IMAGE: 256
BBOX_REG_LOSS_TYPE: smooth_l1
BBOX_REG_LOSS_WEIGHT: 1.0
BBOX_REG_WEIGHTS: (1.0, 1.0, 1.0, 1.0)
BOUNDARY_THRESH: -1
HEAD_NAME: StandardRPNHead
IN_FEATURES: ['res4']
IOU_LABELS: [0, -1, 1]
IOU_THRESHOLDS: [0.3, 0.7]
LOSS_WEIGHT: 1.0
NMS_THRESH: 0.7
POSITIVE_FRACTION: 0.5
POST_NMS_TOPK_TEST: 1000
POST_NMS_TOPK_TRAIN: 2000
PRE_NMS_TOPK_TEST: 6000
PRE_NMS_TOPK_TRAIN: 12000
SMOOTH_L1_BETA: 0.0
SEM_SEG_HEAD:
COMMON_STRIDE: 4
CONVS_DIM: 128
IGNORE_VALUE: 255
IN_FEATURES: ['p2', 'p3', 'p4', 'p5']
LOSS_WEIGHT: 1.0
NAME: SemSegFPNHead
NORM: GN
NUM_CLASSES: 54
WEIGHTS: detectron2/model_zoo/R-50.pkl
OUTPUT_DIR: ./output/1_unk
OWOD:
CLUSTERING:
ITEMS_PER_CLASS: 20
MARGIN: 10.0
MOMENTUM: 0.99
START_ITER: 1000
UPDATE_MU_ITER: 3000
Z_DIMENSION: 128
COMPUTE_ENERGY: False
CUR_INTRODUCED_CLS: 1
ENABLE_CLUSTERING: True
ENABLE_THRESHOLD_AUTOLABEL_UNK: True
ENABLE_UNCERTAINITY_AUTOLABEL_UNK: False
ENERGY_SAVE_PATH: energy
FEATURE_STORE_SAVE_PATH: feature_store
NUM_UNK_PER_IMAGE: 1
PREV_INTRODUCED_CLS: 0
SKIP_TRAINING_WHILE_EVAL: False
TEMPERATURE: 1.5
SEED: -1
SOLVER:
BASE_LR: 0.02
BIAS_LR_FACTOR: 1.0
CHECKPOINT_PERIOD: 5000
CLIP_GRADIENTS:
CLIP_TYPE: value
CLIP_VALUE: 1.0
ENABLED: False
NORM_TYPE: 2.0
GAMMA: 0.1
IMS_PER_BATCH: 16
LR_SCHEDULER_NAME: WarmupMultiStepLR
MAX_ITER: 18000
MOMENTUM: 0.9
NESTEROV: False
REFERENCE_WORLD_SIZE: 0
STEPS: (12000, 16000)
WARMUP_FACTOR: 0.001
WARMUP_ITERS: 1000
WARMUP_METHOD: linear
WEIGHT_DECAY: 0.0001
WEIGHT_DECAY_BIAS: 0.0001
WEIGHT_DECAY_NORM: 0.0
TEST:
AUG:
ENABLED: False
FLIP: True
MAX_SIZE: 4000
MIN_SIZES: (400, 500, 600, 700, 800, 900, 1000, 1100, 1200)
DETECTIONS_PER_IMAGE: 100
EVAL_PERIOD: 0
EXPECTED_RESULTS: []
KEYPOINT_OKS_SIGMAS: []
PRECISE_BN:
ENABLED: False
NUM_ITER: 200
VERSION: 2
VIS_PERIOD: 0
[03/18 11:45:32] detectron2 INFO: Full config saved to ./output/1_unk/config.yaml
[03/18 11:45:32] d2.utils.env INFO: Using a generated random seed 33050046
[03/18 11:45:33] d2.modeling.roi_heads.fast_rcnn INFO: Invalid class range: [1, 2, 3, 4, 5, 6, 7, 8]
[03/18 11:45:33] d2.modeling.roi_heads.fast_rcnn INFO: Feature store not found in ./output/1_unk/feature_store/feat.pt. Creating new feature store.
[03/18 11:45:33] d2.engine.defaults INFO: Model:
GeneralizedRCNN(
(backbone): ResNet(
(box_predictor): FastRCNNOutputLayers(
(cls_score): Linear(in_features=2048, out_features=11, bias=True)
(bbox_pred): Linear(in_features=2048, out_features=40, bias=True)
(hingeloss): HingeEmbeddingLoss()
)
)
)
[03/18 11:45:34] d2.data.build INFO: Valid classes: range(0, 1)
[03/18 11:45:34] d2.data.build INFO: Removing earlier seen class objects and the unknown objects...
[03/18 11:45:34] d2.data.build INFO: Distribution of instances among all 2 categories:
| category | #instances | category | #instances |
|:----------:|:-------------|:----------:|:-------------|
| hawker | 9810 | unknown | 0 |
| | | | |
| total | 9810 | | |
[03/18 11:45:34] d2.data.build INFO: Number of datapoints: 8704
[03/18 11:45:34] d2.data.common INFO: Serializing 8704 elements to byte tensors and concatenating them all ...
[03/18 11:45:34] d2.data.common INFO: Serialized dataset takes 3.65 MiB
[03/18 11:45:34] d2.data.dataset_mapper INFO: Augmentations used in training: [ResizeShortestEdge(short_edge_length=(480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800), max_size=1333, sample_style='choice'), RandomFlip()]
[03/18 11:45:34] d2.data.build INFO: Using training sampler TrainingSampler
[03/18 11:45:34] fvcore.common.checkpoint INFO: Loading checkpoint from detectron2/model_zoo/R-50.pkl
[03/18 11:45:35] d2.checkpoint.c2_model_loading INFO: Remapping C2 weights ......
[03/18 11:45:35] d2.checkpoint.c2_model_loading INFO: backbone.res2.0.conv1.norm.bias loaded from res2_0_branch2a_bn_beta of shape (64,)
[03/18 11:45:35] d2.checkpoint.c2_model_loading INFO: backbone.res2.0.conv1.norm.running_mean loaded from res2_0_branch2a_bn_running_mean of shape (64,)
[03/18 11:45:35] d2.checkpoint.c2_model_loading INFO: backbone.res2.0.conv1.norm.running_var loaded from res2_0_branch2a_bn_running_var of shape (64,)
[03/18 11:45:35] d2.checkpoint.c2_model_loading INFO: backbone.res2.0.conv1.norm.weight loaded from res2_0_branch2a_bn_gamma of shape (64,)
[03/18 11:45:35] d2.checkpoint.c2_model_loading INFO: roi_heads.res5.2.conv3.norm.running_mean loaded from res5_2_branch2c_bn_running_mean of shape (2048,)
[03/18 11:45:35] d2.checkpoint.c2_model_loading INFO: roi_heads.res5.2.conv3.norm.running_var loaded from res5_2_branch2c_bn_running_var of shape (2048,)
[03/18 11:45:35] d2.checkpoint.c2_model_loading INFO: roi_heads.res5.2.conv3.norm.weight loaded from res5_2_branch2c_bn_gamma of shape (2048,)
[03/18 11:45:35] d2.checkpoint.c2_model_loading INFO: roi_heads.res5.2.conv3.weight loaded from res5_2_branch2c_w of shape (2048, 512, 1, 1)
[03/18 11:45:35] d2.checkpoint.c2_model_loading INFO: Some model parameters or buffers are not found in the checkpoint:
pixel_mean
pixel_std
proposal_generator.anchor_generator.cell_anchors.0
proposal_generator.rpn_head.anchor_deltas.{bias, weight}
proposal_generator.rpn_head.conv.{bias, weight}
proposal_generator.rpn_head.objectness_logits.{bias, weight}
roi_heads.box_predictor.bbox_pred.{bias, weight}
roi_heads.box_predictor.cls_score.{bias, weight}
[03/18 11:45:35] d2.checkpoint.c2_model_loading INFO: The checkpoint state_dict contains keys that are not used by the model:
fc1000_b
fc1000_w
conv1_b
[03/18 11:45:35] d2.engine.train_loop INFO: Starting training from iteration 0
[03/18 11:47:10] d2.utils.events INFO: eta: 11:23:51 iter: 19 total_loss: 1.29 loss_cls: 0.2751 loss_box_reg: 0.2749 loss_clustering: 0 loss_rpn_cls: 0.6791 loss_rpn_loc: 0.04721 time: 3.1832 data_time: 1.2863 lr: 0.00039962 max_mem: 19392M
[03/18 23:24:48] d2.utils.events INFO: eta: 0:00:46 iter: 17979 total_loss: 0.1125 loss_cls: 0.02595 loss_box_reg: 0.05373 loss_clustering: 0.01532 loss_rpn_cls: 0.004229 loss_rpn_loc: 0.01234 time: 2.3307 data_time: 0.0148 lr: 0.0002 max_mem: 25820M
[03/18 23:25:34] d2.modeling.roi_heads.fast_rcnn INFO: Saving image store at iteration 17999 to ./output/1_unk/feature_store/feat.pt
[03/18 23:25:35] fvcore.common.checkpoint INFO: Saving checkpoint to ./output/1_unk/model_final.pth
[03/18 23:25:36] d2.utils.events INFO: eta: 0:00:00 iter: 17999 total_loss: 0.1131 loss_cls: 0.02487 loss_box_reg: 0.05478 loss_clustering: 0.01534 loss_rpn_cls: 0.006022 loss_rpn_loc: 0.01257 time: 2.3307 data_time: 0.0153 lr: 0.0002 max_mem: 25820M
[03/18 23:25:36] d2.engine.hooks INFO: Overall training speed: 17998 iterations in 11:39:08 (2.3307 s / it)
[03/18 23:25:36] d2.engine.hooks INFO: Total training time: 11:39:23 (0:00:15 on hooks)
[03/18 23:25:36] d2.data.build INFO: Known classes: range(0, 1)
[03/18 23:25:36] d2.data.build INFO: Labelling known instances the corresponding label, and unknown instances as unknown...
[03/18 23:25:36] d2.data.build INFO: Distribution of instances among all 2 categories:
| category | #instances | category | #instances |
|:----------:|:-------------|:----------:|:-------------|
| hawker | 1624 | unknown | 0 |
| | | | |
| total | 1624 | | |
[03/18 23:25:36] d2.data.build INFO: Number of datapoints: 1440
[03/18 23:25:36] d2.data.common INFO: Serializing 1440 elements to byte tensors and concatenating them all ...
[03/18 23:25:36] d2.data.common INFO: Serialized dataset takes 0.61 MiB
[03/18 23:25:36] d2.data.dataset_mapper INFO: Augmentations used in training: [ResizeShortestEdge(short_edge_length=(800, 800), max_size=1333, sample_style='choice')]
[03/18 23:25:36] d2.evaluation.pascal_voc_evaluation INFO: Energy distribution is not found at ./output/1_unk/energy_dist_1.pkl
[03/18 23:25:36] d2.evaluation.evaluator INFO: Start inference on 720 images
My CGSH_train.yaml is:
CUDNN_BENCHMARK: True
OUTPUT_DIR: "./output/1_unk"
MODEL:
  PIXEL_MEAN: [0, 0, 0]
  PIXEL_STD: [1, 1, 1]
  META_ARCHITECTURE: "GeneralizedRCNN"
  WEIGHTS: "detectron2/model_zoo/R-50.pkl"
  RPN:
    PRE_NMS_TOPK_TEST: 6000
    POST_NMS_TOPK_TEST: 1000
  ROI_HEADS:
    NUM_CLASSES: 10
    NAME: "Res5ROIHeads"
  MASK_ON: False
  RESNETS:
    DEPTH: 50
INPUT:
  MIN_SIZE_TRAIN: (480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800)
  MIN_SIZE_TEST: 800
DATASETS:
  TRAIN: ("cgsh_train",)
  TEST: ("cgsh_test",)
DATALOADER:
  FILTER_EMPTY_ANNOTATIONS: False
  NUM_WORKERS: 16
SOLVER:
  # gpus 2
  IMS_PER_BATCH: 16
  # 0.02 * bs / 16
  BASE_LR: 0.02
  STEPS: (12000, 16000)
  GAMMA: 0.1
  MAX_ITER: 18000
  LR_SCHEDULER_NAME: WarmupMultiStepLR
  WARMUP_FACTOR: 0.001
  WARMUP_ITERS: 1000
  WARMUP_METHOD: linear
TEST:
  EXPECTED_RESULTS: []
VERSION: 2
OWOD:
  ENABLE_THRESHOLD_AUTOLABEL_UNK: True
  NUM_UNK_PER_IMAGE: 1
  ENABLE_UNCERTAINITY_AUTOLABEL_UNK: False
  ENABLE_CLUSTERING: True
  FEATURE_STORE_SAVE_PATH: 'feature_store'
  SKIP_TRAINING_WHILE_EVAL: False
  PREV_INTRODUCED_CLS: 0
  CUR_INTRODUCED_CLS: 1
  COMPUTE_ENERGY: False
  ENERGY_SAVE_PATH: 'energy'
  SKIP_TRAINING_WHILE_EVAL: False
  TEMPERATURE: 1.5
  CLUSTERING:
    ITEMS_PER_CLASS: 20
    START_ITER: 1000
    UPDATE_MU_ITER: 3000
    MOMENTUM: 0.99
    Z_DIMENSION: 128
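As a sanity check on the SOLVER block, the lr values in the training log can be reproduced from the config alone. The sketch below re-implements a linear-warmup multi-step schedule in plain Python. Note that `warmup_multistep_lr` is a hypothetical helper written for this note, not a detectron2 function, and it assumes the warmup factor is interpolated linearly from WARMUP_FACTOR to 1 over WARMUP_ITERS.

```python
from bisect import bisect_right


def warmup_multistep_lr(it, base_lr=0.02, steps=(12000, 16000), gamma=0.1,
                        warmup_factor=0.001, warmup_iters=1000):
    """Learning rate at iteration `it` (hypothetical helper, not library code).

    Defaults mirror BASE_LR, STEPS, GAMMA, WARMUP_FACTOR and WARMUP_ITERS
    from the config above.
    """
    if it < warmup_iters:
        # Assumed linear warmup: blend from warmup_factor up to 1.0.
        alpha = it / warmup_iters
        warmup = warmup_factor * (1 - alpha) + alpha
    else:
        warmup = 1.0
    # Multiply by gamma once for every milestone already reached.
    return base_lr * warmup * gamma ** bisect_right(steps, it)


for it in (19, 17979):
    print(it, f"{warmup_multistep_lr(it):.8g}")
```

With these settings the sketch gives about 0.00039962 at iteration 19 and 0.0002 at iteration 17979, which agrees with the lr fields in the log lines above, so the schedule itself looks to be behaving as configured.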
And this is the simplified log from running python tools/train_net.py --num-gpus 2 --config-file ./configs/CGSH_val.yaml
[03/19 08:59:19] detectron2 INFO: Rank of current process: 0. World size: 2
[03/19 08:59:20] detectron2 INFO: Environment info:
---------------------- -------------------------------------------------------------------------------------------------------------
sys.platform linux
Python 3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 18:10:19) [GCC 7.2.0]
numpy 1.19.5
detectron2 0.2.1 @/home/ma-user/anaconda3/lib/python3.6/site-packages/detectron2-0.2.1-py3.6-linux-x86_64.egg/detectron2
Compiler GCC 5.4
CUDA compiler CUDA 10.1
detectron2 arch flags 7.0
DETECTRON2_ENV_MODULE <not set>
PyTorch 1.8.0 @/home/ma-user/anaconda3/lib/python3.6/site-packages/torch
PyTorch debug build False
GPU available True
GPU 0,1 Tesla V100-PCIE-32GB (arch=7.0)
CUDA_HOME /usr/local/cuda
Pillow 8.1.2
torchvision 0.9.0 @/home/ma-user/anaconda3/lib/python3.6/site-packages/torchvision
torchvision arch flags 3.5, 5.0, 6.0, 7.0, 7.5
fvcore 0.1.3.post20210311
cv2 3.4.0
---------------------- -------------------------------------------------------------------------------------------------------------
PyTorch built with:
- GCC 7.3
- C++ Version: 201402
- Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v1.7.0 (Git Hash 7aed236906b1f7a05c0917e5257a1af05e9ff683)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 10.2
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70
- CuDNN 7.6.5
- Magma 2.5.2
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=10.2, CUDNN_VERSION=7.6.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.8.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,
[03/19 08:59:20] detectron2 INFO: Command line arguments: Namespace(config_file='./configs/CGSH_val.yaml', dist_url='tcp://127.0.0.1:50152', eval_only=False, machine_rank=0, num_gpus=2, num_machines=1, opts=[], resume=False)
[03/19 08:59:20] detectron2 INFO: Running with full config:
CUDNN_BENCHMARK: True
DATALOADER:
ASPECT_RATIO_GROUPING: True
FILTER_EMPTY_ANNOTATIONS: False
NUM_WORKERS: 16
REPEAT_THRESHOLD: 0.0
SAMPLER_TRAIN: TrainingSampler
DATASETS:
PRECOMPUTED_PROPOSAL_TOPK_TEST: 1000
PRECOMPUTED_PROPOSAL_TOPK_TRAIN: 2000
PROPOSAL_FILES_TEST: ()
PROPOSAL_FILES_TRAIN: ()
TEST: ('cgsh_val',)
TRAIN: ('cgsh_val',)
GLOBAL:
HACK: 1.0
INPUT:
CROP:
ENABLED: False
SIZE: [0.9, 0.9]
TYPE: relative_range
FORMAT: BGR
MASK_FORMAT: polygon
MAX_SIZE_TEST: 1333
MAX_SIZE_TRAIN: 1333
MIN_SIZE_TEST: 800
MIN_SIZE_TRAIN: (480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800)
MIN_SIZE_TRAIN_SAMPLING: choice
RANDOM_FLIP: horizontal
MODEL:
ANCHOR_GENERATOR:
ANGLES: [[-90, 0, 90]]
ASPECT_RATIOS: [[0.5, 1.0, 2.0]]
NAME: DefaultAnchorGenerator
OFFSET: 0.0
SIZES: [[32, 64, 128, 256, 512]]
BACKBONE:
FREEZE_AT: 2
NAME: build_resnet_backbone
DEVICE: cuda
FPN:
FUSE_TYPE: sum
IN_FEATURES: []
NORM:
OUT_CHANNELS: 256
KEYPOINT_ON: False
LOAD_PROPOSALS: False
MASK_ON: False
META_ARCHITECTURE: GeneralizedRCNN
PANOPTIC_FPN:
COMBINE:
ENABLED: True
INSTANCES_CONFIDENCE_THRESH: 0.5
OVERLAP_THRESH: 0.5
STUFF_AREA_LIMIT: 4096
INSTANCE_LOSS_WEIGHT: 1.0
PIXEL_MEAN: [0, 0, 0]
PIXEL_STD: [1, 1, 1]
PROPOSAL_GENERATOR:
MIN_SIZE: 0
NAME: RPN
RESNETS:
DEFORM_MODULATED: False
DEFORM_NUM_GROUPS: 1
DEFORM_ON_PER_STAGE: [False, False, False, False]
DEPTH: 50
NORM: FrozenBN
NUM_GROUPS: 1
OUT_FEATURES: ['res4']
RES2_OUT_CHANNELS: 256
RES5_DILATION: 1
STEM_OUT_CHANNELS: 64
STRIDE_IN_1X1: True
WIDTH_PER_GROUP: 64
RETINANET:
BBOX_REG_LOSS_TYPE: smooth_l1
BBOX_REG_WEIGHTS: (1.0, 1.0, 1.0, 1.0)
FOCAL_LOSS_ALPHA: 0.25
FOCAL_LOSS_GAMMA: 2.0
IN_FEATURES: ['p3', 'p4', 'p5', 'p6', 'p7']
IOU_LABELS: [0, -1, 1]
IOU_THRESHOLDS: [0.4, 0.5]
NMS_THRESH_TEST: 0.5
NORM:
NUM_CLASSES: 80
NUM_CONVS: 4
PRIOR_PROB: 0.01
SCORE_THRESH_TEST: 0.05
SMOOTH_L1_LOSS_BETA: 0.1
TOPK_CANDIDATES_TEST: 1000
ROI_BOX_CASCADE_HEAD:
BBOX_REG_WEIGHTS: ((10.0, 10.0, 5.0, 5.0), (20.0, 20.0, 10.0, 10.0), (30.0, 30.0, 15.0, 15.0))
IOUS: (0.5, 0.6, 0.7)
ROI_BOX_HEAD:
BBOX_REG_LOSS_TYPE: smooth_l1
BBOX_REG_LOSS_WEIGHT: 1.0
BBOX_REG_WEIGHTS: (10.0, 10.0, 5.0, 5.0)
CLS_AGNOSTIC_BBOX_REG: False
CONV_DIM: 256
FC_DIM: 1024
NAME:
NORM:
NUM_CONV: 0
NUM_FC: 0
POOLER_RESOLUTION: 14
POOLER_SAMPLING_RATIO: 0
POOLER_TYPE: ROIAlignV2
SMOOTH_L1_BETA: 0.0
TRAIN_ON_PRED_BOXES: False
ROI_HEADS:
BATCH_SIZE_PER_IMAGE: 512
IN_FEATURES: ['res4']
IOU_LABELS: [0, 1]
IOU_THRESHOLDS: [0.5]
NAME: Res5ROIHeads
NMS_THRESH_TEST: 0.5
NUM_CLASSES: 10
POSITIVE_FRACTION: 0.25
PROPOSAL_APPEND_GT: True
SCORE_THRESH_TEST: 0.05
ROI_KEYPOINT_HEAD:
CONV_DIMS: (512, 512, 512, 512, 512, 512, 512, 512)
LOSS_WEIGHT: 1.0
MIN_KEYPOINTS_PER_IMAGE: 1
NAME: KRCNNConvDeconvUpsampleHead
NORMALIZE_LOSS_BY_VISIBLE_KEYPOINTS: True
NUM_KEYPOINTS: 17
POOLER_RESOLUTION: 14
POOLER_SAMPLING_RATIO: 0
POOLER_TYPE: ROIAlignV2
ROI_MASK_HEAD:
CLS_AGNOSTIC_MASK: False
CONV_DIM: 256
NAME: MaskRCNNConvUpsampleHead
NORM:
NUM_CONV: 0
POOLER_RESOLUTION: 14
POOLER_SAMPLING_RATIO: 0
POOLER_TYPE: ROIAlignV2
RPN:
BATCH_SIZE_PER_IMAGE: 256
BBOX_REG_LOSS_TYPE: smooth_l1
BBOX_REG_LOSS_WEIGHT: 1.0
BBOX_REG_WEIGHTS: (1.0, 1.0, 1.0, 1.0)
BOUNDARY_THRESH: -1
HEAD_NAME: StandardRPNHead
IN_FEATURES: ['res4']
IOU_LABELS: [0, -1, 1]
IOU_THRESHOLDS: [0.3, 0.7]
LOSS_WEIGHT: 1.0
NMS_THRESH: 0.7
POSITIVE_FRACTION: 0.5
POST_NMS_TOPK_TEST: 1000
POST_NMS_TOPK_TRAIN: 2000
PRE_NMS_TOPK_TEST: 6000
PRE_NMS_TOPK_TRAIN: 12000
SMOOTH_L1_BETA: 0.0
SEM_SEG_HEAD:
COMMON_STRIDE: 4
CONVS_DIM: 128
IGNORE_VALUE: 255
IN_FEATURES: ['p2', 'p3', 'p4', 'p5']
LOSS_WEIGHT: 1.0
NAME: SemSegFPNHead
NORM: GN
NUM_CLASSES: 54
WEIGHTS: ./output/1_unk/model_final.pth
OUTPUT_DIR: ./output/1_unk_val
OWOD:
CLUSTERING:
ITEMS_PER_CLASS: 20
MARGIN: 10.0
MOMENTUM: 0.99
START_ITER: 1000
UPDATE_MU_ITER: 3000
Z_DIMENSION: 128
COMPUTE_ENERGY: True
CUR_INTRODUCED_CLS: 1
ENABLE_CLUSTERING: False
ENABLE_THRESHOLD_AUTOLABEL_UNK: True
ENABLE_UNCERTAINITY_AUTOLABEL_UNK: False
ENERGY_SAVE_PATH: energy
FEATURE_STORE_SAVE_PATH: feature_store
NUM_UNK_PER_IMAGE: 1
PREV_INTRODUCED_CLS: 0
SKIP_TRAINING_WHILE_EVAL: False
TEMPERATURE: 1.5
SEED: -1
SOLVER:
BASE_LR: 0.02
BIAS_LR_FACTOR: 1.0
CHECKPOINT_PERIOD: 5000
CLIP_GRADIENTS:
CLIP_TYPE: value
CLIP_VALUE: 1.0
ENABLED: False
NORM_TYPE: 2.0
GAMMA: 0.1
IMS_PER_BATCH: 16
LR_SCHEDULER_NAME: WarmupMultiStepLR
MAX_ITER: 700
MOMENTUM: 0.9
NESTEROV: False
REFERENCE_WORLD_SIZE: 0
STEPS: (12000, 16000)
WARMUP_FACTOR: 0.001
WARMUP_ITERS: 0
WARMUP_METHOD: linear
WEIGHT_DECAY: 0.0001
WEIGHT_DECAY_BIAS: 0.0001
WEIGHT_DECAY_NORM: 0.0
TEST:
AUG:
ENABLED: False
FLIP: True
MAX_SIZE: 4000
MIN_SIZES: (400, 500, 600, 700, 800, 900, 1000, 1100, 1200)
DETECTIONS_PER_IMAGE: 100
EVAL_PERIOD: 0
EXPECTED_RESULTS: []
KEYPOINT_OKS_SIGMAS: []
PRECISE_BN:
ENABLED: False
NUM_ITER: 200
VERSION: 2
VIS_PERIOD: 0
[03/19 08:59:20] detectron2 INFO: Full config saved to ./output/1_unk_val/config.yaml
[03/19 08:59:20] d2.utils.env INFO: Using a generated random seed 20213114
[03/19 08:59:20] d2.modeling.roi_heads.fast_rcnn INFO: Invalid class range: [1, 2, 3, 4, 5, 6, 7, 8]
[03/19 08:59:20] d2.modeling.roi_heads.fast_rcnn INFO: Feature store not found in ./output/1_unk_val/feature_store/feat.pt. Creating new feature store.
[03/19 08:59:20] d2.engine.defaults INFO: Model:
GeneralizedRCNN(
(backbone): ResNet(
(box_predictor): FastRCNNOutputLayers(
(cls_score): Linear(in_features=2048, out_features=11, bias=True)
(bbox_pred): Linear(in_features=2048, out_features=40, bias=True)
(hingeloss): HingeEmbeddingLoss()
)
)
)
[03/19 08:59:20] d2.data.build INFO: Known classes: range(0, 1)
[03/19 08:59:20] d2.data.build INFO: Labelling known instances the corresponding label, and unknown instances as unknown...
[03/19 08:59:20] d2.data.build INFO: Distribution of instances among all 2 categories:
| category | #instances | category | #instances |
|:----------:|:-------------|:----------:|:-------------|
| hawker | 1324 | unknown | 0 |
| | | | |
| total | 1324 | | |
[03/19 08:59:20] d2.data.build INFO: Number of datapoints: 422
[03/19 08:59:20] d2.data.common INFO: Serializing 422 elements to byte tensors and concatenating them all ...
[03/19 08:59:20] d2.data.common INFO: Serialized dataset takes 0.20 MiB
[03/19 08:59:20] d2.data.dataset_mapper INFO: Augmentations used in training: [ResizeShortestEdge(short_edge_length=(480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800), max_size=1333, sample_style='choice'), RandomFlip()]
[03/19 08:59:20] d2.data.build INFO: Using training sampler TrainingSampler
[03/19 08:59:20] fvcore.common.checkpoint INFO: Loading checkpoint from ./output/1_unk/model_final.pth
[03/19 08:59:21] d2.engine.train_loop INFO: Starting training from iteration 0
[03/19 09:00:57] d2.utils.events INFO: eta: 0:24:44 iter: 19 total_loss: 0.7832 loss_cls: 0.2363 loss_box_reg: 0.3492 loss_clustering: 0 loss_rpn_cls: 0.1057 loss_rpn_loc: 0.05144 time: 3.1343 data_time: 1.2971 lr: 0.02 max_mem: 19360M
[03/19 09:25:13] d2.utils.events INFO: eta: 0:00:43 iter: 679 total_loss: 0.1585 loss_cls: 0.0311 loss_box_reg: 0.0772 loss_clustering: 0 loss_rpn_cls: 0.01072 loss_rpn_loc: 0.03215 time: 2.2289 data_time: 0.0163 lr: 0.02 max_mem: 19360M
[03/19 09:25:56] fvcore.common.checkpoint INFO: Saving checkpoint to ./output/1_unk_val/model_final.pth
[03/19 09:25:57] d2.utils.events INFO: eta: 0:00:00 iter: 699 total_loss: 0.165 loss_cls: 0.03248 loss_box_reg: 0.08219 loss_clustering: 0 loss_rpn_cls: 0.01096 loss_rpn_loc: 0.03482 time: 2.2276 data_time: 0.0149 lr: 0.02 max_mem: 19360M
[03/19 09:25:57] d2.engine.train_loop INFO: Going to analyse the energy files...
[03/19 09:25:57] d2.engine.train_loop INFO: Temperature value: 1.5
[03/19 09:25:57] d2.engine.train_loop INFO: Analysing 0 / 1400
[03/19 09:26:48] d2.engine.train_loop INFO: Analysing 100 / 1400
[03/19 09:27:35] d2.engine.train_loop INFO: Analysing 200 / 1400
[03/19 09:28:17] d2.engine.train_loop INFO: Analysing 300 / 1400
[03/19 09:29:10] d2.engine.train_loop INFO: Analysing 400 / 1400
[03/19 09:29:57] d2.engine.train_loop INFO: Analysing 500 / 1400
[03/19 09:30:49] d2.engine.train_loop INFO: Analysing 600 / 1400
[03/19 09:31:36] d2.engine.train_loop INFO: Analysing 700 / 1400
[03/19 09:32:26] d2.engine.train_loop INFO: Analysing 800 / 1400
[03/19 09:33:15] d2.engine.train_loop INFO: Analysing 900 / 1400
[03/19 09:34:04] d2.engine.train_loop INFO: Analysing 1000 / 1400
[03/19 09:34:58] d2.engine.train_loop INFO: Analysing 1100 / 1400
[03/19 09:35:46] d2.engine.train_loop INFO: Analysing 1200 / 1400
[03/19 09:36:30] d2.engine.train_loop INFO: Analysing 1300 / 1400
My CGSH_val.yaml is:
CUDNN_BENCHMARK: True
OUTPUT_DIR: "./output/1_unk_val"
MODEL:
  PIXEL_MEAN: [0, 0, 0]
  PIXEL_STD: [1, 1, 1]
  META_ARCHITECTURE: "GeneralizedRCNN"
  WEIGHTS: "./output/1_unk/model_final.pth"
  RPN:
    PRE_NMS_TOPK_TEST: 6000
    POST_NMS_TOPK_TEST: 1000
  ROI_HEADS:
    # NUM_CLASSES: 2 # 0~30 Known class; 31 -> Unknown; 32 -> Background.
    NUM_CLASSES: 10
    NAME: "Res5ROIHeads"
  MASK_ON: False
  RESNETS:
    DEPTH: 50
INPUT:
  MIN_SIZE_TRAIN: (480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800)
  MIN_SIZE_TEST: 800
DATASETS:
  TRAIN: ("cgsh_val",)
  TEST: ("cgsh_val",)
DATALOADER:
  FILTER_EMPTY_ANNOTATIONS: False
  NUM_WORKERS: 16
SOLVER:
  # gpus 2
  IMS_PER_BATCH: 16
  # 0.02 * bs / 16
  BASE_LR: 0.02
  STEPS: (12000, 16000)
  GAMMA: 0.1
  MAX_ITER: 700
  LR_SCHEDULER_NAME: WarmupMultiStepLR
  WARMUP_FACTOR: 0.001
  WARMUP_ITERS: 0
  WARMUP_METHOD: linear
TEST:
  EXPECTED_RESULTS: []
VERSION: 2
OWOD:
  ENABLE_THRESHOLD_AUTOLABEL_UNK: True
  NUM_UNK_PER_IMAGE: 1
  ENABLE_UNCERTAINITY_AUTOLABEL_UNK: False
  ENABLE_CLUSTERING: False
  FEATURE_STORE_SAVE_PATH: 'feature_store'
  SKIP_TRAINING_WHILE_EVAL: False
  PREV_INTRODUCED_CLS: 0
  CUR_INTRODUCED_CLS: 1
  COMPUTE_ENERGY: True
  ENERGY_SAVE_PATH: 'energy'
  SKIP_TRAINING_WHILE_EVAL: False
  TEMPERATURE: 1.5
  CLUSTERING:
    ITEMS_PER_CLASS: 20
    START_ITER: 1000
    UPDATE_MU_ITER: 3000
    MOMENTUM: 0.99
    Z_DIMENSION: 128
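For anyone comparing their own setup, the class-related numbers in both logs appear to follow directly from NUM_CLASSES, PREV_INTRODUCED_CLS and CUR_INTRODUCED_CLS. The sketch below is only an illustration of that bookkeeping: `owod_class_layout` is a hypothetical helper, and the positions of the unknown and background indices are an assumption based on the pattern in the config comment (known classes first, then unknown, then background), not something confirmed in this thread.

```python
def owod_class_layout(num_classes, prev_introduced, cur_introduced, box_dim=4):
    """Derive the class index layout implied by the config (hypothetical helper)."""
    seen = prev_introduced + cur_introduced
    return {
        # Classes introduced so far: "Known classes: range(0, 1)" in the log.
        "known": list(range(seen)),
        # Slots reserved for future classes: "Invalid class range" in the log.
        "invalid": list(range(seen, num_classes - 1)),
        # Assumption: the last class slot is "unknown", background comes after it.
        "unknown": num_classes - 1,
        "background": num_classes,
        # Sizes of the two Linear layers printed in the model summary.
        "cls_score_out": num_classes + 1,
        "bbox_pred_out": num_classes * box_dim,
    }


layout = owod_class_layout(num_classes=10, prev_introduced=0, cur_introduced=1)
for key, value in layout.items():
    print(f"{key}: {value}")
```

For NUM_CLASSES: 10 with one introduced class, this reproduces the invalid class range [1, 2, 3, 4, 5, 6, 7, 8] and the out_features of 11 and 40 shown in the logs, so the single "hawker" class sits at index 0 and every other slot except the assumed unknown slot is masked out.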
@kendyChina: Were you able to resolve the issue? I think it is related to #16.
Hi, did you work it out?
Hi, can you share how you solved it? I ran into exactly the same issue.
@kendyChina Hello, did you resolve it?
Hello, dear author. When I use train.yaml to train the model and then use val.yaml to train the EBUI component on the validation set, the following error is reported in Fit_Weibull_3P:
I printed my unk and known variables and found that they do contain values less than zero. As I am not familiar with the Helmholtz free energy formulation, I hope to get your help. Thank you!
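For readers in the same position, here is a minimal sketch of the textbook Helmholtz free energy of a set of logits, F = -T * log(sum_i exp(logit_i / T)), using the TEMPERATURE: 1.5 value from the configs above. It is written in plain Python and is not taken from this repository: `free_energy` and `count_non_positive` are hypothetical helpers, the logits are made-up numbers, and the repository may store the negated quantity instead. Whether the Weibull fitter requires strictly positive samples is also an assumption here.

```python
import math


def free_energy(logits, temperature=1.5):
    """Textbook free energy F = -T * log(sum_i exp(logit_i / T))."""
    scaled = [z / temperature for z in logits]
    # Subtract the max before exponentiating for numerical stability.
    m = max(scaled)
    lse = m + math.log(sum(math.exp(s - m) for s in scaled))
    return -temperature * lse


def count_non_positive(values):
    """Count samples that a positive-support fit could not accept (assumption)."""
    return sum(1 for v in values if v <= 0)


# Made-up logits: one confident prediction and one with uniformly low scores.
confident = free_energy([8.0, -2.0, -3.0])
uncertain = free_energy([-4.0, -5.0, -6.0])
print("confident:", round(confident, 3))
print("uncertain:", round(uncertain, 3))
print("non-positive samples:", count_non_positive([confident, uncertain]))
```

The point of the example is that the sign of the energy depends on the logits: a large positive logit gives a negative free energy, while uniformly negative logits give a positive one. Negative values in unk and known are therefore not by themselves a sign of a bug, but they may be incompatible with a fitting routine that expects positive data.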