facebookresearch / MultiplexedOCR

Code for the CVPR 2021 paper "A Multiplexed Network for End-to-End, Multilingual OCR"

Exception: NaN detected! while training mlt17 #5

Closed: mobassir94 closed this issue 2 years ago

mobassir94 commented 2 years ago

I get NaN while training on the MLT17 dataset. The error comes from this line: https://github.com/facebookresearch/MultiplexedOCR/blob/main/multiplexer/engine/train_loop.py#L170 and from spn.py. I don't know why I am getting NaN with the MLT17 dataset.

(multiplexer) apsisdev@ML:/backup/Downloads/Compressed/MultiplexedOCR$ python3 /backup/Downloads/Compressed/MultiplexedOCR/tools/train_net.py --config-file $yaml
`fused_weight_gradient_mlp_cuda` module not found. gradient accumulation fusion with weight gradient computation disabled.
2022-04-19 17:11:35,139 multiplexer INFO: Using 1 GPUs
2022-04-19 17:11:35,140 multiplexer INFO: Namespace(config_file='/backup/Downloads/Compressed/MultiplexedOCR/configs/demo.yaml', eval_only=False, no_color=False, num_gpus=1, num_machines=1, machine_rank=0, dist_url='auto', opts=[], distributed=False, local_rank=0)
2022-04-19 17:11:35,140 multiplexer INFO: Collecting env info (might take some time)
2022-04-19 17:11:36,124 multiplexer INFO: 
PyTorch version: 1.10.2
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.4 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31

Python version: 3.9.7 (default, Sep 16 2021, 13:09:58)  [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.13.0-39-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 11.2.152
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3090
Nvidia driver version: 510.47.03
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.3.2
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.21.2
[pip3] torch==1.10.2
[pip3] torchaudio==0.10.2
[pip3] torchvision==0.11.3
[conda] blas                      1.0                         mkl  
[conda] cudatoolkit               11.3.1               h2bc3f7f_2  
[conda] ffmpeg                    4.3                  hf484d3e_0    pytorch
[conda] mkl                       2021.4.0           h06a4308_640  
[conda] mkl-service               2.4.0            py39h7f8727e_0  
[conda] mkl_fft                   1.3.1            py39hd3c417c_0  
[conda] mkl_random                1.2.2            py39h51133e4_0  
[conda] mypy-extensions           0.4.3                    pypi_0    pypi
[conda] numpy                     1.21.2           py39h20f2e39_0  
[conda] numpy-base                1.21.2           py39h79a1101_0  
[conda] pytorch                   1.10.2          py3.9_cuda11.3_cudnn8.2.0_0    pytorch
[conda] pytorch-mutex             1.0                        cuda    pytorch
[conda] torchaudio                0.10.2               py39_cu113    pytorch
[conda] torchvision               0.11.3               py39_cu113    pytorch
        Pillow (9.0.1)
2022-04-19 17:11:36,125 multiplexer INFO: Loaded configuration file /backup/Downloads/Compressed/MultiplexedOCR/configs/demo.yaml
2022-04-19 17:11:36,125 multiplexer INFO: 
CHAR_MAP:
  DIR: /backup/Downloads/Compressed/MultiplexedOCR/charmap/public/v3/
DATALOADER:
  ASPECT_RATIO_GROUPING: false
  NUM_WORKERS: 1
  SIZE_DIVISIBILITY: 32
DATASETS:
  AUG: false
  IGNORE_DIFFICULT: false
  MAX_ROTATE_THETA: 90
  RATIOS:
  - 100.0
  - 20.0
  TEST:
  - mlt17_eval
  TRAIN:
  - mlt17_train
INPUT:
  MAX_SIZE_TEST: 4000
  MAX_SIZE_TRAIN: 2333
  MIN_SIZE_TEST: 1000
  MIN_SIZE_TRAIN: (800, 1000, 1200, 1400)
MODEL:
  BACKBONE:
    CONV_BODY: R-50-FPN
    OUT_CHANNELS: 256
  CHAR_MASK_ON: false
  LANGUAGE_HEAD:
    NUM_CLASSES: 8
    PREDICTOR: V1LanguagePredictor
    INPUT_H: 32
    INPUT_W: 32
    INPUT_C: 256
    CONV1_C: 64
    CONV2_C: 32
  MASK_ON: true
  META_ARCHITECTURE: GeneralizedRCNN
  RESNETS:
    BACKBONE_OUT_CHANNELS: 256
  ROI_BOX_HEAD:
    FEATURE_EXTRACTOR: FPN2MLPFeatureExtractor
    NUM_CLASSES: 2
    POOLER_RESOLUTION: 7
    POOLER_SAMPLING_RATIO: 2
    POOLER_SCALES: (0.25,)
    PREDICTOR: FPNPredictor
    USE_MASKED_FEATURE: true
  ROI_HEADS:
    BATCH_SIZE_PER_IMAGE: 512
    USE_FPN: true
  ROI_MASK_HEAD:
    CHAR_NUM_CLASSES: 37
    FEATURE_EXTRACTOR: MaskRCNNFPNFeatureExtractor
    MASK_BATCH_SIZE_PER_IM: 48
    POOLER_RESOLUTION: 14
    POOLER_RESOLUTION_H: 32
    POOLER_RESOLUTION_W: 32
    POOLER_SAMPLING_RATIO: 2
    POOLER_SCALES: (0.25,)
    PREDICTOR: MultiSeqLangMaskRCNNC4Predictor
    RESOLUTION: 28
    RESOLUTION_H: 64
    RESOLUTION_W: 64
    SHARE_BOX_FEATURE_EXTRACTOR: false
    USE_MASKED_FEATURE: true
    USE_WEIGHTED_CHAR_MASK: true
  RPN:
    ANCHOR_STRIDE: (4, 8, 16, 32, 64)
    FPN_POST_NMS_TOP_N_TEST: 1000
    POST_NMS_TOP_N_TEST: 1000
    PRE_NMS_TOP_N_TEST: 1000
    PRE_NMS_TOP_N_TRAIN: 2000
    USE_FPN: true
  SEG:
    BINARY_THRESH: 0.1
    BOX_THRESH: 0.1
    EXPAND_RATIO: 3.0
    MIN_SIZE: 5
    SHRINK_RATIO: 0.4
    TOP_N_TEST: 1000
    TOP_N_TRAIN: 1000
    USE_FPN: true
    USE_FUSE_FEATURE: true
  SEG_ON: true
SEQUENCE:
  ARABIC:
    EMBED_SIZE: 30
    HIDDEN_SIZE: 256
    NUM_CHAR: 80
  BENGALI:
    EMBED_SIZE: 30
    HIDDEN_SIZE: 256
    NUM_CHAR: 110
  BOS_TOKEN: 0
  CHINESE:
    EMBED_SIZE: 50
    HIDDEN_SIZE: 256
    NUM_CHAR: 5200
  DEVANAGARI:
    EMBED_SIZE: 30
    HIDDEN_SIZE: 256
    NUM_CHAR: 110
  HANGUL:
    EMBED_SIZE: 50
    HIDDEN_SIZE: 256
    NUM_CHAR: 1500
  JAPANESE:
    EMBED_SIZE: 50
    HIDDEN_SIZE: 256
    NUM_CHAR: 2210
  LANGUAGES:
  - ar
  - bn
  - hi
  - ja
  - ko
  - la
  - zh
  - symbol
  LANGUAGES_ENABLED:
  - ar
  - bn
  - hi
  - ja
  - ko
  - la
  - zh
  - symbol
  LANGUAGES_UNFREEZED:
  - ar
  - bn
  - hi
  - ja
  - ko
  - la
  - zh
  - symbol
  LATIN:
    EMBED_SIZE: 50
    HIDDEN_SIZE: 256
    NUM_CHAR: 250
  MAX_LENGTH: 32
  NUM_CHAR: 36
  NUM_SEQ_HEADS: 8
  RESIZE_HEIGHT: 16
  RESIZE_WIDTH: 64
  SEQ_ON: true
  SYMBOL:
    EMBED_SIZE: 30
    HIDDEN_SIZE: 64
    NUM_CHAR: 60
  TEACHER_FORCE_RATIO: 1.0
SOLVER:
  BASE_LR: 0.0001
  CHECKPOINT_PERIOD: 5000
  DISPLAY_FREQ: 20
  IMS_PER_BATCH: 1
  MAX_ITER: 100000
  RESUME: false
  STEPS: (60000, 120000)
  WARMUP_FACTOR: 0.1
  WEIGHT_DECAY: 0.0001
TEST:
  CHAR_THRESH: 192
  IMS_PER_BATCH: 1
  VIS: true

2022-04-19 17:11:36,126 multiplexer INFO: Running with config:
AMP_VERBOSE: False
CHAR_MAP:
  DIR: /backup/Downloads/Compressed/MultiplexedOCR/charmap/public/v3/
DATALOADER:
  ASPECT_RATIO_GROUPING: False
  NUM_WORKERS: 1
  SIZE_DIVISIBILITY: 32
DATASETS:
  AUG: False
  AUGMENTER:
    MIN_BOX_NUM_RATIO: 0.5
    MIN_HEIGHT_RATIO: 0.5
    MIN_WIDTH_RATIO: 0.5
    NAME: ResizerV0
    TEST: ResizerV0
  CROP_SIZE: (512, 512)
  FIX_CROP: False
  FIX_ROTATE: False
  IGNORE_DIFFICULT: False
  MAX_ROTATE_THETA: 90
  RANDOM_CROP_PROB: 0.0
  RANDOM_ROTATE_PROB: 0.5
  RATIOS: [100.0, 20.0]
  TEST: ('mlt17_eval',)
  TRAIN: ('mlt17_train',)
DTYPE: float32
INPUT:
  MAX_SIZE_TEST: 4000
  MAX_SIZE_TRAIN: 2333
  MIN_SIZE_TEST: 1000
  MIN_SIZE_TRAIN: (800, 1000, 1200, 1400)
  PIXEL_MEAN: [102.9801, 115.9465, 122.7717]
  PIXEL_STD: [1.0, 1.0, 1.0]
  SQR_SIZE_TEST: 600
  STRICT_RESIZE: False
  TO_BGR255: True
MODEL:
  BACKBONE:
    CONV_BODY: R-50-FPN
    FREEZE_CONV_BODY_AT: 2
    FROZEN: False
    NAME: build_resnet_fpn_backbone
    OUT_CHANNELS: 256
  CHAR_MASK_ON: False
  DEVICE: cuda
  FBNET:
    ARCH: default
    ARCH_DEF: 
    BN_TYPE: bn
    DET_HEAD_BLOCKS: []
    DET_HEAD_LAST_SCALE: 1.0
    DET_HEAD_STRIDE: 0
    DW_CONV_SKIP_BN: True
    DW_CONV_SKIP_RELU: True
    KPTS_HEAD_BLOCKS: []
    KPTS_HEAD_LAST_SCALE: 0.0
    KPTS_HEAD_STRIDE: 0
    MASK_HEAD_BLOCKS: []
    MASK_HEAD_LAST_SCALE: 0.0
    MASK_HEAD_STRIDE: 0
    NUM_GROUPS: 32
    RPN_BN_TYPE: 
    RPN_HEAD_BLOCKS: 0
    SCALE_FACTOR: 1.0
    STEM_IN_CHANNELS: 3
    WIDTH_DIVISOR: 1
  FBNET_V2:
    ARCH: default
    ARCH_DEF: []
    NORM: bn
    NORM_ARGS: []
    SCALE_FACTOR: 1.0
    STEM_IN_CHANNELS: 3
    WIDTH_DIVISOR: 1
  FPN:
    FUSE_TYPE: sum
    IN_FEATURES: []
    NORM: 
    OUT_CHANNELS: 256
    USE_GN: False
    USE_PRETRAINED: False
    USE_RELU: False
  LANGUAGE: en_num_36
  LANGUAGE_GROUPER:
    FROZEN: False
    GUMBLE_SOFTMAX_TAU: 1
    LOSS_WEIGHT: 1.0
    MIN_TASKS: 1.0
    NAME: BinaryLanguageGrouper
  LANGUAGE_HEAD:
    CONV1_C: 64
    CONV2_C: 32
    FROZEN: False
    INPUT_C: 256
    INPUT_H: 32
    INPUT_W: 32
    LOSS_WEIGHT: 1.0
    NUM_CLASSES: 8
    PREDICTOR: V1LanguagePredictor
  MASK_ON: True
  META_ARCHITECTURE: GeneralizedRCNN
  PROPOSAL_GENERATOR:
    NAME: SPN
  RESNET34: False
  RESNETS:
    BACKBONE_OUT_CHANNELS: 256
    DEFORMABLE_GROUPS: 1
    LAYERS: (3, 4, 6, 3)
    NUM_GROUPS: 1
    RES2_OUT_CHANNELS: 256
    RES5_DILATION: 1
    STAGE_WITH_DCN: (False, False, False, False)
    STEM_FUNC: StemWithFixedBatchNorm
    STEM_OUT_CHANNELS: 64
    STRIDE_IN_1X1: True
    TRANS_FUNC: BottleneckWithFixedBatchNorm
    WIDTH_PER_GROUP: 64
    WITH_MODULATED_DCN: False
  ROI_BOX_HEAD:
    FEATURE_EXTRACTOR: FPN2MLPFeatureExtractor
    FROZEN: False
    INFERENCE_USE_BOX: True
    MIX_OPTION: 
    MLP_HEAD_DIM: 1024
    NAME: ROIBoxHead
    NUM_CLASSES: 2
    POOLER_RESOLUTION: 7
    POOLER_SAMPLING_RATIO: 2
    POOLER_SCALES: (0.25,)
    POST_PROCESSOR: BaseBoxPostProcessor
    PREDICTOR: FPNPredictor
    SOFT_MASKED_FEATURE_RATIO: 0.0
    USE_MASKED_FEATURE: True
    USE_REGRESSION: True
  ROI_HEADS:
    BATCH_SIZE_PER_IMAGE: 512
    BBOX_REG_WEIGHTS: (10.0, 10.0, 5.0, 5.0)
    BG_IOU_THRESHOLD: 0.5
    DETECTIONS_PER_IMG: 100
    FG_IOU_THRESHOLD: 0.5
    NAME: CombinedROIHead
    NMS: 0.5
    POSITIVE_FRACTION: 0.25
    SCORE_THRESH: 0.0
    USE_FPN: True
  ROI_MASK_HEAD:
    CHAR_NUM_CLASSES: 37
    CONV5_ARCH: transpose_a
    CONV_LAYERS: (256, 256, 256, 256)
    CROPPER_RESOLUTION_H: 80
    CROPPER_RESOLUTION_W: 160
    FEATURE_EXTRACTOR: MaskRCNNFPNFeatureExtractor
    FEATURE_EXTRACTOR_FROZEN: False
    MASK_BATCH_SIZE_PER_IM: 48
    MASK_FCN_INPUT_DIM: 256
    MIX_OPTION: 
    MLP_HEAD_DIM: 1024
    NAME: BaseROIMaskHead
    POOLER_RESOLUTION: 14
    POOLER_RESOLUTION_H: 32
    POOLER_RESOLUTION_W: 32
    POOLER_SAMPLING_RATIO: 2
    POOLER_SCALES: (0.25,)
    POST_PROCESSOR: MultiSeq1CharMaskPostProcessor
    PREDICTOR: MultiSeqLangMaskRCNNC4Predictor
    PREDICTOR_TRUNK_FROZEN: False
    RESOLUTION: 28
    RESOLUTION_H: 64
    RESOLUTION_W: 64
    ROI_CROPPER: RotatedROICropper
    SHARE_BOX_FEATURE_EXTRACTOR: False
    SOFT_MASKED_FEATURE_RATIO: 0.0
    USE_MASKED_FEATURE: True
    USE_WEIGHTED_CHAR_MASK: True
  RPN:
    ANCHOR_SIZES: (32, 64, 128, 256, 512)
    ANCHOR_STRIDE: (4, 8, 16, 32, 64)
    ASPECT_RATIOS: (0.5, 1.0, 2.0)
    BATCH_SIZE_PER_IMAGE: 256
    BG_IOU_THRESHOLD: 0.3
    FG_IOU_THRESHOLD: 0.7
    FPN_POST_NMS_TOP_N_TEST: 1000
    FPN_POST_NMS_TOP_N_TRAIN: 2000
    MIN_SIZE: 0
    NMS_THRESH: 0.7
    POSITIVE_FRACTION: 0.5
    POST_NMS_TOP_N_TEST: 1000
    POST_NMS_TOP_N_TRAIN: 2000
    PRE_NMS_TOP_N_TEST: 1000
    PRE_NMS_TOP_N_TRAIN: 2000
    STRADDLE_THRESH: 0
    USE_FPN: True
  RPN_ONLY: False
  SEG:
    AUG_PROPOSALS: False
    BATCH_SIZE_PER_IMAGE: 256
    BINARY_THRESH: 0.1
    BN_FROZEN: False
    BOX_EXPAND_RATIO: 1.5
    BOX_THRESH: 0.1
    EXPAND_METHOD: constant
    EXPAND_RATIO: 3.0
    FROZEN: False
    IGNORE_DIFFICULT: True
    LOSS: Dice
    MIN_SIZE: 5
    MULTIPLE_THRESH: (0.2, 0.3, 0.5, 0.7)
    NAME: SEGModule
    POSITIVE_FRACTION: 0.5
    POST_PROCESSOR: SEGPostProcessor
    SHRINK_RATIO: 0.4
    TOP_N_TEST: 1000
    TOP_N_TRAIN: 1000
    USE_FPN: True
    USE_FUSE_FEATURE: True
    USE_MULTIPLE_THRESH: False
    USE_PPM: False
    USE_SEG_POLY: False
  SEG_ON: True
  TORCHSCRIPT_ONLY: False
  TRAIN_DETECTION_ONLY: False
  WEIGHT: 
MULTIPLEXER:
  LANGUAGE_WEIGHT_MODE: soft
  LOSS_FORMAT: separate
  TEST:
    RUN_ALL_HEADS: False
OUTPUT:
  FB_COCO:
    DET_THRESH: 0.2
    EVAL: False
    SEQ_THRESH: 0.0
  ICDAR15:
    INTERMEDIATE: False
    TASK1: False
    TASK4: False
  MLT17:
    TASK1: False
    TASK3: False
  MLT19:
    DET_THRESH:
      TASK4: 0.2
    INTERMEDIATE: False
    INTERMEDIATE_WITH_PKL: False
    LEXICON:
      EDIT_DIST_THRESH: 0.5
      NAME: none
    SEQ_THRESH:
      TASK4: 0.8
    TASK1: False
    TASK3: False
    TASK4: False
    VALIDATION_EVAL: True
  ON_THE_FLY: True
  SEG_VIS: False
  TMP_FOLDER: /tmp/multiplexer
  TOTAL_TEXT:
    DET_EVAL: False
    E2E_EVAL: False
    INTERMEDIATE: False
  ZIP_PER_GPU: False
OUTPUT_DIR: .
PATHS_CATALOG: /backup/Downloads/Compressed/MultiplexedOCR/multiplexer/config/paths_catalog.py
SEQUENCE:
  AMHARIC:
    ARCH: seq2seq_a
    EMBED_SIZE: 50
    HIDDEN_SIZE: 256
    NUM_CHAR: 370
  ANY:
    ARCH: seq2seq_a
    EMBED_SIZE: 250
    HIDDEN_SIZE: 320
    NUM_CHAR: 11000
  ANY1:
    ARCH: ctc_lstm
    EMBED_SIZE: 200
    HIDDEN_SIZE: 256
    NUM_CHAR: 11000
  ANY2:
    ARCH: ctc_lstm
    EMBED_SIZE: 200
    HIDDEN_SIZE: 256
    NUM_CHAR: 11000
  ANY3:
    ARCH: ctc_lstm
    EMBED_SIZE: 200
    HIDDEN_SIZE: 256
    NUM_CHAR: 11000
  ANY4:
    ARCH: ctc_lstm
    EMBED_SIZE: 200
    HIDDEN_SIZE: 256
    NUM_CHAR: 11000
  ANY5:
    ARCH: ctc_lstm
    EMBED_SIZE: 200
    HIDDEN_SIZE: 256
    NUM_CHAR: 11000
  ANY6:
    ARCH: ctc_lstm
    EMBED_SIZE: 200
    HIDDEN_SIZE: 256
    NUM_CHAR: 11000
  ANY7:
    ARCH: ctc_lstm
    EMBED_SIZE: 200
    HIDDEN_SIZE: 256
    NUM_CHAR: 11000
  ARABIC:
    ARCH: seq2seq_a
    EMBED_SIZE: 30
    HIDDEN_SIZE: 256
    NUM_CHAR: 80
  BEAM_SEARCH: True
  BENGALI:
    ARCH: seq2seq_a
    EMBED_SIZE: 30
    HIDDEN_SIZE: 256
    NUM_CHAR: 110
  BOS_TOKEN: 0
  BULGARIAN:
    ARCH: seq2seq_a
    EMBED_SIZE: 30
    HIDDEN_SIZE: 192
    NUM_CHAR: 110
  BURMESE:
    ARCH: seq2seq_a
    EMBED_SIZE: 30
    HIDDEN_SIZE: 192
    NUM_CHAR: 160
  CHINESE:
    ARCH: seq2seq_a
    EMBED_SIZE: 50
    HIDDEN_SIZE: 256
    NUM_CHAR: 5200
  CROATIAN:
    ARCH: seq2seq_a
    EMBED_SIZE: 100
    HIDDEN_SIZE: 192
    NUM_CHAR: 110
  CYRILLIC:
    ARCH: seq2seq_a
    EMBED_SIZE: 30
    HIDDEN_SIZE: 256
    NUM_CHAR: 130
  DECODER_LOSS: NLLLoss
  DEVANAGARI:
    ARCH: seq2seq_a
    EMBED_SIZE: 30
    HIDDEN_SIZE: 256
    NUM_CHAR: 110
  DUTCH:
    ARCH: seq2seq_a
    EMBED_SIZE: 30
    HIDDEN_SIZE: 192
    NUM_CHAR: 100
  EMBED_SIZE: 38
  ENGLISH:
    ARCH: seq2seq_a
    EMBED_SIZE: 30
    HIDDEN_SIZE: 256
    NUM_CHAR: 120
  EN_NUM:
    ARCH: seq2seq_a
    EMBED_SIZE: 38
    HIDDEN_SIZE: 256
    NUM_CHAR: 36
  EN_NUM_36:
    ARCH: seq2seq_a
    EMBED_SIZE: 38
    HIDDEN_SIZE: 256
    NUM_CHAR: 36
  FRENCH:
    ARCH: seq2seq_a
    EMBED_SIZE: 30
    HIDDEN_SIZE: 192
    NUM_CHAR: 160
  GERMAN:
    ARCH: seq2seq_a
    EMBED_SIZE: 30
    HIDDEN_SIZE: 192
    NUM_CHAR: 120
  GREEK:
    ARCH: seq2seq_a
    EMBED_SIZE: 30
    HIDDEN_SIZE: 192
    NUM_CHAR: 120
  GUJARATI:
    ARCH: seq2seq_a
    EMBED_SIZE: 30
    HIDDEN_SIZE: 192
    NUM_CHAR: 120
  HANGUL:
    ARCH: seq2seq_a
    EMBED_SIZE: 50
    HIDDEN_SIZE: 256
    NUM_CHAR: 1500
  HEBREW:
    ARCH: seq2seq_a
    EMBED_SIZE: 30
    HIDDEN_SIZE: 192
    NUM_CHAR: 110
  HIDDEN_SIZE: 256
  HUNGARIAN:
    ARCH: seq2seq_a
    EMBED_SIZE: 30
    HIDDEN_SIZE: 192
    NUM_CHAR: 120
  INDONESIAN:
    ARCH: seq2seq_a
    EMBED_SIZE: 30
    HIDDEN_SIZE: 192
    NUM_CHAR: 100
  ITALIAN:
    ARCH: seq2seq_a
    EMBED_SIZE: 30
    HIDDEN_SIZE: 192
    NUM_CHAR: 110
  JAPANESE:
    ARCH: seq2seq_a
    EMBED_SIZE: 50
    HIDDEN_SIZE: 256
    NUM_CHAR: 2210
  JAVANESE:
    ARCH: seq2seq_a
    EMBED_SIZE: 30
    HIDDEN_SIZE: 168
    NUM_CHAR: 110
  KANA:
    ARCH: seq2seq_a
    EMBED_SIZE: 30
    HIDDEN_SIZE: 256
    NUM_CHAR: 210
  KANNADA:
    ARCH: seq2seq_a
    EMBED_SIZE: 30
    HIDDEN_SIZE: 192
    NUM_CHAR: 120
  KHMER:
    ARCH: seq2seq_a
    EMBED_SIZE: 30
    HIDDEN_SIZE: 192
    NUM_CHAR: 140
  LANGUAGES: ('ar', 'bn', 'hi', 'ja', 'ko', 'la', 'zh', 'symbol')
  LANGUAGES_ENABLED: ('ar', 'bn', 'hi', 'ja', 'ko', 'la', 'zh', 'symbol')
  LANGUAGES_UNFREEZED: ['ar', 'bn', 'hi', 'ja', 'ko', 'la', 'zh', 'symbol']
  LATIN:
    ARCH: seq2seq_a
    EMBED_SIZE: 50
    HIDDEN_SIZE: 256
    NUM_CHAR: 250
  LOSS_WEIGHT: 0.5
  LOSS_WEIGHT_BASE: 0.0
  MALAY:
    ARCH: seq2seq_a
    EMBED_SIZE: 30
    HIDDEN_SIZE: 192
    NUM_CHAR: 110
  MALAYALAM:
    ARCH: seq2seq_a
    EMBED_SIZE: 30
    HIDDEN_SIZE: 192
    NUM_CHAR: 130
  MARATHI:
    ARCH: seq2seq_a
    EMBED_SIZE: 30
    HIDDEN_SIZE: 192
    NUM_CHAR: 130
  MAX_LENGTH: 32
  MEAN_SCORE: False
  NUMBER:
    ARCH: seq2seq_a
    EMBED_SIZE: 10
    HIDDEN_SIZE: 64
    NUM_CHAR: 20
  NUM_CHAR: 36
  NUM_SEQ_HEADS: 8
  PERSIAN:
    ARCH: seq2seq_a
    EMBED_SIZE: 50
    HIDDEN_SIZE: 256
    NUM_CHAR: 160
  POLISH:
    ARCH: seq2seq_a
    EMBED_SIZE: 30
    HIDDEN_SIZE: 192
    NUM_CHAR: 120
  PORTUGUESE:
    ARCH: seq2seq_a
    EMBED_SIZE: 30
    HIDDEN_SIZE: 192
    NUM_CHAR: 110
  PUNJABI:
    ARCH: seq2seq_a
    EMBED_SIZE: 30
    HIDDEN_SIZE: 192
    NUM_CHAR: 120
  RESIZE_HEIGHT: 16
  RESIZE_WIDTH: 64
  ROMANIAN:
    ARCH: seq2seq_a
    EMBED_SIZE: 30
    HIDDEN_SIZE: 192
    NUM_CHAR: 110
  RUSSIAN:
    ARCH: seq2seq_a
    EMBED_SIZE: 30
    HIDDEN_SIZE: 256
    NUM_CHAR: 140
  SEQ_ON: True
  SHARED_CONV5_MASK: True
  SINHALA:
    ARCH: seq2seq_a
    EMBED_SIZE: 30
    HIDDEN_SIZE: 256
    NUM_CHAR: 120
  SPANISH:
    ARCH: seq2seq_a
    EMBED_SIZE: 30
    HIDDEN_SIZE: 192
    NUM_CHAR: 120
  SYMBOL:
    ARCH: seq2seq_a
    EMBED_SIZE: 30
    HIDDEN_SIZE: 64
    NUM_CHAR: 60
  TAGALOG:
    ARCH: seq2seq_a
    EMBED_SIZE: 30
    HIDDEN_SIZE: 192
    NUM_CHAR: 110
  TAMIL:
    ARCH: seq2seq_a
    EMBED_SIZE: 30
    HIDDEN_SIZE: 256
    NUM_CHAR: 110
  TEACHER_FORCE_RATIO: 1.0
  TELUGU:
    ARCH: seq2seq_a
    EMBED_SIZE: 30
    HIDDEN_SIZE: 192
    NUM_CHAR: 130
  THAI:
    ARCH: seq2seq_a
    EMBED_SIZE: 30
    HIDDEN_SIZE: 192
    NUM_CHAR: 210
  TURKISH:
    ARCH: seq2seq_a
    EMBED_SIZE: 30
    HIDDEN_SIZE: 192
    NUM_CHAR: 120
  TWO_CONV: False
  UNIFIEDAPU:
    ARCH: seq2seq_a
    EMBED_SIZE: 200
    HIDDEN_SIZE: 256
    NUM_CHAR: 250
  UNIFIEDAPUE:
    ARCH: seq2seq_a
    EMBED_SIZE: 250
    HIDDEN_SIZE: 256
    NUM_CHAR: 300
  UNIFIEDBGHMP:
    ARCH: seq2seq_a
    EMBED_SIZE: 270
    HIDDEN_SIZE: 256
    NUM_CHAR: 400
  UNIFIEDBKT:
    ARCH: seq2seq_a
    EMBED_SIZE: 250
    HIDDEN_SIZE: 256
    NUM_CHAR: 400
  UNIFIEDCG:
    ARCH: seq2seq_a
    EMBED_SIZE: 220
    HIDDEN_SIZE: 256
    NUM_CHAR: 260
  UNIFIEDCGE:
    ARCH: seq2seq_a
    EMBED_SIZE: 220
    HIDDEN_SIZE: 256
    NUM_CHAR: 280
  UNIFIEDCJ:
    ARCH: seq2seq_a
    EMBED_SIZE: 2000
    HIDDEN_SIZE: 320
    NUM_CHAR: 9000
  UNIFIEDCJE:
    ARCH: seq2seq_a
    EMBED_SIZE: 2000
    HIDDEN_SIZE: 320
    NUM_CHAR: 9000
  UNIFIEDCYRILLIC:
    ARCH: seq2seq_a
    EMBED_SIZE: 180
    HIDDEN_SIZE: 256
    NUM_CHAR: 180
  UNIFIEDDEVANAGARI:
    ARCH: seq2seq_a
    EMBED_SIZE: 180
    HIDDEN_SIZE: 256
    NUM_CHAR: 180
  UNIFIEDKE:
    ARCH: seq2seq_a
    EMBED_SIZE: 1500
    HIDDEN_SIZE: 320
    NUM_CHAR: 2500
  UNIFIEDKT:
    ARCH: seq2seq_a
    EMBED_SIZE: 200
    HIDDEN_SIZE: 256
    NUM_CHAR: 220
  UNIFIEDLATIN1:
    ARCH: seq2seq_a
    EMBED_SIZE: 270
    HIDDEN_SIZE: 256
    NUM_CHAR: 500
  UNIFIEDMST:
    ARCH: seq2seq_a
    EMBED_SIZE: 250
    HIDDEN_SIZE: 256
    NUM_CHAR: 300
  URDU:
    ARCH: seq2seq_a
    EMBED_SIZE: 30
    HIDDEN_SIZE: 192
    NUM_CHAR: 160
  VIETNAMESE:
    ARCH: seq2seq_a
    EMBED_SIZE: 30
    HIDDEN_SIZE: 256
    NUM_CHAR: 260
SOLVER:
  BASE_LR: 0.0001
  BIAS_LR_FACTOR: 2
  CHECKPOINT_PERIOD: 5000
  DISPLAY_FREQ: 20
  GAMMA: 0.1
  IMS_PER_BATCH: 1
  MAX_ITER: 100000
  MOMENTUM: 0.9
  POW_SCHEDULE: False
  RESUME: False
  STEPS: (60000, 120000)
  USE_ADAM: False
  WARMUP_FACTOR: 0.1
  WARMUP_ITERS: 500
  WARMUP_METHOD: linear
  WEIGHT_DECAY: 0.0001
  WEIGHT_DECAY_BIAS: 0
TEST:
  BBOX_AUG:
    ENABLED: False
    MIN_SIZE: (800, 1000, 1200, 1400)
  CHAR_THRESH: 192
  EXPECTED_RESULTS: []
  EXPECTED_RESULTS_SIGMA_TOL: 4
  IMS_PER_BATCH: 1
  MASK2POLYGON_OP: python
  TORCHSCRIPT:
    ENABLED: False
    WEIGHT: 
  VIS: True
Selected optimization level O0:  Pure FP32 training.

Defaults for this optimization level are:
enabled                : True
opt_level              : O0
cast_model_type        : torch.float32
patch_torch_functions  : False
keep_batchnorm_fp32    : None
master_weights         : False
loss_scale             : 1.0
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled                : True
opt_level              : O0
cast_model_type        : torch.float32
patch_torch_functions  : False
keep_batchnorm_fp32    : None
master_weights         : False
loss_scale             : 1.0
2022-04-19 17:11:38,797 multiplexer.checkpoint.detection_checkpoint INFO: No checkpoint found. Initializing model from scratch
2022-04-19 17:11:38,799 multiplexer.data.datasets.icdar17mlt INFO: Total #images in the dir /backup/Downloads/Compressed/MultiplexedOCR/datasets/MLT17/train/imgs: 7200
....................... 1
2022-04-19 17:11:38,804 multiplexer.train_loop INFO: Start training
.......Images..........
 <multiplexer.structures.image_list.ImageList object at 0x7fe9ed9b9760>
.......targets..........
 [BoxList(num_boxes=8, image_width=1030, image_height=2333, mode=xyxy)]
.......targets[0]..........
 BoxList(num_boxes=8, image_width=1030, image_height=2333, mode=xyxy)
2022-04-19 17:11:40,711 multiplexer.utils.char_map INFO: [Info] Loaded char_map with 72 characters from /backup/Downloads/Compressed/MultiplexedOCR/charmap/public/v3/arabic.json.
2022-04-19 17:11:40,715 multiplexer.utils.char_map INFO: [Info] Loaded char_map with 108 characters from /backup/Downloads/Compressed/MultiplexedOCR/charmap/public/v3/bengali.json.
2022-04-19 17:11:40,719 multiplexer.utils.char_map INFO: [Info] Loaded char_map with 106 characters from /backup/Downloads/Compressed/MultiplexedOCR/charmap/public/v3/devanagari.json.
2022-04-19 17:11:40,723 multiplexer.utils.char_map INFO: [Info] Loaded char_map with 2242 characters from /backup/Downloads/Compressed/MultiplexedOCR/charmap/public/v3/japanese.json.
2022-04-19 17:11:40,729 multiplexer.utils.char_map INFO: [Info] Loaded char_map with 1488 characters from /backup/Downloads/Compressed/MultiplexedOCR/charmap/public/v3/hangul.json.
2022-04-19 17:11:40,733 multiplexer.utils.char_map INFO: [Info] Loaded char_map with 242 characters from /backup/Downloads/Compressed/MultiplexedOCR/charmap/public/v3/latin.json.
2022-04-19 17:11:40,739 multiplexer.utils.char_map INFO: [Info] Loaded char_map with 5146 characters from /backup/Downloads/Compressed/MultiplexedOCR/charmap/public/v3/chinese.json.
2022-04-19 17:11:40,744 multiplexer.utils.char_map INFO: [Info] Loaded char_map with 54 characters from /backup/Downloads/Compressed/MultiplexedOCR/charmap/public/v3/symbol.json.
/home/apsisdev/anaconda3/envs/multiplexer/lib/python3.9/site-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at  /opt/conda/conda-bld/pytorch_1640811803361/work/aten/src/ATen/native/TensorShape.cpp:2157.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
2022-04-19 17:11:41,109 multiplexer.train_loop INFO: eta: 2 days, 15:59:55  iter: 0  loss: 1987.7233 (1987.7233)  loss_classifier: 5.1704 (5.1704)  loss_box_reg: 2.2477 (2.2477)  loss_language_pred: 989.9980 (989.9980)  loss_seq_ar: 15.8070 (15.8070)  loss_seq_bn: 17.6221 (17.6221)  loss_seq_hi: 15.0108 (15.0108)  loss_seq_ja: 22.8817 (22.8817)  loss_seq_ko: 20.2502 (20.2502)  loss_seq_la: 18.8446 (18.8446)  loss_seq_zh: 26.3838 (26.3838)  loss_seq_symbol: 18.3731 (18.3731)  loss_mask: 834.2153 (834.2153)  loss_seg: 0.9183 (0.9183)  time: 2.3040 (2.3040)  data: 0.2325 (0.2325)  lr: 0.000010  max mem: 3840
.......Images..........
 <multiplexer.structures.image_list.ImageList object at 0x7fe9ed9b9d30>
.......targets..........
 [BoxList(num_boxes=11, image_width=800, image_height=1424, mode=xyxy)]
.......targets[0]..........
 BoxList(num_boxes=11, image_width=800, image_height=1424, mode=xyxy)
.......Images..........
 <multiplexer.structures.image_list.ImageList object at 0x7fe9ed9b9d00>
.......targets..........
 [BoxList(num_boxes=9, image_width=1866, image_height=1400, mode=xyxy)]
.......targets[0]..........
 BoxList(num_boxes=9, image_width=1866, image_height=1400, mode=xyxy)
.......Images..........
 <multiplexer.structures.image_list.ImageList object at 0x7fe9e416da60>
.......targets..........
 [BoxList(num_boxes=15, image_width=1600, image_height=1200, mode=xyxy)]
.......targets[0]..........
 BoxList(num_boxes=15, image_width=1600, image_height=1200, mode=xyxy)
[Debug] NaN detected:
[Debug] pred = tensor([[[[nan, nan, nan,  ..., nan, nan, nan],
          [nan, nan, nan,  ..., nan, nan, nan],
          [nan, nan, nan,  ..., nan, nan, nan],
          ...,
          [nan, nan, nan,  ..., nan, nan, nan],
          [nan, nan, nan,  ..., nan, nan, nan],
          [nan, nan, nan,  ..., nan, nan, nan]]]], device='cuda:0',
       grad_fn=<SigmoidBackward0>)
[Debug] gt = tensor([[[[0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          ...,
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.]]]], device='cuda:0')
[Debug] m = tensor([[[1., 1., 1.,  ..., 1., 1., 1.],
         [1., 1., 1.,  ..., 1., 1., 1.],
         [1., 1., 1.,  ..., 1., 1., 1.],
         ...,
         [1., 1., 1.,  ..., 1., 1., 1.],
         [1., 1., 1.,  ..., 1., 1., 1.],
         [1., 1., 1.,  ..., 1., 1., 1.]]], device='cuda:0')
[Debug] intersection = nan
[Debug] union = nan
Traceback (most recent call last):
  File "/backup/Downloads/Compressed/MultiplexedOCR/tools/train_net.py", line 230, in <module>
    detectron2_launch(parse_args())
  File "/backup/Downloads/Compressed/MultiplexedOCR/tools/train_net.py", line 188, in detectron2_launch
    launch(
  File "/backup/Downloads/Compressed/MultiplexedOCR/multiplexer/engine/launch.py", line 81, in launch
    main_func(*args)
  File "/backup/Downloads/Compressed/MultiplexedOCR/tools/train_net.py", line 151, in main
    train(cfg, args.local_rank, args.distributed, tb_logger)
  File "/backup/Downloads/Compressed/MultiplexedOCR/tools/train_net.py", line 79, in train
    do_train(
  File "/backup/Downloads/Compressed/MultiplexedOCR/multiplexer/engine/train_loop.py", line 172, in do_train
    loss_dict = model(images, targets)
  File "/home/apsisdev/anaconda3/envs/multiplexer/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/apsisdev/anaconda3/envs/multiplexer/lib/python3.9/site-packages/apex/amp/_initialize.py", line 196, in new_fwd
    output = old_fwd(*applier(args, input_caster),
  File "/backup/Downloads/Compressed/MultiplexedOCR/multiplexer/modeling/meta_arch/rcnn.py", line 67, in forward
    proposal_out = self.forward_proposal(images, features, targets)
  File "/backup/Downloads/Compressed/MultiplexedOCR/multiplexer/modeling/meta_arch/rcnn.py", line 80, in forward_proposal
    (proposals, proposal_losses), fuse_feature = self.proposal(
  File "/home/apsisdev/anaconda3/envs/multiplexer/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/backup/Downloads/Compressed/MultiplexedOCR/multiplexer/modeling/proposal_generator/spn.py", line 688, in forward
    return self._forward_train(preds, targets, image_shapes), [fuse_feature]
  File "/backup/Downloads/Compressed/MultiplexedOCR/multiplexer/modeling/proposal_generator/spn.py", line 703, in _forward_train
    loss_seg = self.loss_evaluator(preds, targets)
  File "/backup/Downloads/Compressed/MultiplexedOCR/multiplexer/modeling/proposal_generator/spn.py", line 581, in __call__
    seg_loss = self.loss_function(preds, segm_targets, masks)
  File "/backup/Downloads/Compressed/MultiplexedOCR/multiplexer/modeling/proposal_generator/spn.py", line 626, in dice_loss
    raise Exception("NaN detected!")
Exception: NaN detected!
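In the debug output above, `gt` is all zeros and `m` is all ones. That means the ground truth alone cannot make the union NaN. The NaN is already present in `pred`, which is the sigmoid output, so the segmentation logits were NaN before the loss was computed. The sketch below is a minimal masked Dice loss with the same kind of NaN check, written in plain Python over flattened lists. It assumes a loss of the form 1 - 2·intersection/union with a small epsilon in the denominator. The name `dice_loss_sketch` and the `eps` value are hypothetical and are not taken from `spn.py`.

```python
import math
from typing import Sequence


def dice_loss_sketch(pred: Sequence[float], gt: Sequence[float],
                     mask: Sequence[float], eps: float = 1e-6) -> float:
    """Masked Dice loss over flattened maps (hypothetical helper, not spn.py).

    pred: sigmoid probabilities, gt: binary targets, mask: 1.0 where a pixel counts.
    """
    intersection = sum(p * g * m for p, g, m in zip(pred, gt, mask))
    union = (sum(p * m for p, m in zip(pred, mask))
             + sum(g * m for g, m in zip(gt, mask)) + eps)
    loss = 1.0 - 2.0 * intersection / union
    if math.isnan(loss):
        # eps only guards against 0/0. It cannot repair NaN that is already in
        # pred, because NaN * 0 is still NaN and propagates into both sums.
        raise ValueError(f"NaN detected! intersection={intersection}, union={union}")
    return loss


empty_gt, full_mask = [0.0] * 4, [1.0] * 4
# Finite predictions with an all-zero target (as in the log) give a finite loss.
print(dice_loss_sketch([0.1, 0.2, 0.3, 0.4], empty_gt, full_mask))  # 1.0
# NaN predictions reproduce the failure: both sums become NaN.
try:
    dice_loss_sketch([float("nan")] * 4, empty_gt, full_mask)
except ValueError as err:
    print(err)
```

Under this assumption, adding a larger epsilon or clamping the loss would only hide the symptom. The thing to investigate is what made the network output NaN.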

@SuperIRabbit I need help, please. Thanks!

gtb1551050818 commented 2 years ago

Hello, I ran into a dataset-loading problem while trying to reproduce the results:

FileNotFoundError: [Errno 2] No such file or directory: '/checkpoint/jinghuang/datasets/MLT17/train/imgs'

How do you load the dataset correctly? Can you help me? Thank you

mobassir94 commented 2 years ago

@gtb1551050818 After spending a lot of time on this project, I ran into several issues because of the lack of documentation. I badly needed a response and help from the authors, but the author of this project is irresponsible (an implementation is kind of wasted if users can't use it and you don't care or reply). So I simply switched to another multilingual OCR pipeline: I am working with PaddleOCR. Yes, it doesn't have multi-head multilingual OCR yet, but its authors are very responsive and helpful. I am not using this multilingual implementation anymore. It's hard to work on a project that has little or almost no documentation (a bad public implementation, IMHO).

gtb1551050818 commented 2 years ago

Yes, it is really difficult to reproduce the results from the current documentation alone, especially for beginners like me. But I noticed that you seem to have loaded the MLT17 dataset correctly. May I take the liberty of asking where you put MLT17, or what changes you made to get it to load successfully? I also want to try again to see whether I can reproduce the results. If it works out, hahaha.

mobassir94 commented 2 years ago

@gtb1551050818 Here is my last modified version of MultiplexedOCR, in which MLT17 loads correctly: https://drive.google.com/drive/folders/1ep8GXP3tT2cQN2i22aE7iioldg9y00nP?usp=sharing

Please give it a try. If you can train without getting NaN, let me know and share the updated code so that I can learn from it. Thanks!

gtb1551050818 commented 2 years ago

Thank you very much. If I get it working, I will let you know!

Shualite commented 2 years ago

@mobassir94 Hello, I ran into the same problem and am also stuck on this NaN issue. The only difference is that I use MLT19 and MLT19 Synthetic as training data, following the original paper's settings. At first, we thought it was a data problem, so we spent a lot of time trying to locate the faulty image. So far, the error seems to occur randomly, and we have found nothing.
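When NaNs show up seemingly at random, it can help to check every loss term at each iteration and record which batch was involved, instead of inspecting images by hand. The sketch below is minimal. It assumes the training loop has the scalar loss values (as with `loss_dict` in the traceback above, after calling `.item()`) and some per-image identifier. `find_nonfinite_losses`, `check_iteration` and `image_ids` are hypothetical names and are not part of the repository.

```python
import math
from typing import Dict, List, Sequence


def find_nonfinite_losses(loss_dict: Dict[str, float]) -> List[str]:
    """Names of loss terms that are NaN or +/-inf (hypothetical helper)."""
    return [name for name, value in loss_dict.items() if not math.isfinite(value)]


def check_iteration(iteration: int, loss_dict: Dict[str, float],
                    image_ids: Sequence[str]) -> None:
    """Raise with enough context to replay the failing batch."""
    bad = find_nonfinite_losses(loss_dict)
    if bad:
        raise FloatingPointError(
            f"iter {iteration}: non-finite {', '.join(bad)} for images {list(image_ids)}")


# Values as they might look after calling .item() on each tensor in loss_dict.
losses = {"loss_seg": float("nan"), "loss_mask": 834.2153}
print(find_nonfinite_losses(losses))  # ['loss_seg']
```

In the traceback, `dice_loss` raises inside the forward pass, so the image IDs need to be logged before `model(images, targets)` is called. Also, a NaN that appears in `pred` during the forward pass often means the weights were already corrupted by the previous optimizer step. In that case, the batch that raises is not necessarily the one at fault. Keeping the IDs of the last few batches makes the replay more informative.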

SuperIRabbit commented 2 years ago

Hi everyone, apologies for the late reply; I stopped working on OCR a few months ago due to some internal changes. In our experience, the NaN issue typically happens when you train from scratch on difficult datasets with an improper learning rate, which makes the segmentation network fail. Therefore, as mentioned in the paper, "we initialize the detection, segmentation, and mask feature extraction weights from the officially published weights released by Mask TextSpotter v3". I have also uploaded pretrained weights for the multiplexed model so that you can fine-tune from them.
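The resolved config in the log includes these solver settings:

- SOLVER.BASE_LR 0.0001
- WARMUP_METHOD linear, WARMUP_FACTOR 0.1, WARMUP_ITERS 500
- GAMMA 0.1
- STEPS (60000, 120000)

The iteration-0 line reports `lr: 0.000010`, next to very large losses (`loss_language_pred` about 990, `loss_mask` about 834). To see the effective learning rate before changing BASE_LR, here is a minimal sketch of the warmup-plus-step schedule those keys suggest. It assumes a maskrcnn-benchmark-style WarmupMultiStepLR formula, and the repository's own scheduler may differ. The function name is hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def warmup_multistep_lr(iteration: int, base_lr: float = 1e-4,
                        steps: Sequence[int] = (60000, 120000), gamma: float = 0.1,
                        warmup_factor: float = 0.1, warmup_iters: int = 500,
                        warmup_method: str = "linear") -> float:
    """Learning rate at an iteration (assumed maskrcnn-benchmark-style formula)."""
    factor = 1.0
    if iteration < warmup_iters:
        if warmup_method == "constant":
            factor = warmup_factor
        elif warmup_method == "linear":
            # Ramp linearly from warmup_factor to 1.0 over warmup_iters.
            alpha = iteration / warmup_iters
            factor = warmup_factor * (1 - alpha) + alpha
        else:
            raise ValueError(f"unknown warmup method: {warmup_method}")
    # Multiply by gamma once for every step boundary already passed.
    return base_lr * factor * gamma ** bisect_right(list(steps), iteration)


for it in (0, 250, 500, 60000, 99999):
    print(it, f"{warmup_multistep_lr(it):.6f}")
```

Under these assumptions, iteration 0 gives 0.000010, which matches the log. Because MAX_ITER is 100000, the second step at 120000 is never reached, so only one decay happens during the run.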

You can specify

MODEL.WEIGHT ${path_to_pretrained_weights}

in your training command to initialize from the pretrained weights.
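In the log, the argument namespace has an empty `opts=[]` list after `--config-file`. This suggests that trailing KEY VALUE pairs are accepted as config overrides, and the resolved config shows `MODEL.WEIGHT` with an empty default. The sketch below assumes yacs-style `merge_from_list` semantics, as in maskrcnn-benchmark. It builds the command and shows where a `MODEL.WEIGHT` pair would land in a nested config. The paths are placeholders. `apply_overrides` is a hypothetical simplification that keeps values as strings, whereas yacs also parses literals.

```python
import shlex
from typing import Any, Dict, List


def build_train_command(config_file: str, weights: str) -> List[str]:
    # Script path mirrors the command in the log; adjust it to your checkout.
    return ["python3", "tools/train_net.py", "--config-file", config_file,
            "MODEL.WEIGHT", weights]


def apply_overrides(cfg: Dict[str, Any], opts: List[str]) -> Dict[str, Any]:
    """Simplified yacs-like merge: opts alternates dotted keys and values."""
    if len(opts) % 2:
        raise ValueError("overrides must come in KEY VALUE pairs")
    for key, value in zip(opts[0::2], opts[1::2]):
        *parents, leaf = key.split(".")
        node = cfg
        for part in parents:
            node = node[part]  # KeyError for an unknown section
        if leaf not in node:
            raise KeyError(f"non-existent config key: {key}")
        node[leaf] = value
    return cfg


cmd = build_train_command("configs/demo.yaml", "/path/to/pretrained.pth")
print(shlex.join(cmd))
cfg = apply_overrides({"MODEL": {"WEIGHT": ""}}, cmd[4:])
print(cfg["MODEL"]["WEIGHT"])  # /path/to/pretrained.pth
```

Rejecting unknown keys, as yacs does, means a typo such as `MODEL.WEIGHTS` fails immediately. Without that check, the run would silently start from scratch again.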

SuperIRabbit commented 2 years ago

Hi @gtb1551050818, you need to download the MLT17 dataset (as well as the other datasets) from the official site and extract the images and ground-truth annotations into the corresponding folders.
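The FileNotFoundError points to a root (`/checkpoint/jinghuang/datasets/...`) that presumably exists only on the authors' machines. The working log instead reads from a local `datasets/MLT17/train/imgs` folder and reports `Total #images ... 7200`. The resolved config lists `multiplexer/config/paths_catalog.py` as `PATHS_CATALOG`, which is likely where that root is configured. A pre-flight check can catch a wrong root before training starts. The minimal sketch below assumes only the `imgs` folder name seen in the log. The ground-truth folder name is left as an optional parameter because it is not shown in this thread. `check_split` is a hypothetical helper.

```python
import tempfile
from pathlib import Path
from typing import Optional


def check_split(root: str, dataset: str = "MLT17", split: str = "train",
                gt_dirname: Optional[str] = None) -> int:
    """Return the number of files in <root>/<dataset>/<split>/imgs (hypothetical check).

    gt_dirname is optional because the annotation folder name is not shown in the thread.
    """
    split_dir = Path(root) / dataset / split
    img_dir = split_dir / "imgs"
    if not img_dir.is_dir():
        raise FileNotFoundError(f"missing image folder: {img_dir}")
    if gt_dirname is not None and not (split_dir / gt_dirname).is_dir():
        raise FileNotFoundError(f"missing annotation folder: {split_dir / gt_dirname}")
    return sum(1 for p in img_dir.iterdir() if p.is_file())


# Demo on a throwaway layout; point root at your real datasets folder instead.
with tempfile.TemporaryDirectory() as tmp:
    imgs = Path(tmp) / "MLT17" / "train" / "imgs"
    imgs.mkdir(parents=True)
    for name in ("img_1.jpg", "img_2.png"):
        (imgs / name).touch()
    print(check_split(tmp))  # 2
```

With the layout from the log, `check_split("/backup/Downloads/Compressed/MultiplexedOCR/datasets")` should report the same count as the `Total #images` line. This assumes that line counts every file in the folder.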