aim-uofa / AdelaiDet

AdelaiDet is an open source toolbox for multiple instance-level detection and recognition tasks.
https://git.io/AdelaiDet

Why are all ABCNet predictions the same? #414

Closed xcharxlie closed 3 years ago

xcharxlie commented 3 years ago

I trained on my own dataset and am now at the testing stage, using 4 GPUs and all the TotalText default settings (5000 iterations). However, all the outputs (the "rec" field in the annotation) are the same for some reason. It looks like Issue 371. How can I fix it?

[attached images: IMG_0092, IMG_1068]

xcharxlie commented 3 years ago

Here's the setting:

sys.platform              linux
Python                    3.8.10 (default, Jun 4 2021, 15:09:15) [GCC 7.5.0]
numpy                     1.20.2
detectron2                0.1.3 @/home/CN/zizhang.wu/zzr/AdelaiDet/detectron2/detectron2
Compiler                  GCC 5.5
CUDA compiler             CUDA 10.2
detectron2 arch flags     sm_75
DETECTRON2_ENV_MODULE
PyTorch                   1.9.0 @/home/CN/zizhang.wu/anaconda3/envs/Adet/lib/python3.8/site-packages/torch
PyTorch debug build       False
GPU available             True
GPU 0,1,2,3               TITAN RTX
CUDA_HOME                 /usr/local/cuda-10.2
Pillow                    8.3.1
torchvision               0.10.0 @/home/CN/zizhang.wu/anaconda3/envs/Adet/lib/python3.8/site-packages/torchvision
torchvision arch flags    sm_35, sm_50, sm_60, sm_70, sm_75
fvcore                    0.1.1.dev200512
cv2                       4.5.3


PyTorch built with:
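(This summary matches the format printed by detectron2's environment collector; for reference, it can be regenerated with the snippet below, which uses detectron2's real collect_env utility.)

# Print the same environment summary that detectron2 logs at startup.
from detectron2.utils.collect_env import collect_env_info

print(collect_env_info())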

[07/15 13:38:25] detectron2 INFO: Command line arguments: Namespace(config_file='configs/BAText/TotalText/attn_R_50.yaml', dist_url='tcp://127.0.0.1:51458', eval_only=False, machine_rank=0, num_gpus=4, num_machines=1, opts=[], resume=False)
[07/15 13:38:25] detectron2 INFO: Contents of args.config_file=configs/BAText/TotalText/attn_R_50.yaml:
_BASE_: "Base-TotalText.yaml"
MODEL:
  WEIGHTS: "detectron2://ImageNetPretrained/MSRA/R-50.pkl"
  RESNETS:
    DEPTH: 50
  BATEXT:
    RECOGNIZER: "attn"  # "attn" "rnn"
SOLVER:
  IMS_PER_BATCH: 8
  BASE_LR: 0.001
  MAX_ITER: 5000
  CHECKPOINT_PERIOD: 1000
TEST:
  EVAL_PERIOD: 1000
OUTPUT_DIR: "outputs2/batext/totaltext/attn_R_50"

Yuliang-Liu commented 3 years ago

@xcharxlie Have you used the pretrained model to finetune on your own dataset? If you start training from scratch, you may need a lot of synthetic data and many more iterations. Using the syntext-curved data we provided is an alternative.
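(For illustration, a minimal sketch of starting a finetune from a pretrained checkpoint via detectron2's config API; the checkpoint path and output file below are hypothetical, not files shipped with the repo.)

# Sketch: finetune from a pretrained ABCNet checkpoint instead of the
# ImageNet-pretrained backbone referenced in attn_R_50.yaml.
from adet.config import get_cfg  # AdelaiDet's extended detectron2 config

cfg = get_cfg()
cfg.merge_from_file("configs/BAText/TotalText/attn_R_50.yaml")
cfg.MODEL.WEIGHTS = "weights/abcnet_pretrained.pth"  # hypothetical local path
cfg.SOLVER.MAX_ITER = 5000  # finetuning schedule from the config above

# Serialize the modified config so it can be passed to tools/train_net.py.
with open("configs/BAText/TotalText/my_finetune.yaml", "w") as f:
    f.write(cfg.dump())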

xcharxlie commented 3 years ago


@Yuliang-Liu That makes sense. I didn't pretrain the model well, so it performs poorly. A question about the pretraining: which datasets should I pretrain on? I looked into the config file, and it seems to pretrain on syntext1, syntext2, TotalText, and mlt. Since training on all of them may take a long time, should I really pretrain on all of them, or just pick one? Thank you!

Yuliang-Liu commented 3 years ago

@xcharxlie I would recommend following the default pretraining setting. It takes about one day to train on all the data using 8 V100 GPUs, which shouldn't be too long.

xcharxlie commented 3 years ago

@Yuliang-Liu I hope so as well. But due to some technical difficulties, I only have 4 GPUs right now, and I just checked that it may take more than 2 days to pretrain on just the TotalText train images, so I will train on that dataset first and see how it goes. If that doesn't work, I'll try to get more GPUs for better pretraining. Another question: how many training images are recommended for fine-tuning? Thank you so much.

Yuliang-Liu commented 3 years ago

@xcharxlie The amount of data does not actually change the training time. Training time mainly depends on the number of iterations you set, regardless of how much data you use.
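(A back-of-the-envelope illustration of that point; the dataset size below is just an example, using Total-Text's train split of about 1255 images.)

# With iteration-based training, total work is MAX_ITER * IMS_PER_BATCH images,
# independent of dataset size; more data just means fewer passes (epochs) over it.
max_iter = 5000        # SOLVER.MAX_ITER from the config posted above
ims_per_batch = 8      # SOLVER.IMS_PER_BATCH
dataset_size = 1255    # example: Total-Text train split

images_processed = max_iter * ims_per_batch
epochs = images_processed / dataset_size
print(f"{images_processed} images processed ~= {epochs:.1f} epochs")
# Doubling dataset_size halves epochs but leaves images_processed (and time) unchanged.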

xcharxlie commented 3 years ago

@Yuliang-Liu Thank you so much. One more question: sometimes the model makes small mistakes, like recognizing '0' as '8'. Now I'm trying to evaluate the model by checking the precision, but I ended up with this error:

  File "tools/train_net.py", line 212, in <module>
    launch(
  File "/home/CN/zizhang.wu/zzr/AdelaiDet/detectron2/detectron2/engine/launch.py", line 62, in launch
    main_func(*args)
  File "tools/train_net.py", line 189, in main
    res = Trainer.test(cfg, model)  # d2 defaults.py
  File "/home/CN/zizhang.wu/zzr/AdelaiDet/detectron2/detectron2/engine/defaults.py", line 515, in test
    results_i = inference_on_dataset(model, data_loader, evaluator)
  File "/home/CN/zizhang.wu/zzr/AdelaiDet/detectron2/detectron2/evaluation/evaluator.py", line 176, in inference_on_dataset
    results = evaluator.evaluate()
  File "/home/CN/zizhang.wu/zzr/AdelaiDet/adet/evaluation/text_evaluation.py", line 210, in evaluate
    text_result = self.evaluate_with_official_code(result_path, self._text_eval_gt_path)
  File "/home/CN/zizhang.wu/zzr/AdelaiDet/adet/evaluation/text_evaluation.py", line 178, in evaluate_with_official_code
    return text_eval_script.text_eval_main(det_file=result_path, gt_file=gt_path, is_word_spotting=self._word_spotting)
  File "/home/CN/zizhang.wu/zzr/AdelaiDet/adet/evaluation/text_eval_script.py", line 472, in text_eval_main
    return rrc_evaluation_funcs.main_evaluation(None, det_file, gt_file, default_evaluation_params, validate_data, evaluate_method)
  File "/home/CN/zizhang.wu/zzr/AdelaiDet/adet/evaluation/rrc_evaluation_funcs.py", line 414, in main_evaluation
    validate_data_fn(p['g'], p['s'], evalParams)
  File "/home/CN/zizhang.wu/zzr/AdelaiDet/adet/evaluation/text_eval_script.py", line 59, in validate_data
    raise Exception("The sample %s not present in GT" % k)
Exception: The sample 0002874 not present in GT

I saw another thread about the same problem, but I don't quite understand what was meant by zipping the datasets. Which dataset should I zip?

Also, another question: where can I find the training accuracy? I didn't see it in the log file. Thank you so much!

Yuliang-Liu commented 3 years ago

@xcharxlie We have provided an evaluation_example_script here.
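(On the "not present in GT" error above: the evaluator matches sample ids in the detection results against entries in a ground-truth zip archive, so the two must use the same ids. A minimal sketch of building such an archive, assuming one annotation file per image whose basename matches the sample id; the paths are hypothetical, and the exact naming convention should be checked against adet/evaluation/text_eval_script.py.)

# Sketch: pack per-image ground-truth files (e.g. 0002874.txt) into the zip
# that the text evaluator reads as its GT file.
import os
import zipfile

gt_dir = "datasets/mydata/test_gts"           # hypothetical folder of GT files
gt_zip = "datasets/evaluation/gt_mydata.zip"  # hypothetical output path

os.makedirs(os.path.dirname(gt_zip), exist_ok=True)
with zipfile.ZipFile(gt_zip, "w", zipfile.ZIP_DEFLATED) as zf:
    for name in sorted(os.listdir(gt_dir)):
        # Store entries flat so sample ids resolve during validation.
        zf.write(os.path.join(gt_dir, name), arcname=name)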

There is no training accuracy in the current version.
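(If a number is needed during development, one option is to measure recognition accuracy offline; a minimal sketch, assuming you can align predicted transcriptions with ground-truth words yourself.)

# Exact-match word accuracy over aligned (prediction, ground-truth) pairs.
def word_accuracy(predictions, ground_truths):
    assert len(predictions) == len(ground_truths), "pairs must be aligned"
    correct = sum(p == g for p, g in zip(predictions, ground_truths))
    return correct / max(len(ground_truths), 1)

# Example: a single misread character makes the whole word wrong under exact match.
print(word_accuracy(["hello", "w0rld"], ["hello", "world"]))  # -> 0.5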