Some problems when I train the model

ustczhouyu commented 5 years ago

When I use ICDAR2015 to train the model, the file sample_train_data/MLT/trainMLT.txt lists the ICDAR2015 localization training images (such as icdar-2015-Ch4/Train/img_1.jpg), and sample_train_data/MLT_CROPS/gt.txt lists the ICDAR2015 recognition training images (such as word_1.png, "Genaxis Theatre"). I have not changed any other paths. I train the model with:

  python3 train.py -train_list=sample_train_data/MLT/trainMLT.txt -batch_size=8 -num_readers=5 -debug=0 -input_size=512 -ocr_batch_size=256 -ocr_feed_list=sample_train_data/MLT_CROPS/gt.txt

The output is:

  root@10ca3ad2a7d1:/home/zy/jupyter/recognition/spotter/E2E-MLT-master# python3 train.py -train_list=sample_train_data/MLT/trainMLT.txt -batch_size=8 -num_readers=5 -debug=0 -input_size=512 -ocr_batch_size=256 -ocr_feed_list=sample_train_data/MLT_CROPS/gt.txt
  Using E2E-MLT
  loading model from e2e-mlt.h5
  e2e-mlt.h5
  1000 training images in sample_train_data/MLT/trainMLT.txt
  1000 training images in sample_train_data/MLT/trainMLT.txt
  1000 training images in sample_train_data/MLT/trainMLT.txt
  1000 training images in sample_train_data/MLT/trainMLT.txt
  1000 training images in sample_train_data/MLT/trainMLT.txt
  4468 training images in sample_train_data/MLT_CROPS/gt.txt
  4468 training images in sample_train_data/MLT_CROPS/gt.txt

I waited for half an hour, but there was no more output. Can you help me? Thank you.

MichalBusta commented 5 years ago

Hi, this looks like a problem with data feeding.

You can try the -debug=1 flag to see the training data.
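Before training, it can also help to confirm that the paths in the list files actually resolve from where train.py runs. The sketch below assumes one entry per line with the image path as the first comma-separated field (which matches both example entries in the first post) and that paths are relative to a base directory you choose; find_missing is a hypothetical helper, not part of E2E-MLT.

```python
import os
from typing import Iterable, List, Tuple


def find_missing(lines: Iterable[str], base_dir: str = "") -> Tuple[int, List[str]]:
    """Return (number of non-empty entries, list of image paths that do not exist).

    Assumption: the image path is the first comma-separated field of each line,
    e.g. 'icdar-2015-Ch4/Train/img_1.jpg' or 'word_1.png, "Genaxis Theatre"'.
    """
    total = 0
    missing: List[str] = []
    for line in lines:
        entry = line.strip()
        if not entry:  # skip blank lines
            continue
        total += 1
        path = entry.split(",")[0].strip()
        full = os.path.join(base_dir, path)  # an absolute path overrides base_dir
        if not os.path.exists(full):
            missing.append(full)
    return total, missing
```

For example, open sample_train_data/MLT/trainMLT.txt and pass the file object as `lines`; if `missing` is as long as `total`, the loader has nothing it can read, which would be consistent with the hang above.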

There is a piece of bad code in data_gen.py, which silently skips any image that is missing or cannot be read:

  if not os.path.exists(im_name):
      continue
  im = cv2.imread(im_name)
  if im is None:
      continue
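If every path in the list is wrong, a loop like this keeps skipping entries forever without printing anything. One way to make it fail loudly is to raise after a full pass over the list yields no readable image. This is a minimal sketch; iter_images and its read_fn parameter are hypothetical names, not the actual data_gen.py API. In practice read_fn would be cv2.imread, which returns None for files it cannot decode.

```python
import os
from typing import Any, Callable, Iterator, List, Optional, Tuple


def iter_images(paths: List[str],
                read_fn: Callable[[str], Optional[Any]]) -> Iterator[Tuple[str, Any]]:
    """Cycle over `paths` forever, yielding (path, image) pairs.

    Unlike a bare `continue`, this raises RuntimeError when a complete pass
    over the list produced no readable image, instead of spinning silently.
    """
    if not paths:
        raise ValueError("empty training list")
    while True:
        loaded = 0
        skipped: List[str] = []
        for name in paths:
            if not os.path.exists(name):
                skipped.append(name)
                continue
            im = read_fn(name)
            if im is None:  # e.g. cv2.imread could not decode the file
                skipped.append(name)
                continue
            loaded += 1
            yield name, im
        if loaded == 0:
            # Every entry was skipped, so skipped[0] is guaranteed to exist.
            raise RuntimeError(
                f"no readable images among {len(paths)} entries; first skipped: {skipped[0]!r}")
```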

MiZhangWhuer commented 5 years ago

Hi @MichalBusta @ustczhouyu, I ran into the same issue, and I solved it by commenting out the following lines associated with dg_ocr:

  # imageso, labels, label_length = next(dg_ocr)
  # im_data_ocr = net_utils.np_to_variable(imageso, is_cuda=opts.cuda).permute(0, 3, 1, 2)
  # features = net.forward_features(im_data_ocr)
  # labels_pred = net.forward_ocr(features)
  #
  # probs_sizes =  torch.IntTensor( [(labels_pred.permute(2,0,1).size()[0])] * (labels_pred.permute(2,0,1).size()[1]) )
  # label_sizes = torch.IntTensor( torch.from_numpy(np.array(label_length)).int() )
  # labels = torch.IntTensor( torch.from_numpy(np.array(labels)).int() )
  # loss_ocr = ctc_loss(labels_pred.permute(2,0,1), labels, probs_sizes, label_sizes) / im_data_ocr.size(0) * 0.5
  #
  # loss_ocr.backward()

I think the main reason is that the two threads, 'dg_ocr' and 'data_generator', conflict with each other in each training epoch. @MichalBusta, do you have another approach to solve this problem?
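To check which of the two generators is actually stalling before commenting code out, one option is to wrap each next() call with a timeout. next_with_timeout below is a hypothetical helper, not part of E2E-MLT; it runs next() in a daemon thread and raises TimeoutError if no batch arrives in time. In train.py it could replace, for example, `next(dg_ocr)` with `next_with_timeout(dg_ocr, 600, "dg_ocr")`, where the 600 s limit is an arbitrary assumption.

```python
import queue
import threading
from typing import Any, Iterator


def next_with_timeout(gen: Iterator[Any], timeout: float, name: str = "generator") -> Any:
    """Return next(gen), or raise TimeoutError if it takes longer than `timeout` seconds."""
    box = queue.Queue(maxsize=1)

    def worker() -> None:
        try:
            box.put(("ok", next(gen)))
        except Exception as exc:  # includes StopIteration; re-raised in the caller below
            box.put(("err", exc))

    # A daemon thread will not keep the interpreter alive if the generator never returns.
    threading.Thread(target=worker, daemon=True).start()
    try:
        kind, value = box.get(timeout=timeout)
    except queue.Empty:
        raise TimeoutError(f"{name} produced no batch within {timeout} s") from None
    if kind == "err":
        raise value
    return value
```

If dg_ocr is a plain Python generator, a call that timed out is still running in its worker thread, so calling next() on it again raises ValueError ("generator already executing"). Treat a timeout as a diagnosis and stop training rather than retrying.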

ustczhouyu commented 5 years ago

Hi, it is nice to know that you have synthesized a multilingual dataset, the Synthetic Multi-Language in Natural Scene Dataset. I don't know how to download it; could you send it to me? Thank you very much.

(In reply to Michal Busta's comment of 2018-11-28 21:35:55.)

MichalBusta commented 5 years ago

See https://github.com/MichalBusta/E2E-MLT (section Data).

ycjcy commented 5 years ago

@ustczhouyu @MiZhangWhuer @MichalBusta Hi, I have the same problem. I made the changes described above, but the error still occurs. I hope you can suggest a solution. I look forward to your reply. Thank you.

LittlePinkRobin commented 5 years ago

@ycjcy @MichalBusta If you are running on the sample data provided in the repository, try setting -batch_size=2. I noticed that with -batch_size=8 it never hit the terminating case.
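This observation suggests the batch loop only yields once a full batch has been collected. Nobody in the thread confirmed what data_gen.py does here, but one pattern that produces exactly this hang is resetting the partial batch on every pass over the list: if fewer than batch_size samples survive filtering, the batch never fills. A minimal sketch of a batch generator that avoids this by carrying the partial batch across passes (endless_batches is a hypothetical name):

```python
import itertools
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def endless_batches(samples: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield full batches forever, cycling over `samples`.

    The partial batch is kept across passes, so even a list shorter than
    `batch_size` eventually fills a batch instead of blocking the caller.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    if not samples:
        raise ValueError("no usable samples")  # would otherwise loop forever
    batch: List[T] = []
    for s in itertools.cycle(samples):
        batch.append(s)
        if len(batch) == batch_size:
            yield batch
            batch = []
```

With three usable samples and a batch size of 8, the first batch repeats the samples to fill all eight slots rather than waiting for data that will never arrive.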

duxiangcheng commented 4 years ago

@MichalBusta @MiZhangWhuer @ycjcy @LittlePinkRobin @ustczhouyu Hello everyone! What does the "-ocr_feed_list" option in train.py do? And where can I get the cropped images? Thanks.
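Judging only from the first post, the file passed via -ocr_feed_list held word crops listed in the ICDAR2015 recognition ground-truth format, e.g. word_1.png, "Genaxis Theatre". Whether E2E-MLT accepts exactly this format is an assumption. If you need to inspect or convert such a file, a minimal parser could look like this (parse_word_gt is a hypothetical helper):

```python
import csv
import io
from typing import List, Tuple


def parse_word_gt(text: str) -> List[Tuple[str, str]]:
    """Parse lines like: word_1.png, "Genaxis Theatre" into (image, transcription) pairs."""
    text = text.lstrip("\ufeff")  # drop a UTF-8 byte-order mark if the file starts with one
    # skipinitialspace makes csv ignore the space after the comma, so the quoted
    # transcription (which may itself contain commas) is unquoted correctly.
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    pairs: List[Tuple[str, str]] = []
    for row in reader:
        if not row:  # blank line
            continue
        if len(row) < 2:
            raise ValueError(f"line {reader.line_num}: expected 'image, \"text\"', got {row!r}")
        pairs.append((row[0], row[1]))
    return pairs
```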

Some problems when I train the model #9