MaybeShewill-CV / lanenet-lane-detection

Unofficial implementation of the LaneNet model for real-time lane detection
Apache License 2.0

Retraining model on new dataset #74

Closed dscha09 closed 5 years ago

dscha09 commented 5 years ago

I was successful in testing the trained model using the trained weights you uploaded to Dropbox. However, I want to retrain the model on new training data.

I added one new image to the existing training data of five images, following the instructions in the repo, and added the new images to the image, gt_image_instance, and gt_image_binary folders, but I get errors. I run this command from your repo in bash:

python tools/train_lanenet.py --net vgg --dataset_dir data/training_data_example/

The errors I get are:

cv2.error: OpenCV(3.4.2) /Users/travis/build/skvark/opencv-python/opencv/modules/imgproc/src/resize.cpp:4044: error: (-215:Assertion failed) !ssize.empty() in function 'resize'

and sometimes i get this error:

ValueError: Variable lanenet_loss/inference/encode/conv1_1/conv/W already exists

I already modified the train.txt and val.txt and changed the file paths for the images found locally on my machine.

How do I fix this?

MaybeShewill-CV commented 5 years ago

@chaine09 I recently updated the code. You may pull the new code, test the training process again, and see if these problems still exist.

dscha09 commented 5 years ago

Hi @MaybeShewill-CV, will this work even if I only have 5 images as my training data?

MaybeShewill-CV commented 5 years ago

@chaine09 The training process will work, but you have to reset your batch size to something smaller than 5.

dscha09 commented 5 years ago

Hi @MaybeShewill-CV, I got this error upon retraining the model:

RecursionError: maximum recursion depth exceeded in comparison

MaybeShewill-CV commented 5 years ago

@chaine09 That may be caused by improper dataset preparation. Could you please show me how you prepared your dataset, including your dataset folder structure and your train.txt file?

dscha09 commented 5 years ago

This is the contents of my train.txt file

/Users/cvsanbuenaventura/Documents/lanenet-lane-detection-master-retrain/data/training_data_example/image/0000.png /Users/cvsanbuenaventura/Documents/lanenet-lane-detection-master-retrain/data/training_data_example/gt_image_binary/0000.png /Users/cvsanbuenaventura/Documents/lanenet-lane-detection-master-retrain/data/training_data_example/gt_image_instance/0000.png
/Users/cvsanbuenaventura/Documents/lanenet-lane-detection-master-retrain/data/training_data_example/image/0001.png /Users/cvsanbuenaventura/Documents/lanenet-lane-detection-master-retrain/data/training_data_example/gt_image_binary/0001.png /Users/cvsanbuenaventura/Documents/lanenet-lane-detection-master-retrain/data/training_data_example/gt_image_instancee/0001.png
/Users/cvsanbuenaventura/Documents/lanenet-lane-detection-master-retrain/data/training_data_example/image/0002.png /Users/cvsanbuenaventura/Documents/lanenet-lane-detection-master-retrain/data/training_data_example/gt_image_binary/0002.png /Users/cvsanbuenaventura/Documents/lanenet-lane-detection-master-retrain/data/training_data_example/gt_image_instance/0002.png
/Users/cvsanbuenaventura/Documents/lanenet-lane-detection-master-retrain/data/training_data_example/image/0003.png /Users/cvsanbuenaventura/Documents/lanenet-lane-detection-master-retrain/data/training_data_example/gt_image_binary/0003.png /Users/cvsanbuenaventura/Documents/lanenet-lane-detection-master-retrain/data/training_data_example/gt_image_instance/0003.png

I'm on Mac OS

dscha09 commented 5 years ago

@MaybeShewill-CV How would you generate the images in the gt_image_instance and gt_image_binary folders?

For now, I just used the existing 5 images you have for training and validation and didn't change the training folder structure. But I modified val.txt and train.txt accordingly.

dscha09 commented 5 years ago

@MaybeShewill-CV I used the existing 5 images in your repo. Just tested if I can retrain the model.

MaybeShewill-CV commented 5 years ago

@chaine09 You should change the batch size in the config file, because the batch size is larger than the total number of your training images. Another way to test is to copy the training examples several times and change the file names.
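
For reference, a minimal sketch of the "copy the examples several times" idea, assuming the repo's data/training_data_example layout (the copy count and numbering scheme here are just an illustration; remember to add matching lines to train.txt afterwards):

import os
import shutil

root = 'data/training_data_example'
folders = ['image', 'gt_image_binary', 'gt_image_instance']

# Duplicate each of the five example images a few times so the dataset
# ends up larger than the training batch size.
for i in range(5):
    for copy_idx in range(1, 4):
        new_idx = 5 * copy_idx + i  # produces 0005.png ... 0019.png
        for folder in folders:
            src = os.path.join(root, folder, '{:04d}.png'.format(i))
            dst = os.path.join(root, folder, '{:04d}.png'.format(new_idx))
            shutil.copy(src, dst)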

dscha09 commented 5 years ago

Hi @MaybeShewill-CV, the training batch size is 32 and the test batch size is 4 in the global_config.py file. Which one should I change to 5?

MaybeShewill-CV commented 5 years ago

@chaine09 Did you pull the new code? The train batch size is 8 according to the new config file.

dscha09 commented 5 years ago

@MaybeShewill-CV Oh yes in the new code both train and test batch sizes are 8. Which batch size should I change to 5? Is it the train or the test batch size?

MaybeShewill-CV commented 5 years ago

@chaine09 The train batch size, and in my opinion the best way to test is to copy the example images several times. Good luck:)

dscha09 commented 5 years ago

Hi @MaybeShewill-CV, I already changed the train batch size to 5 without changing the number of training images (the original 4 images). But I still get this error:

cv2.error: OpenCV(3.4.2) /Users/travis/build/skvark/opencv-python/opencv/modules/imgproc/src/resize.cpp:4044: error: (-215:Assertion failed) !ssize.empty() in function 'resize'

MaybeShewill-CV commented 5 years ago

@chaine09 First, check if the image path is correct. Second, like I said before, make sure your batch size is smaller than the total number of your training examples. You can read the data provider code for details. It is quite simple.
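
As a quick way to act on both checks, one can validate every path in train.txt before training; an unreadable path makes cv2.imread return None, which is exactly what triggers the !ssize.empty() assertion inside resize. A small sketch (the train.txt location is the example one from this thread):

import cv2

with open('data/training_data_example/train.txt') as f:
    for line_no, line in enumerate(f, 1):
        paths = line.strip().split()
        if len(paths) != 3:
            print('line {}: expected 3 space-separated paths, got {}'.format(line_no, len(paths)))
            continue
        for path in paths:
            if cv2.imread(path) is None:
                print('line {}: cannot read image {}'.format(line_no, path))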

dscha09 commented 5 years ago

Hi @MaybeShewill-CV, I checked the three python files in the data_provider folder namely, data_process.py, lanenet_data_processor.py and lanenet_hnet_data_processor.py.

For data_processor.py, I found this line of code:

val = DataSet('/home/baidu/DataBase/Semantic_Segmentation/TUSimple_Lane_Detection/training/train.txt')

Similarly, for lanenet_data_processor.py:

val = DataSet('/home/baidu/DataBase/Semantic_Segmentation/Kitti_Vision/data_road/lanenet_training/train.txt')

And lastly, for lanenet_hnet_data_processor.py:

json_file_list = glob.glob('{:s}/*.json'.format('/media/baidu/Data/Semantic_Segmentation' '/TUSimple_Lane_Detection/training'))

These are all file paths that need to be modified accordingly. They are three different file paths on your local machine, but they all refer to the same file name, train.txt. Do all of these refer to the train.txt file inside the /data/training_data_example folder?

MaybeShewill-CV commented 5 years ago

@chaine09 The data processor for the model is lanenet_data_processor.py, and you can see that I only import that file in my training script. When you train your model, all you need to do is pass the folder path that contains the train.txt file to the trainer. You can use python tools/train_lanenet.py --help for help.
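
Roughly speaking, the wiring looks like the sketch below: the trainer resolves train.txt relative to --dataset_dir, so nothing inside the data provider needs editing (a simplified illustration based on the flags and the DataSet class mentioned in this thread, not the repo's exact code):

import argparse
import os

from data_provider.lanenet_data_processor import DataSet

parser = argparse.ArgumentParser()
parser.add_argument('--net', type=str, default='vgg')
parser.add_argument('--dataset_dir', type=str, help='folder that contains train.txt and val.txt')
args = parser.parse_args()

# train.txt and val.txt are located relative to --dataset_dir, so the
# hard-coded example paths inside lanenet_data_processor.py are not needed here.
train_dataset = DataSet(os.path.join(args.dataset_dir, 'train.txt'))
val_dataset = DataSet(os.path.join(args.dataset_dir, 'val.txt'))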

HanqingXu commented 5 years ago

Hi @MaybeShewill-CV, I wanted to test whether the code works for another training dataset (I used the Cityscapes instance dataset and extracted only one class from it, car in my case). However, it turns out that both binary_loss and instance_loss reach nan, so training is terminated. I investigated it and found that all the network layers reach nan after several steps (including the embedding layer, decoding layers, etc.). I checked the nan image and the label, and they look fine to me. Another interesting phenomenon is that the error always shows up during the validation part of the training process (I use tf.Print to check real-time values, including tensors like the input of the decoder, mu, and all losses; it turns out that when the error shows up, they all become nan). Have you encountered a similar case before? Thanks a lot.

MaybeShewill-CV commented 5 years ago

@HanqingXu Sorry, I have not tested the model on other datasets set up for other tasks before==!

dscha09 commented 5 years ago

Hi @MaybeShewill-CV, I already changed this line in lanenet_data_processor.py accordingly:

val = DataSet('/home/baidu/DataBase/Semantic_Segmentation/Kitti_Vision/data_road/lanenet_training/train.txt')

Then I changed the train batch_size to 2 and the test batch_size to 1, since I have 4 training images and only 1 for validation (is validation in this case the same as test, or did you split the 4 images further?).

However, I still get this error:

RecursionError: maximum recursion depth exceeded in comparison

Did I do it correctly?

MaybeShewill-CV commented 5 years ago

@chaine09 You did not change it the right way. Do not change the lanenet_data_processor.py file; only change the batch size in the config.py file, and make sure the batch size is smaller than the total number of your training examples. (By the way, the total number of your training examples is equal to the number of lines in your train.txt file):)

dscha09 commented 5 years ago

@MaybeShewill-CV should I not change val in lanenet_data_processor.py?

from

val = DataSet('/home/baidu/DataBase/Semantic_Segmentation/Kitti_Vision/data_road/lanenet_training/train.txt')

to

val = DataSet('/Users/cvsanbuenaventura/Documents/lanenet-lane-detection-master/data/training_data_example/train.txt')

Should I not do this?

dscha09 commented 5 years ago

Then here are the contents of my train.txt:

/Users/cvsanbuenaventura/Documents/lanenet-lane-detection-master/data/training_data_example/image/0000.png /Users/cvsanbuenaventura/Documents/lanenet-lane-detection-master/data/training_data_example/gt_image_binary/0000.png /Users/cvsanbuenaventura/Documents/lanenet-lane-detection-master/data/training_data_example/gt_image_instance/0000.png
/Users/cvsanbuenaventura/Documents/lanenet-lane-detection-master/data/training_data_example/image/0001.png /Users/cvsanbuenaventura/Documents/lanenet-lane-detection-master/data/training_data_example/gt_image_binary/0001.png /Users/cvsanbuenaventura/Documents/lanenet-lane-detection-master/data/training_data_example/gt_image_instancee/0001.png
/Users/cvsanbuenaventura/Documents/lanenet-lane-detection-master/data/training_data_example/image/0002.png /Users/cvsanbuenaventura/Documents/lanenet-lane-detection-master/data/training_data_example/gt_image_binary/0002.png /Users/cvsanbuenaventura/Documents/lanenet-lane-detection-master/data/training_data_example/gt_image_instance/0002.png
/Users/cvsanbuenaventura/Documents/lanenet-lane-detection-master/data/training_data_example/image/0003.png /Users/cvsanbuenaventura/Documents/lanenet-lane-detection-master/data/training_data_example/gt_image_binary/0003.png /Users/cvsanbuenaventura/Documents/lanenet-lane-detection-master/data/training_data_example/gt_image_instance/0003.png

which I modified.

MaybeShewill-CV commented 5 years ago

@chaine09 You are not supposed to change the code in the lanenet_data_processor.py file.

dscha09 commented 5 years ago

@MaybeShewill-CV what about train.txt? So I have 4 training examples in this case?

MaybeShewill-CV commented 5 years ago

@chaine09 The train.txt seems to be correct; make sure the three file paths on the same line are separated by spaces. Adjust your batch size to 2 or 3, then you can start training the model.

dscha09 commented 5 years ago

@MaybeShewill-CV I also downloaded vgg16.npy and placed it inside the data folder by issuing this command:

wget ftp://mi.eng.cam.ac.uk/pub/mttt2/models/vgg16.npy

Is it correct that the config file you are referring to is global_config.py inside the config folder?

Here are the contents of my global_config.py file:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# @Time    : 18-1-31 上午11:21
# @Author  : Luo Yao
# @Site    : http://icode.baidu.com/repos/baidu/personal-code/Luoyao
# @File    : global_config.py
# @IDE: PyCharm Community Edition
"""
设置全局变量
"""
from easydict import EasyDict as edict

__C = edict()
# Consumers can get config by: from config import cfg

cfg = __C

# Train options
__C.TRAIN = edict()

# Set the shadownet training epochs
__C.TRAIN.EPOCHS = 200010
# Set the display step
__C.TRAIN.DISPLAY_STEP = 1
# Set the test display step during training process
__C.TRAIN.TEST_DISPLAY_STEP = 1000
# Set the momentum parameter of the optimizer
__C.TRAIN.MOMENTUM = 0.9
# Set the initial learning rate
__C.TRAIN.LEARNING_RATE = 0.0005
# Set the GPU resource used during training process
__C.TRAIN.GPU_MEMORY_FRACTION = 0.85
# Set the GPU allow growth parameter during tensorflow training process
__C.TRAIN.TF_ALLOW_GROWTH = True
# Set the shadownet training batch size
__C.TRAIN.BATCH_SIZE = 2 # changed 8 to 2

# Set the shadownet validation batch size
__C.TRAIN.VAL_BATCH_SIZE = 8
# Set the learning rate decay steps
__C.TRAIN.LR_DECAY_STEPS = 410000
# Set the learning rate decay rate
__C.TRAIN.LR_DECAY_RATE = 0.1
# Set the class numbers
__C.TRAIN.CLASSES_NUMS = 2
# Set the image height
__C.TRAIN.IMG_HEIGHT = 256
# Set the image width
__C.TRAIN.IMG_WIDTH = 512

# Test options
__C.TEST = edict()

# Set the GPU resource used during testing process
__C.TEST.GPU_MEMORY_FRACTION = 0.8
# Set the GPU allow growth parameter during tensorflow testing process
__C.TEST.TF_ALLOW_GROWTH = True
# Set the test batch size
__C.TEST.BATCH_SIZE = 1

dscha09 commented 5 years ago

Here is the error I'm getting:

File "/Users/cvsanbuenaventura/Documents/lanenet-lane-detection-master/data_provider/lanenet_data_processor.py", line 93, in next_batch self._random_dataset() File "/Users/cvsanbuenaventura/Documents/lanenet-lane-detection-master/data_provider/lanenet_data_processor.py", line 66, in _random_dataset random_idx = np.random.permutation(len(self._gt_img_list)) File "mtrand.pyx", line 4907, in mtrand.RandomState.permutation File "mtrand.pyx", line 4824, in mtrand.RandomState.shuffle File "/Users/cvsanbuenaventura/miniconda3/envs/tensorflow_orig/lib/python3.5/site-packages/numpy/core/_internal.py", line 254, in init if self._arr.ndim == 0: RecursionError: maximum recursion depth exceeded in comparison

MaybeShewill-CV commented 5 years ago

@chaine09 Add a breakpoint in the train script and see if the batch size was passed correctly.

dscha09 commented 5 years ago

@MaybeShewill-CV How do I add a breakpoint and what do you mean by "train script"?

MaybeShewill-CV commented 5 years ago

@chaine09 About the breakpoint, you can google how to use an IDE to debug. The train script means the train_lanenet.py file.

dscha09 commented 5 years ago

@MaybeShewill-CV Isn't the value for the training batch size specified in the global_config.py file?

You mean I need to run and debug the train_lanenet.py script? Line by line? How would I know if the batch size was passed correctly?

MaybeShewill-CV commented 5 years ago

@chaine09 Since I do not know how you use the model, you need to check in the debugger whether the batch size param value is correct.
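
One possible way to do that with pdb (a generic Python debugging sketch, not code from this repo): drop a breakpoint into tools/train_lanenet.py near the point where the batch size is used and inspect the value interactively.

# near the point where the batch size is used in tools/train_lanenet.py (placement is illustrative)
import pdb

pdb.set_trace()  # execution pauses here when you run the training command
# At the (Pdb) prompt:
#   p CFG.TRAIN.BATCH_SIZE   -> prints the batch size actually being used
#   c                        -> continue training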

dscha09 commented 5 years ago

2018-10-31 23:12:10.135633: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
I1031 23:12:12.052288 16525 train_lanenet.py:163] Global configuration is as follows:
I1031 23:12:12.053390 16525 train_lanenet.py:164] {'TEST': {'TF_ALLOW_GROWTH': True, 'GPU_MEMORY_FRACTION': 0.8, 'BATCH_SIZE': 1}, 'TRAIN': {'TEST_DISPLAY_STEP': 1000, 'CLASSES_NUMS': 2, 'EPOCHS': 200010, 'VAL_BATCH_SIZE': 8, 'LR_DECAY_STEPS': 410000, 'IMG_WIDTH': 512, 'DISPLAY_STEP': 1, 'GPU_MEMORY_FRACTION': 0.85, 'LEARNING_RATE': 0.0005, 'MOMENTUM': 0.9, 'LR_DECAY_RATE': 0.1, 'IMG_HEIGHT': 256, 'TF_ALLOW_GROWTH': True, 'BATCH_SIZE': 2}}

From this, I think the train batch size passed is 2.

MaybeShewill-CV commented 5 years ago

@chaine09 It seems you have the right batch size, so I have no idea what is wrong. Maybe you should recheck your training procedure, or you can test the data provider alone. The implementation of the data provider is quite simple; I think you can figure it out by yourself. I tested everything you described myself yesterday and it worked correctly:)
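
"Test the data provider alone" could look roughly like this (a sketch assuming the DataSet class and the next_batch signature quoted elsewhere in this thread, plus the example train.txt):

from data_provider.lanenet_data_processor import DataSet

dataset = DataSet('data/training_data_example/train.txt')

# Pull one batch of 2 and check that three lists of equal length come back
# and that each image actually loaded.
gt_imgs, binary_gt_labels, instance_gt_labels = dataset.next_batch(2)
print(len(gt_imgs), len(binary_gt_labels), len(instance_gt_labels))
print(gt_imgs[0].shape)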

dscha09 commented 5 years ago

Hi @MaybeShewill-CV, now I'm getting an error when testing the model, which I did not get with your previous code.

I used this:

# from /lanenet-lane-detection-master
python tools/test_lanenet.py --is_batch False --batch_size 1 \
--weights_path /Users/cvsanbuenavts/lanenet-lane-detection-master/model/tusimple_lanenet/tusimple_lanenet_vgg_2018-10-19-13-33-56.ckpt-200000.data-00000-of-00001 \
--image_path data/tusimple_test_image/0.jpg

Then I get this error:

DataLossError (see above for traceback): Unable to open table file /Users/cvsanbuenaventura/Documents/lanenet-lane-detection-master/model/tusimple_lanenet/tusimple_lanenet_vgg_2018-10-19-13-33-56.ckpt-200000.data-00000-of-00001: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator? [[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

Does this have something to do with the weights_path I specified? I copied the saved model from https://www.dropbox.com/sh/tnsf0lw6psszvy4/AAA81r53jpUI3wLsRW6TiPCya?dl=0 and put it inside /model/tusimple_lanenet.

When I used your old code, I was able to successfully generate the 3 output images.

MaybeShewill-CV commented 5 years ago

@chaine09 It seems that you are not familiar with TensorFlow==! The weights_path should be --weights_path /Users/cvsanbuenavts/lanenet-lane-detection-master/model/tusimple_lanenet/tusimple_lanenet_vgg_2018-10-19-13-33-56.ckpt-200000
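
The background here: a TensorFlow checkpoint is a set of files (.data-00000-of-00001, .index, .meta) that share one prefix, and tf.train.Saver.restore expects that prefix rather than any single shard. A minimal sketch with the path from this thread (it assumes the LaneNet graph has already been built, as tools/test_lanenet.py does):

import tensorflow as tf

# ... build the LaneNet graph here, exactly as tools/test_lanenet.py does ...

saver = tf.train.Saver()
weights_path = ('/Users/cvsanbuenavts/lanenet-lane-detection-master/model/tusimple_lanenet/'
                'tusimple_lanenet_vgg_2018-10-19-13-33-56.ckpt-200000')  # no .data-... suffix

with tf.Session() as sess:
    saver.restore(sess, weights_path)  # TensorFlow locates the .data/.index files itself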

dscha09 commented 5 years ago

Hello @MaybeShewill-CV! Please disregard the last issue I asked about testing the model. It was caused by a problem in bash: sometimes when I edit or paste commands in bash, parts of them overlap or get pasted twice. I have already fixed it. Thanks! :)

For the training, I have to double check the entire process again.

MaybeShewill-CV commented 5 years ago

@chaine09 Yep, you'd better check the process again. I have tested it several times on my computer and nothing went wrong:)

dscha09 commented 5 years ago

Hi @MaybeShewill-CV, I did what you suggested before, which was to make multiple copies of the 5 training images in the image, gt_image_instance, and gt_image_binary folders. I also updated the train.txt file and added the copies. I now have a total of 18 images for training, and still 1 image for validation.

I entered this in bash inside the /lanenet-lane-detection-master folder:

python tools/train_lanenet.py --net vgg --dataset_dir data/training_data_example/

Then here are the details of the training:

2018-11-02 15:27:49.233191: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
I1102 15:27:49.842538 2106 train_lanenet.py:163] Global configuration is as follows:
I1102 15:27:49.843230 2106 train_lanenet.py:164] {'TRAIN': {'TEST_DISPLAY_STEP': 1000, 'EPOCHS': 200010, 'IMG_WIDTH': 512, 'IMG_HEIGHT': 256, 'LEARNING_RATE': 0.0005, 'DISPLAY_STEP': 1, 'LR_DECAY_RATE': 0.1, 'CLASSES_NUMS': 2, 'GPU_MEMORY_FRACTION': 0.85, 'VAL_BATCH_SIZE': 1, 'TF_ALLOW_GROWTH': True, 'BATCH_SIZE': 2, 'LR_DECAY_STEPS': 410000, 'MOMENTUM': 0.9}, 'TEST': {'TF_ALLOW_GROWTH': True, 'BATCH_SIZE': 1, 'GPU_MEMORY_FRACTION': 0.8}}
I1102 15:27:52.105909 2106 train_lanenet.py:172] Training from scratch

But after this, I get a new error:

ValueError: Cannot feed value of shape (1, 256, 512, 3) for Tensor 'input_tensor:0', which has shape '(2, 256, 512, 3)'

Any ideas on how to solve this?

MaybeShewill-CV commented 5 years ago

@chaine09 The shape of your input tensor is (2, 256, 512, 3) but you are feeding it a tensor with shape (1, 256, 512, 3). You must have changed my code, so the input pipeline got stuck. Next time I hope you paste your code here:)

dscha09 commented 5 years ago

@MaybeShewill-CV No, I haven't changed your code. Yeah, from the looks of it the input tensor is receiving a tensor with a different shape than the one specified.

Oh, I just realized that the input tensor shape should be (2, 256, 512, 3). Why? Is it not (1, 256, 512, 3)?

Am I correct that the shape of the image is 256 x 512? And the 3 is because of the RGB components of the image, right?

dscha09 commented 5 years ago

@MaybeShewill-CV Hmm, the tensor you feed to input_tensor is gt_imgs, right? I checked the shape of gt_imgs and it is really (2, 256, 512, 3).

gt_imgs, binary_gt_labels, instance_gt_labels = train_dataset.next_batch(CFG.TRAIN.BATCH_SIZE)

gt_imgs = [cv2.resize(tmp,
                      dsize=(CFG.TRAIN.IMG_WIDTH, CFG.TRAIN.IMG_HEIGHT),
                      dst=tmp,
                      interpolation=cv2.INTER_LINEAR)
           for tmp in gt_imgs]

gt_imgs = [tmp - VGG_MEAN for tmp in gt_imgs]

You assign gt_imgs three times in train_lanenet.py, and for each of these the shape is (2, 256, 512, 3). I checked with np.array(gt_imgs).shape.

MaybeShewill-CV commented 5 years ago

@chaine09 Make sure the feed value and the placeholder have the same shape:)
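
The mismatch in miniature (illustrative names only, not the repo's code):

import numpy as np
import tensorflow as tf

input_tensor = tf.placeholder(tf.float32, shape=[2, 256, 512, 3], name='input_tensor')
output = tf.identity(input_tensor)

with tf.Session() as sess:
    ok_batch = np.zeros([2, 256, 512, 3], np.float32)
    sess.run(output, feed_dict={input_tensor: ok_batch})   # fine

    bad_batch = np.zeros([1, 256, 512, 3], np.float32)
    sess.run(output, feed_dict={input_tensor: bad_batch})  # ValueError: Cannot feed value of shape (1, 256, 512, 3) ...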

dscha09 commented 5 years ago

Hi @MaybeShewill-CV, I'm not training on a GPU, just a CPU. I found this line of code. Should I change it?

with tf.device('/gpu:1'):

What does this do?

MaybeShewill-CV commented 5 years ago

@chaine09 If you want to use the CPU instead, then you should change it to '/cpu:0'.

dscha09 commented 5 years ago

Hi @MaybeShewill-CV, since I'm only using the CPU for training, I made the following changes as you suggested:

I changed

with tf.device('/gpu:1'):

to

with tf.device('/cpu:0'):

Then I found this block of code:

sess_config = tf.ConfigProto(allow_soft_placement=True)
sess_config.gpu_options.per_process_gpu_memory_fraction = CFG.TRAIN.GPU_MEMORY_FRACTION
sess_config.gpu_options.allow_growth = CFG.TRAIN.TF_ALLOW_GROWTH
sess_config.gpu_options.allocator_type = 'BFC'

sess = tf.Session(config=sess_config)

So I changed the last line to just

sess = tf.Session()

And retrained the model. However, I'm still getting this error:

ValueError: Cannot feed value of shape (1, 256, 512, 3) for Tensor 'input_tensor:0', which has shape '(2, 256, 512, 3)'

So I was thinking of five probable sources of this error:

  1. Incorrect image type or structure in the 3 folders image, gt_image_instance, and gt_image_binary

    However, I just copied the 5 existing files and replicated them with different file names (up to img0017). Then I also updated the train.txt file by adding the copies.

  2. Need to modify some parts of the global_config.py file found in /config

    So far, I only modified the train batch size to 2 and test batch size to 1. Should I still modify other parts of this file?

  3. Need to change the path in lanenet_data_processor.py to point to the train.txt file in my local computer.

    But you told me to not change this file.

  4. Need to modify train_lanenet.py code?

    But I have re-downloaded your updated code, and you told me that you have tried to retrain it on your computer with no errors.

  5. Your code is for Tensorflow with GPU?

    But I already modified the train_lanenet.py code for this. Did I do it correctly? However the error is about the incorrect shape of the input_tensor.

Hmmm... which do you think is the most probable source of the error? Thanks in advance :)

MaybeShewill-CV commented 5 years ago

@chaine09 There is only one placeholder as the input tensor, so the test batch size and the train batch size must be the same:)

dscha09 commented 5 years ago

@MaybeShewill-CV Oh.. I changed the train and test batch size to 2 just now. The val batch size is still 1.

2018-11-02 19:09:49.783596: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
I1102 19:09:50.384888 2509 train_lanenet.py:162] Global configuration is as follows:
I1102 19:09:50.385061 2509 train_lanenet.py:163] {'TEST': {'TF_ALLOW_GROWTH': True, 'GPU_MEMORY_FRACTION': 0.8, 'BATCH_SIZE': 2}, 'TRAIN': {'DISPLAY_STEP': 1, 'LEARNING_RATE': 0.0005, 'LR_DECAY_RATE': 0.1, 'IMG_HEIGHT': 256, 'MOMENTUM': 0.9, 'IMG_WIDTH': 512, 'GPU_MEMORY_FRACTION': 0.85, 'VAL_BATCH_SIZE': 1, 'LR_DECAY_STEPS': 410000, 'TEST_DISPLAY_STEP': 1000, 'EPOCHS': 200010, 'CLASSES_NUMS': 2, 'TF_ALLOW_GROWTH': True, 'BATCH_SIZE': 2}}

Still the same error, though.

MaybeShewill-CV commented 5 years ago

@chaine09 It seems you have read nothing of the code. CFG.TRAIN.VAL_BATCH_SIZE is used in the code as well. Please read it first; the question you are asking is easy enough to correct yourself.
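
In other words, the validation step also feeds the same input_tensor placeholder, so it appears that __C.TRAIN.VAL_BATCH_SIZE in global_config.py has to match the train/test batch size as well. For this tiny example dataset, a consistent setting would look roughly like this (assuming val.txt is also padded to at least two lines by copying the validation image):

# config/global_config.py
__C.TRAIN.BATCH_SIZE = 2       # must be <= the number of lines in train.txt
__C.TRAIN.VAL_BATCH_SIZE = 2   # fed into the same placeholder, so keep it equal
__C.TEST.BATCH_SIZE = 2        # likewise shares the placeholder shape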