Open seasonyang opened 7 years ago
@seasonyang It only stops when you stop it yourself, with Ctrl+C.
@WuDanFly Is there a way to control training, such as setting a maximum number of steps or a loss threshold?
@seasonyang Max steps: you can find it in train_ssd_network.py: tf.app.flags.DEFINE_integer('max_number_of_steps', None, 'The maximum number of training steps.'). Change None to the maximum number of steps you want. By the way, did you get a good result on your own dataset?
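Since max_number_of_steps is defined through tf.app.flags, it can also be set on the command line instead of editing the source. A minimal sketch, reusing the training flags posted in this issue; the step count of 50000 is just a placeholder:

# cap training at a fixed number of steps (50000 here is only a placeholder value)
python2 train_ssd_network.py \
    --train_dir=./logs/ \
    --dataset_dir=./tfrecords/voc2007 \
    --dataset_name=pascalvoc_2007 \
    --dataset_split_name=train \
    --model_name=ssd_300_vgg \
    --checkpoint_path=./checkpoints/ssd_300_vgg.ckpt \
    --max_number_of_steps=50000

With this set, training should exit on its own once the global step reaches the limit.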
@WuDanFly Thank you for your help. I think my parameter (match_threshold=0.5) caused this: my loss (0.6) has not reached 0.5, so training does not stop. Anyway, I have another question. When I stopped training with Ctrl+C, I found many files in ./logs/, like this:
checkpoint
events.out.tfevents.1502780913.TENCENT64.site
events.out.tfevents.1502782292.TENCENT64.site
events.out.tfevents.1502782426.TENCENT64.site
events.out.tfevents.1502822085.TENCENT64.site
model.ckpt-28153.data-00000-of-00001
model.ckpt-26189.meta
model.ckpt-26189.index
model.ckpt-37915.meta
model.ckpt-38242.data-00000-of-00001
model.ckpt-38242.index
model.ckpt-28153.index
model.ckpt-38242.meta
model.ckpt-28153.meta
model.ckpt-3870.data-00000-of-00001
...
training_config.txt
The question is: how can I evaluate my model? For context, my model was fine-tuned from ssd_300_vgg. My eval command gives a wrong result:
DATASET_DIR=./tfrecords/voc2007/
EVAL_DIR=./logs/
CHECKPOINT_PATH=./logs/model.ckpt
python eval_ssd_network.py \
    --eval_dir=${EVAL_DIR} \
    --dataset_dir=${DATASET_DIR} \
    --dataset_name=pascalvoc_2007 \
    --dataset_split_name=test \
    --model_name=model \
    --checkpoint_path=${CHECKPOINT_PATH} \
    --batch_size=1
When I use "model_name=ssd_300_vgg" or "model_name=model", the result is the same!
How can I evaluate my own model?
Eval result with ssd_300_vgg:
2017-08-16 19:09:31.452408: I tensorflow/core/kernels/logging_ops.cc:79] AP_VOC07/mAP[0]
2017-08-16 19:09:31.452709: I tensorflow/core/kernels/logging_ops.cc:79] AP_VOC12/mAP[0]
INFO:tensorflow:Finished evaluation at 2017-08-16-11:09:31
Time spent : 427.356 seconds.
Time spent per BATCH: 0.086 seconds.
@seasonyang The purpose of match_threshold is not to stop training; you can read #71. About eval, I suggest you read README.md; it helped me a lot and should do the same for you.
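For reference, a rough sketch of evaluating a fine-tuned model, assuming the step-38242 checkpoint from the listing above (substitute whichever model.ckpt-NNNNN prefix is newest in ./logs/) and keeping --model_name=ssd_300_vgg, since fine-tuning does not change the network architecture:

# point --checkpoint_path at a specific numbered checkpoint prefix from ./logs/
DATASET_DIR=./tfrecords/voc2007/
EVAL_DIR=./logs/
CHECKPOINT_PATH=./logs/model.ckpt-38242
python eval_ssd_network.py \
    --eval_dir=${EVAL_DIR} \
    --dataset_dir=${DATASET_DIR} \
    --dataset_name=pascalvoc_2007 \
    --dataset_split_name=test \
    --model_name=ssd_300_vgg \
    --checkpoint_path=${CHECKPOINT_PATH} \
    --batch_size=1

Note that the listing above contains only numbered prefixes, so the plain ./logs/model.ckpt used in the original command may not resolve to a saved checkpoint at all.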
@seasonyang I have the same confusion: I don't know where the model produced by training is. During training, the ssd_300_vgg checkpoint is not changed.
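One quick check, assuming the --train_dir=./logs/ setup used in this issue: training writes the fine-tuned weights as numbered model.ckpt-* files under the train directory rather than modifying the original ssd_300_vgg checkpoint, and the newest prefix can be found with something like:

# list the most recently written checkpoint index file in the training directory
ls -t ./logs/model.ckpt-*.index | head -1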
@seasonyang can you share the training code? you got a really low loss!
I can't get an ssd_300_vgg file when running "train_ssd_network.py --dataset_dir=*/tfrecords", and some errors occurred. When I add the arguments used for fine-tuning, it runs, but I still don't get ssd_300_vgg, only some other files in logs/. What should I do to start training?
I have trained on my own dataset (60 images of 4 classes) as follows:

1. Command:
DATASET_DIR=./tfrecords/voc2007
TRAIN_DIR=./logs/
CHECKPOINT_PATH=./checkpoints/ssd_300_vgg.ckpt
python2 train_ssd_network.py \
    --train_dir=${TRAIN_DIR} \
    --dataset_dir=${DATASET_DIR} \
    --dataset_name=pascalvoc_2007 \
    --dataset_split_name=train \
    --model_name=ssd_300_vgg \
    --checkpoint_path=${CHECKPOINT_PATH} \
    --save_summaries_secs=60 \
    --save_interval_secs=600 \
    --weight_decay=0.0005 \
    --optimizer=adam \
    --learning_rate=0.001 \
    --batch_size=16

2. When the loss hovered near 0.6 and the step was near 40000, training did not stop, like this:
INFO:tensorflow:Saving checkpoint to path ./logs/model.ckpt
INFO:tensorflow:Recording summary at step 37262.
INFO:tensorflow:global step 37270: loss = 0.6264 (1.872 sec/step)
INFO:tensorflow:global step 37280: loss = 0.6262 (1.894 sec/step)
INFO:tensorflow:global step 37290: loss = 0.6265 (1.807 sec/step)
INFO:tensorflow:Recording summary at step 37295.
INFO:tensorflow:global step 37300: loss = 0.6261 (1.813 sec/step)
INFO:tensorflow:global step 37310: loss = 0.6263 (1.814 sec/step)
INFO:tensorflow:global step 37320: loss = 0.6265 (1.814 sec/step)
3. Question: the loss is already quite low, so why doesn't it converge and stop training? Can anyone help?
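As a side note, training does not stop on its own unless max_number_of_steps is set, as mentioned earlier in the thread; the loss value itself is not a stopping criterion. One way to check whether the loss has really plateaued is to point TensorBoard at the events files that training is already writing into the train directory (path assumed from the command above):

# inspect the loss curves recorded in ./logs/
tensorboard --logdir=./logs/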