balancap / SSD-Tensorflow

Single Shot MultiBox Detector in TensorFlow

eval #264

Open xpandi-top opened 6 years ago

xpandi-top commented 6 years ago

When running eval_ssd_network.py, I get this:

2018-07-24 18:48:02.126672: W ./tensorflow/core/grappler/optimizers/graph_optimizer_stage.h:233] Failed to run optimizer ArithmeticOptimizer, stage HoistCommonFactor. Error: Node average_precision_voc07/ArithmeticOptimizer/HoistCommonFactor_Add_AddN is missing output properties at position :0 (num_outputs=0)

AP_VOC07/mAP[0.20843321784099034]
AP_VOC12/mAP[0.20235189944609927]

foamliu commented 6 years ago

+1

prachiAeromana commented 6 years ago

I started getting this message after upgrading to tensorflow 1.10.0 (from 1.8.0). However, my custom tensorflow code still runs.

HongyiDuanmu26 commented 6 years ago

same problem here

ryohachiuma commented 6 years ago

same here at tensorflow 1.8.0

XuDuoBiao commented 6 years ago

So, what can we do to solve this problem?

ZhuDaQing commented 6 years ago

Me too. Have you solved this problem?

kisanzxy commented 5 years ago

Same issue when running the evaluation script, and the mAP is extremely small

Leon924 commented 5 years ago

Can anyone help us, please? Has anybody solved this?

Sulince commented 5 years ago

I got the same problem, with results just like yours. Have you solved it yet? @xpandi-top @foamliu @prachiAeromana @HongyiDuanmu26 @kemangjaka Has anyone solved it? I need your help. Thanks a lot!

ryohachiuma commented 5 years ago

Hi, I'm using Ubuntu 16.04 and tensorflow-gpu 1.10.0 now, and I couldn't reproduce the error. The evaluation worked fine. When I got the error, I used Windows.

What is your environment?

Sulince commented 5 years ago

Thanks for the reply! My environment: Ubuntu 18.04 + tensorflow-gpu 1.12.0 + Python 3.6. I changed my eval_ssd_network.py and metrics.py files following #321, and ran eval_ssd_network.py successfully, but the result looks like this:

2019-03-07 09:54:28.724070: W ./tensorflow/core/grappler/optimizers/graph_optimizer_stage.h:237] Failed to run optimizer ArithmeticOptimizer, stage HoistCommonFactor. Error: Node average_precision_voc07/ArithmeticOptimizer/HoistCommonFactor_Add_AddN is missing output properties at position :0 (num_outputs=0) AP_VOC07/mAP[3.1303382901083712e-05] AP_VOC12/mAP[1.4904059586232956e-05]

Please help me! Could you send me your eval_ssd_network.py and metrics.py files if you don't mind? My email: 15215420373@163.com. Thanks a lot! @kemangjaka

ryohachiuma commented 5 years ago

OK, let me correct myself: I also get the same warning as you, but I didn't get such a low mAP. What dataset do you use for the evaluation? I don't think the optimizer warning is the problem.

Sulince commented 5 years ago

The dataset I use is VOCtest_06-Nov-2007, and the model is VGG_VOC0712_SSD_300x300_iter_120000.ckpt. What is your mAP? Are your dataset and model the same as mine? @kemangjaka

ryohachiuma commented 5 years ago

I downloaded VOCtest_06-Nov-2007 dataset, and evaluated with the VGG_VOC0712_SSD_300x300_iter_120000.ckpt model.

So, the command I typed is the following.

python eval_ssd_network.py --eval_dir=./log_2007/ --dataset_dir=./data/ --dataset_name=pascalvoc_2007 --dataset_split_name=test --model_name=ssd_300_vgg --checkpoint_path=./checkpoints/VGG_VOC0712_SSD_300x300_iter_120000.ckpt --batch_size=1

And I got the mAP below.

AP_VOC07/mAP[0.59928033284390148]
AP_VOC12/mAP[0.60921384902021813]

Still quite low but not too low I think.

BTW, I didn't do any modifications to metrics.py

Sulince commented 5 years ago

The command I use is the same, as are the dataset and the model, and the environment shouldn't be the problem. Could you please send your eval_ssd_network.py file to my email so that I can give it a try? @kemangjaka

ryohachiuma commented 5 years ago

Well, I only changed the flatten part; nothing else differs from the original file. https://github.com/balancap/SSD-Tensorflow/issues/321#issuecomment-469188867

Could you try with tensorflow-gpu 1.10.0?

Leon924 commented 5 years ago

@kemangjaka @Sulince Hi, have you solved the problem? I got that problem too, and I have not been able to figure it out for a long time.

2019-03-08 22:41:45.604947: W ./tensorflow/core/grappler/optimizers/graph_optimizer_stage.h:237] Failed to run optimizer ArithmeticOptimizer, stage HoistCommonFactor. Error: Node average_precision_voc07/ArithmeticOptimizer/HoistCommonFactor_Add_AddN is missing output properties at position :0 (num_outputs=0)

AP_VOC07/mAP[0.00010226225830017356]
AP_VOC12/mAP[2.127145489434078e-05]

ryohachiuma commented 5 years ago

Hi, could you tell me the version of python, tensorflow, OS, and the command you typed? And also, did you modify any code from the original one?

Leon924 commented 5 years ago

@kemangjaka Just like you said, I only added the flatten function, and my environment is: tf-gpu 1.10, Python 3.6, Red Hat 4.8.5. I think my environment is OK, because I can run the tutorial example.

and this is my code:

DATASET_DIR=/export/userhome/liqiang/liqiang/Deeplearning/SSD/SSD-datasets/VOC2007/VOCtest_06-Nov-2007/VOCdevkit/VOC2007_tfrecord/

EVAL_DIR=/export/userhome/liqiang/liqiang/Deeplearning/SSD/log_files/log_VOC2007/log_eval/

CHECKPOINT_PATH=/export/userhome/liqiang/liqiang/Deeplearning/SSD/ckpt/SSD_ckpt/VGG_VOC0712_SSD_300x300_iter_120000.ckpt/

CUDA_VISIBLE_DEVICES=3 python /export/userhome/liqiang/liqiang/Deeplearning/SSD/SSD-Tensorflow/eval_ssd_network.py \
    --eval_dir=${EVAL_DIR} \
    --dataset_dir=${DATASET_DIR} \
    --dataset_name=pascalvoc_2007 \
    --dataset_split_name=test \
    --model_name=ssd_300_vgg \
    --checkpoint_path=${CHECKPOINT_PATH} \
    --batch_size=1

ryohachiuma commented 5 years ago

@petit-ami Could you post all of your output?

The command is exactly the same as mine. I don't know how to reproduce your results...

Leon924 commented 5 years ago

@kemangjaka Here it is:

WARNING:tensorflow:From /export/userhome/liqiang/liqiang/Deeplearning/SSD/SSD-Tensorflow/eval_ssd_network.py:113: get_or_create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version. Instructions for updating: Please switch to tf.train.get_or_create_global_step

===========================================================================

SSD net parameters:

===========================================================================

{'anchor_offset': 0.5, 'anchor_ratios': [[2, 0.5], [2, 0.5, 3, 0.3333333333333333], [2, 0.5, 3, 0.3333333333333333], [2, 0.5, 3, 0.3333333333333333], [2, 0.5], [2, 0.5]], 'anchor_size_bounds': [0.15, 0.9], 'anchor_sizes': [(21.0, 45.0), (45.0, 99.0), (99.0, 153.0), (153.0, 207.0), (207.0, 261.0), (261.0, 315.0)], 'anchor_steps': [8, 16, 32, 64, 100, 300], 'feat_layers': ['block4', 'block7', 'block8', 'block9', 'block10', 'block11'], 'feat_shapes': [(38, 38), (19, 19), (10, 10), (5, 5), (3, 3), (1, 1)], 'img_shape': (300, 300), 'no_annotation_label': 21, 'normalizations': [20, -1, -1, -1, -1, -1], 'num_classes': 21, 'prior_scaling': [0.1, 0.1, 0.2, 0.2]}

===========================================================================

Training | Evaluation dataset files:

===========================================================================

['/export/userhome/liqiang/liqiang/Deeplearning/SSD/SSD-datasets/VOC2007/VOCtest_06-Nov-2007/VOCdevkit/VOC2007_tfrecord/voc_2007_test_000.tfrecord', '/export/userhome/liqiang/liqiang/Deeplearning/SSD/SSD-datasets/VOC2007/VOCtest_06-Nov-2007/VOCdevkit/VOC2007_tfrecord/voc_2007_test_001.tfrecord', '/export/userhome/liqiang/liqiang/Deeplearning/SSD/SSD-datasets/VOC2007/VOCtest_06-Nov-2007/VOCdevkit/VOC2007_tfrecord/voc_2007_test_002.tfrecord', '/export/userhome/liqiang/liqiang/Deeplearning/SSD/SSD-datasets/VOC2007/VOCtest_06-Nov-2007/VOCdevkit/VOC2007_tfrecord/voc_2007_test_003.tfrecord', '/export/userhome/liqiang/liqiang/Deeplearning/SSD/SSD-datasets/VOC2007/VOCtest_06-Nov-2007/VOCdevkit/VOC2007_tfrecord/voc_2007_test_004.tfrecord', '/export/userhome/liqiang/liqiang/Deeplearning/SSD/SSD-datasets/VOC2007/VOCtest_06-Nov-2007/VOCdevkit/VOC2007_tfrecord/voc_2007_test_005.tfrecord', '/export/userhome/liqiang/liqiang/Deeplearning/SSD/SSD-datasets/VOC2007/VOCtest_06-Nov-2007/VOCdevkit/VOC2007_tfrecord/voc_2007_test_006.tfrecord', '/export/userhome/liqiang/liqiang/Deeplearning/SSD/SSD-datasets/VOC2007/VOCtest_06-Nov-2007/VOCdevkit/VOC2007_tfrecord/voc_2007_test_007.tfrecord', '/export/userhome/liqiang/liqiang/Deeplearning/SSD/SSD-datasets/VOC2007/VOCtest_06-Nov-2007/VOCdevkit/VOC2007_tfrecord/voc_2007_test_008.tfrecord', '/export/userhome/liqiang/liqiang/Deeplearning/SSD/SSD-datasets/VOC2007/VOCtest_06-Nov-2007/VOCdevkit/VOC2007_tfrecord/voc_2007_test_009.tfrecord', '/export/userhome/liqiang/liqiang/Deeplearning/SSD/SSD-datasets/VOC2007/VOCtest_06-Nov-2007/VOCdevkit/VOC2007_tfrecord/voc_2007_test_010.tfrecord', '/export/userhome/liqiang/liqiang/Deeplearning/SSD/SSD-datasets/VOC2007/VOCtest_06-Nov-2007/VOCdevkit/VOC2007_tfrecord/voc_2007_test_011.tfrecord', '/export/userhome/liqiang/liqiang/Deeplearning/SSD/SSD-datasets/VOC2007/VOCtest_06-Nov-2007/VOCdevkit/VOC2007_tfrecord/voc_2007_test_012.tfrecord', '/export/userhome/liqiang/liqiang/Deeplearning/SSD/SSD-datasets/VOC2007/VOCtest_06-Nov-2007/VOCdevkit/VOC2007_tfrecord/voc_2007_test_013.tfrecord', '/export/userhome/liqiang/liqiang/Deeplearning/SSD/SSD-datasets/VOC2007/VOCtest_06-Nov-2007/VOCdevkit/VOC2007_tfrecord/voc_2007_test_014.tfrecord', '/export/userhome/liqiang/liqiang/Deeplearning/SSD/SSD-datasets/VOC2007/VOCtest_06-Nov-2007/VOCdevkit/VOC2007_tfrecord/voc_2007_test_015.tfrecord', '/export/userhome/liqiang/liqiang/Deeplearning/SSD/SSD-datasets/VOC2007/VOCtest_06-Nov-2007/VOCdevkit/VOC2007_tfrecord/voc_2007_test_016.tfrecord', '/export/userhome/liqiang/liqiang/Deeplearning/SSD/SSD-datasets/VOC2007/VOCtest_06-Nov-2007/VOCdevkit/VOC2007_tfrecord/voc_2007_test_017.tfrecord', '/export/userhome/liqiang/liqiang/Deeplearning/SSD/SSD-datasets/VOC2007/VOCtest_06-Nov-2007/VOCdevkit/VOC2007_tfrecord/voc_2007_test_018.tfrecord', '/export/userhome/liqiang/liqiang/Deeplearning/SSD/SSD-datasets/VOC2007/VOCtest_06-Nov-2007/VOCdevkit/VOC2007_tfrecord/voc_2007_test_019.tfrecord', '/export/userhome/liqiang/liqiang/Deeplearning/SSD/SSD-datasets/VOC2007/VOCtest_06-Nov-2007/VOCdevkit/VOC2007_tfrecord/voc_2007_test_020.tfrecord', '/export/userhome/liqiang/liqiang/Deeplearning/SSD/SSD-datasets/VOC2007/VOCtest_06-Nov-2007/VOCdevkit/VOC2007_tfrecord/voc_2007_test_021.tfrecord', '/export/userhome/liqiang/liqiang/Deeplearning/SSD/SSD-datasets/VOC2007/VOCtest_06-Nov-2007/VOCdevkit/VOC2007_tfrecord/voc_2007_test_022.tfrecord', '/export/userhome/liqiang/liqiang/Deeplearning/SSD/SSD-datasets/VOC2007/VOCtest_06-Nov-2007/VOCdevkit/VOC2007_tfrecord/voc_2007_test_023.tfrecord', 
'/export/userhome/liqiang/liqiang/Deeplearning/SSD/SSD-datasets/VOC2007/VOCtest_06-Nov-2007/VOCdevkit/VOC2007_tfrecord/voc_2007_test_024.tfrecord']

WARNING:tensorflow:From /export/userhome/liqiang/liqiang/Deeplearning/SSD/SSD-Tensorflow/eval_ssd_network.py:226: streaming_mean (from tensorflow.contrib.metrics.python.ops.metric_ops) is deprecated and will be removed in a future version. Instructions for updating: Please switch to tf.metrics.mean INFO:tensorflow:Evaluating None INFO:tensorflow:Starting evaluation at 2019-03-08-14:29:14 INFO:tensorflow:Graph was finalized. 2019-03-08 22:29:14.520042: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA 2019-03-08 22:29:14.971772: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties: name: TITAN X (Pascal) major: 6 minor: 1 memoryClockRate(GHz): 1.531 pciBusID: 0000:84:00.0 totalMemory: 11.90GiB freeMemory: 4.26GiB 2019-03-08 22:29:14.971931: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0 2019-03-08 22:29:21.058607: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix: 2019-03-08 22:29:21.058689: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0 2019-03-08 22:29:21.058710: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0: N 2019-03-08 22:29:21.072137: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1218 MB memory) -> physical GPU (device: 0, name: TITAN X (Pascal), pci bus id: 0000:84:00.0, compute capability: 6.1) INFO:tensorflow:Running local_init_op. INFO:tensorflow:Done running local_init_op. 2019-03-08 22:29:45.913166: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.54GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2019-03-08 22:29:46.126467: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.07GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2019-03-08 22:29:46.147991: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.14GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2019-03-08 22:29:46.211770: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.37GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2019-03-08 22:29:46.239826: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.18GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2019-03-08 22:29:46.243550: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.35GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2019-03-08 22:29:46.333402: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.13GiB. 
The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. INFO:tensorflow:Evaluation [495/4952] INFO:tensorflow:Evaluation [990/4952] INFO:tensorflow:Evaluation [1485/4952] INFO:tensorflow:Evaluation [1980/4952] INFO:tensorflow:Evaluation [2475/4952] INFO:tensorflow:Evaluation [2970/4952] INFO:tensorflow:Evaluation [3465/4952] INFO:tensorflow:Evaluation [3960/4952] INFO:tensorflow:Evaluation [4455/4952] INFO:tensorflow:Evaluation [4950/4952] INFO:tensorflow:Evaluation [4952/4952] 2019-03-08 22:41:45.604947: W ./tensorflow/core/grappler/optimizers/graph_optimizer_stage.h:237] Failed to run optimizer ArithmeticOptimizer, stage HoistCommonFactor. Error: Node average_precision_voc07/ArithmeticOptimizer/HoistCommonFactor_Add_AddN is missing output properties at position :0 (num_outputs=0) AP_VOC07/mAP[0.00010226225830017356] AP_VOC12/mAP[2.127145489434078e-05] INFO:tensorflow:Finished evaluation at 2019-03-08-14:43:15 Time spent : 841.545 seconds. Time spent per BATCH: 0.170 seconds.

ryohachiuma commented 5 years ago

I found it. In your log, it says,

INFO:tensorflow:Evaluating None

That means the trained checkpoint could not be loaded properly, so your evaluation ran with a randomly initialized network. Is the path to the checkpoint correct?

@Sulince maybe your problem is exactly the same as this one.
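
For what it's worth, here is a minimal sketch (my own sanity check, not part of the repo, assuming TensorFlow 1.x and the usual slim-style checkpoint resolution used in eval_ssd_network.py) that mimics how the --checkpoint_path flag gets resolved, so you can catch the "Evaluating None" case before a long evaluation run:

import tensorflow as tf

# The value you pass as --checkpoint_path (a checkpoint prefix or a directory).
ckpt_flag = "./checkpoints/VGG_VOC0712_SSD_300x300_iter_120000.ckpt"

if tf.gfile.IsDirectory(ckpt_flag):
    # A directory only resolves if it contains a 'checkpoint' state file;
    # otherwise latest_checkpoint() returns None and the eval script logs
    # "INFO:tensorflow:Evaluating None".
    resolved = tf.train.latest_checkpoint(ckpt_flag)
else:
    resolved = ckpt_flag

print("Resolved checkpoint:", resolved)
if resolved is None or not tf.train.checkpoint_exists(resolved):
    raise SystemExit("Checkpoint not found. Fix --checkpoint_path before evaluating.")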

Leon924 commented 5 years ago

ARNING:tensorflow:From /export/userhome/liqiang/liqiang/Deeplearning/SSD/SSD-Tensorflow/eval_ssd_network.py:226: streaming_mean (from tensorflow.contrib.metrics.python.ops.metric_ops) is deprecated and will be removed in a future version. Instructions for updating: Please switch to tf.metrics.mean INFO:tensorflow:Evaluating /export/userhome/liqiang/liqiang/Deeplearning/SSD/ckpt/SSD_ckpt/VGG_VOC0712_SSD_300x300_iter_120000.ckpt/VGG_VOC0712_SSD_300x300_iter_120000.ckpt INFO:tensorflow:Starting evaluation at 2019-03-08-15:25:58 INFO:tensorflow:Graph was finalized. 2019-03-08 23:25:58.450231: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA 2019-03-08 23:25:58.889497: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties: name: TITAN X (Pascal) major: 6 minor: 1 memoryClockRate(GHz): 1.531 pciBusID: 0000:84:00.0 totalMemory: 11.90GiB freeMemory: 4.26GiB 2019-03-08 23:25:58.889608: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0 2019-03-08 23:26:13.981284: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix: 2019-03-08 23:26:13.981353: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0 2019-03-08 23:26:13.981373: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0: N 2019-03-08 23:26:14.010918: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1218 MB memory) -> physical GPU (device: 0, name: TITAN X (Pascal), pci bus id: 0000:84:00.0, compute capability: 6.1) INFO:tensorflow:Restoring parameters from /export/userhome/liqiang/liqiang/Deeplearning/SSD/ckpt/SSD_ckpt/VGG_VOC0712_SSD_300x300_iter_120000.ckpt/VGG_VOC0712_SSD_300x300_iter_120000.ckpt INFO:tensorflow:Running local_init_op. INFO:tensorflow:Done running local_init_op. 2019-03-08 23:26:38.264869: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.54GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2019-03-08 23:26:38.448399: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.07GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2019-03-08 23:26:38.469665: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.14GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2019-03-08 23:26:38.472249: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.07GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2019-03-08 23:26:38.519175: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.37GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2019-03-08 23:26:38.572346: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.18GiB. 
The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2019-03-08 23:26:38.575961: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.35GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2019-03-08 23:26:38.630726: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.06GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2019-03-08 23:26:38.652513: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.13GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. INFO:tensorflow:Evaluation [495/4952] INFO:tensorflow:Evaluation [990/4952] INFO:tensorflow:Evaluation [1485/4952] INFO:tensorflow:Evaluation [1980/4952] INFO:tensorflow:Evaluation [2475/4952] INFO:tensorflow:Evaluation [2970/4952] INFO:tensorflow:Evaluation [3465/4952] INFO:tensorflow:Evaluation [3960/4952] INFO:tensorflow:Evaluation [4455/4952] INFO:tensorflow:Evaluation [4950/4952] INFO:tensorflow:Evaluation [4952/4952] 2019-03-08 23:34:59.828743: W ./tensorflow/core/grappler/optimizers/graph_optimizer_stage.h:237] Failed to run optimizer ArithmeticOptimizer, stage HoistCommonFactor. Error: Node average_precision_voc07/ArithmeticOptimizer/HoistCommonFactor_Add_AddN is missing output properties at position :0 (num_outputs=0) AP_VOC07/mAP[0.59928033284390148] AP_VOC12/mAP[0.60921384904878606] INFO:tensorflow:Finished evaluation at 2019-03-08-15:35:12 Time spent : 554.773 seconds. Time spent per BATCH: 0.112 seconds.

Thank you!!! That was it: I needed to append the .ckpt prefix to the path. Thanks so much.
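
For anyone else hitting this: the released archive unpacks into a directory that itself contains the checkpoint files, so the flag has to point at the checkpoint prefix inside that directory, not at the directory alone. A sketch of what that looks like (paths are illustrative, matching the log above; the exact file names may differ):

checkpoints/VGG_VOC0712_SSD_300x300_iter_120000.ckpt/
    VGG_VOC0712_SSD_300x300_iter_120000.ckpt.index
    VGG_VOC0712_SSD_300x300_iter_120000.ckpt.data-00000-of-00001

CHECKPOINT_PATH=./checkpoints/VGG_VOC0712_SSD_300x300_iter_120000.ckpt/VGG_VOC0712_SSD_300x300_iter_120000.ckpt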

Sulince commented 5 years ago

I have solved this problem. The cause is just as kemangjaka said: "INFO:tensorflow:Evaluating None". Point to a different model file and it will work. I think the model named VGG_VOC0712_SSD_300x300_iter_120000.ckpt in the repository has something wrong with it, so don't use it; find another one. @kemangjaka @petit-ami

Sulince commented 5 years ago

By the way, have you trained the model successfully? When I train the model on the VOC07+12 dataset, my loss stays high and oscillates, as follows:

INFO:tensorflow:Recording summary at step 62230.
INFO:tensorflow:global step 62240: loss = 40.2912 (0.496 sec/step)
INFO:tensorflow:global step 62250: loss = 40.6664 (0.493 sec/step)
INFO:tensorflow:global step 62260: loss = 40.5154 (0.502 sec/step)
INFO:tensorflow:global step 62270: loss = 23.9944 (0.487 sec/step)
INFO:tensorflow:global step 62280: loss = 21.0998 (0.501 sec/step)
INFO:tensorflow:global step 62290: loss = 39.5273 (0.505 sec/step)
INFO:tensorflow:global step 62300: loss = 28.9741 (0.522 sec/step)
INFO:tensorflow:global step 62310: loss = 33.9893 (0.504 sec/step)
INFO:tensorflow:global step 62320: loss = 31.2430 (0.517 sec/step)
INFO:tensorflow:global step 62330: loss = 50.1789 (0.500 sec/step)
INFO:tensorflow:global step 62340: loss = 16.4918 (0.493 sec/step)

Here are my parameters:

DATASET_DIR=/home/sulince/SSD_tensorflow/VOC0713/tfrecords/
TRAIN_DIR=/home/sulince/SSD_tensorflow/train_model/
CHECKPOINT_PATH=/home/sulince/SSD_tensorflow/checkpoints/vgg_16.ckpt

python3 /home/sulince/SSD_tensorflow/train_ssd_network.py \
    --train_dir=${TRAIN_DIR} \
    --dataset_dir=${DATASET_DIR} \
    --dataset_name=pascalvoc_2007 \
    --dataset_split_name=train \
    --model_name=ssd_300_vgg \
    --checkpoint_path=${CHECKPOINT_PATH} \
    --checkpoint_model_scope=vgg_16 \
    --checkpoint_exclude_scopes=ssd_300_vgg/conv6,ssd_300_vgg/conv7,ssd_300_vgg/block8,ssd_300_vgg/block9,ssd_300_vgg/block10,ssd_300_vgg/block11,ssd_300_vgg/block4_box,ssd_300_vgg/block7_box,ssd_300_vgg/block8_box,ssd_300_vgg/block9_box,ssd_300_vgg/block10_box,ssd_300_vgg/block11_box \
    --trainable_scopes=ssd_300_vgg/conv6,ssd_300_vgg/conv7,ssd_300_vgg/block8,ssd_300_vgg/block9,ssd_300_vgg/block10,ssd_300_vgg/block11,ssd_300_vgg/block4_box,ssd_300_vgg/block7_box,ssd_300_vgg/block8_box,ssd_300_vgg/block9_box,ssd_300_vgg/block10_box,ssd_300_vgg/block11_box \
    --save_summaries_secs=60 \
    --save_interval_secs=600 \
    --weight_decay=0.0005 \
    --optimizer=adam \
    --learning_rate=0.001 \
    --learning_rate_decay_factor=0.94 \
    --batch_size=16 \
    --gpu_memory_fraction=0.9

What is your loss? @kemangjaka @petit-ami

Leon924 commented 5 years ago

@Sulince I was using VGG_VOC0712_SSD_300x300_iter_120000.ckpt yesterday, and it worked.

AP_VOC07/mAP[0.59928033284390148] AP_VOC12/mAP[0.60921384904878606]

And today I ran another one, named VGG_VOC0712_SSD_300x300_ft_iter_120000.ckpt; it also worked, with a higher mAP. @kemangjaka

AP_VOC07/mAP[0.74313215403145927] AP_VOC12/mAP[0.76659716498723329]

I am fine-tuning the existing SSD checkpoint VGG_VOC0712_SSD_300x300_ft_iter_120000.ckpt, but the loss cannot converge and oscillates around 100.

DATASET_DIR=/export/userhome/liqiang/liqiang/Deeplearning/SSD/SSD-datasets/VOC2012/VOCtrainval_11-May-2012/VOCdevkit/VOC2012_tfrecord/

TRAIN_DIR=/export/userhome/liqiang/liqiang/Deeplearning/SSD/log_files/log_finetune_2012/

CHECKPOINT_PATH=/export/userhome/liqiang/liqiang/Deeplearning/SSD/log_files/log_finetune_2012/model.ckpt-40000

CUDA_VISIBLE_DEVICES=2 python /export/userhome/liqiang/liqiang/Deeplearning/SSD/SSD-Tensorflow/train_ssd_network.py \
    --train_dir=${TRAIN_DIR} \
    --dataset_dir=${DATASET_DIR} \
    --dataset_name=pascalvoc_2012 \
    --dataset_split_name=train \
    --model_name=ssd_300_vgg \
    --CHECKPOINT_PATH=${CHECKPOINT_PATH} \
    --save_summaries_secs=60 \
    --save_interval_secs=600 \
    --weight_deacy=0.05 \
    --optimizer=adam \
    --learning_rate=0.00000005 \
    --batch_size=32

@Sulince And I remember that last time I trained VGG16, I also got a result similar to yours. I am working on solving it.

JiangniHIT commented 5 years ago

WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/training/input.py:187: QueueRunner.init (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version. Instructions for updating: To construct input pipelines, use the tf.data module. WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/training/input.py:187: add_queue_runner (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version. Instructions for updating: To construct input pipelines, use the tf.data module. WARNING:tensorflow:From eval_ssd_network.py:231: streaming_mean (from tensorflow.contrib.metrics.python.ops.metric_ops) is deprecated and will be removed in a future version. Instructions for updating: Please switch to tf.metrics.mean INFO:tensorflow:Evaluating ./aug_ckout/model.ckpt-8415 INFO:tensorflow:Starting evaluation at 2019-03-19-05:37:03 INFO:tensorflow:Graph was finalized. 2019-03-19 13:37:04.015866: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA 2019-03-19 13:37:04.093802: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2019-03-19 13:37:04.094138: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1411] Found device 0 with properties: name: GeForce GTX 1060 major: 6 minor: 1 memoryClockRate(GHz): 1.6705 pciBusID: 0000:01:00.0 totalMemory: 5.93GiB freeMemory: 5.35GiB 2019-03-19 13:37:04.094152: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1490] Adding visible gpu devices: 0 2019-03-19 13:37:04.282793: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] Device interconnect StreamExecutor with strength 1 edge matrix: 2019-03-19 13:37:04.282824: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] 0 2019-03-19 13:37:04.282830: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0: N 2019-03-19 13:37:04.282990: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1103] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 607 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060, pci bus id: 0000:01:00.0, compute capability: 6.1) INFO:tensorflow:Restoring parameters from ./aug_ckout/model.ckpt-8415 INFO:tensorflow:Running local_init_op. INFO:tensorflow:Done running local_init_op. WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py:804: start_queue_runners (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version. Instructions for updating: To construct input pipelines, use the tf.data module. 2019-03-19 13:37:07.813959: W tensorflow/core/common_runtime/bfc_allocator.cc:215] Allocator (GPU_0_bfc) ran out of memory trying to allocate 828.12MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2019-03-19 13:37:07.857173: W tensorflow/core/common_runtime/bfc_allocator.cc:215] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.54GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 
2019-03-19 13:37:07.876847: W tensorflow/core/common_runtime/bfc_allocator.cc:215] Allocator (GPU_0_bfc) ran out of memory trying to allocate 610.31MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2019-03-19 13:37:07.909925: W tensorflow/core/common_runtime/bfc_allocator.cc:215] Allocator (GPU_0_bfc) ran out of memory trying to allocate 814.50MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2019-03-19 13:37:07.998780: W tensorflow/core/common_runtime/bfc_allocator.cc:215] Allocator (GPU_0_bfc) ran out of memory trying to allocate 550.42MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2019-03-19 13:37:08.018300: W tensorflow/core/common_runtime/bfc_allocator.cc:215] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.07GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2019-03-19 13:37:08.042239: W tensorflow/core/common_runtime/bfc_allocator.cc:215] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.14GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2019-03-19 13:37:08.043062: W tensorflow/core/common_runtime/bfc_allocator.cc:215] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.07GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2019-03-19 13:37:08.102594: W tensorflow/core/common_runtime/bfc_allocator.cc:215] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.18GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2019-03-19 13:37:08.106842: W tensorflow/core/common_runtime/bfc_allocator.cc:215] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.35GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 
INFO:tensorflow:Evaluation [195/1952] INFO:tensorflow:Evaluation [390/1952] INFO:tensorflow:Evaluation [585/1952] INFO:tensorflow:Evaluation [780/1952] INFO:tensorflow:Evaluation [975/1952] INFO:tensorflow:Evaluation [1170/1952] INFO:tensorflow:Evaluation [1365/1952] INFO:tensorflow:Evaluation [1560/1952] Traceback (most recent call last): File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1292, in _do_call return fn(*args) File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1277, in _run_fn options, feed_dict, fetch_list, target_list, run_metadata) File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1367, in _call_tf_sessionrun run_metadata) tensorflow.python.framework.errors_impl.InvalidArgumentError: Reduction axis 0 is empty in shape [0] [[{{node bboxes_matching_batch_dict/bboxes_matching_batch_4/map/while/bboxes_matching_single/while/ArgMax}} = ArgMax[T=DT_FLOAT, Tidx=DT_INT32, output_type=DT_INT64, _device="/job:localhost/replica:0/task:0/device:CPU:0"](bboxes_matching_batch_dict/bboxes_matching_batch_4/map/while/bboxes_matching_single/while/mul, bboxes_matching_batch_dict/bboxes_matching_batch_4/map/while/bboxes_matching_single/while/bboxes_jaccard/transpose_1/Range/start)]] [[{{node ssd_losses/cross_entropy_pos/value/_524}} = _SendT=DT_FLOAT, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_3865_ssd_losses/cross_entropy_pos/value", _device="/job:localhost/replica:0/task:0/device:GPU:0"]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "eval_ssd_network.py", line 361, in tf.app.run() File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 125, in run _sys.exit(main(argv)) File "eval_ssd_network.py", line 325, in main session_config=config) File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/slim/python/slim/evaluation.py", line 217, in evaluate_once config=session_config) File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/evaluation.py", line 212, in _evaluate_once session.run(eval_ops, feed_dict) File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 671, in run run_metadata=run_metadata) File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 1148, in run run_metadata=run_metadata) File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 1239, in run raise six.reraise(original_exc_info) File "/usr/lib/python3/dist-packages/six.py", line 686, in reraise raise value File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 1224, in run return self._sess.run(args, *kwargs) File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 1296, in run run_metadata=run_metadata) File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 1076, in run return self._sess.run(args, **kwargs) File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 887, in run run_metadata_ptr) File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1110, in _run feed_dict_tensor, options, run_metadata) File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1286, in _do_run run_metadata) File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1308, in _do_call raise type(e)(node_def, op, message) tensorflow.python.framework.errors_impl.InvalidArgumentError: Reduction axis 0 is empty in shape [0] [[{{node bboxes_matching_batch_dict/bboxes_matching_batch_4/map/while/bboxes_matching_single/while/ArgMax}} = ArgMax[T=DT_FLOAT, Tidx=DT_INT32, output_type=DT_INT64, _device="/job:localhost/replica:0/task:0/device:CPU:0"](bboxes_matching_batch_dict/bboxes_matching_batch_4/map/while/bboxes_matching_single/while/mul, bboxes_matching_batch_dict/bboxes_matching_batch_4/map/while/bboxes_matching_single/while/bboxes_jaccard/transpose_1/Range/start)]] [[{{node ssd_losses/cross_entropy_pos/value/_524}} = _SendT=DT_FLOAT, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_3865_ssd_losses/cross_entropy_pos/value", _device="/job:localhost/replica:0/task:0/device:GPU:0"]]

Caused by op 'bboxes_matching_batch_dict/bboxes_matching_batch_4/map/while/bboxes_matching_single/while/ArgMax', defined at: File "eval_ssd_network.py", line 361, in tf.app.run() File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 125, in run _sys.exit(main(argv)) File "eval_ssd_network.py", line 212, in main matching_threshold=FLAGS.matching_threshold) File "/home/jn/SSD-Tensorflow-master/tf_extended/bboxes.py", line 363, in bboxes_matching_batch matching_threshold) File "/home/jn/SSD-Tensorflow-master/tf_extended/bboxes.py", line 379, in bboxes_matching_batch infer_shape=True) File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/functional_ops.py", line 460, in map_fn maximum_iterations=n) File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 3274, in while_loop return_same_structure) File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 2994, in BuildLoop pred, body, original_loop_vars, loop_vars, shape_invariants) File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 2929, in _BuildLoop body_result = body(packed_vars_for_body) File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 3243, in body = lambda i, lv: (i + 1, orig_body(lv)) File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/functional_ops.py", line 449, in compute packed_fn_values = fn(packed_values) File "/home/jn/SSD-Tensorflow-master/tf_extended/bboxes.py", line 373, in matching_threshold), File "/home/jn/SSD-Tensorflow-master/tf_extended/bboxes.py", line 322, in bboxes_matching back_prop=False) File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 3274, in while_loop return_same_structure) File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 2994, in BuildLoop pred, body, original_loop_vars, loop_vars, shape_invariants) File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 2929, in _BuildLoop body_result = body(packed_vars_for_body) File "/home/jn/SSD-Tensorflow-master/tf_extended/bboxes.py", line 296, in m_body idxmax = tf.cast(tf.argmax(jaccard, axis=0), tf.int32) File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/util/deprecation.py", line 488, in new_func return func(args, *kwargs) File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/math_ops.py", line 88, in argmax return gen_math_ops.arg_max(input, axis, name=name, output_type=output_type) File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_math_ops.py", line 787, in arg_max name=name) File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper op_def=op_def) File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/util/deprecation.py", line 488, in new_func return func(args, **kwargs) File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 3272, in create_op op_def=op_def) File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1768, in init self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): Reduction axis 0 is empty in shape [0] [[{{node bboxes_matching_batch_dict/bboxes_matching_batch_4/map/while/bboxes_matching_single/while/ArgMax}} = ArgMax[T=DT_FLOAT, Tidx=DT_INT32, output_type=DT_INT64, _device="/job:localhost/replica:0/task:0/device:CPU:0"](bboxes_matching_batch_dict/bboxes_matching_batch_4/map/while/bboxes_matching_single/while/mul, bboxes_matching_batch_dict/bboxes_matching_batch_4/map/while/bboxes_matching_single/while/bboxes_jaccard/transpose_1/Range/start)]] [[{{node ssd_losses/cross_entropy_pos/value/_524}} = _SendT=DT_FLOAT, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_3865_ssd_losses/cross_entropy_pos/value", _device="/job:localhost/replica:0/task:0/device:GPU:0"]]

JiangniHIT commented 5 years ago

The mAP is so low. Can anyone help me? Thank you!

ylqi007 commented 5 years ago

@petit-ami I saw that you were training and testing on PASCAL VOC 2012. I am training and evaluating on PASCAL VOC 2012 right now.

  1. I trained on the trainval set of 2012 (17125 items) and tested on the test set of 2012 (5138 items), but the mAP is about 0.038 whether or not training starts from ssd_300_vgg.ckpt. Did you get a decent mAP? Could you give me some help, please?

  2. I also trained on 07+12 (trainval of 2007 + trainval of 2012) and 07++12 (trainval and test of 2007 + trainval of 2012); the results are almost the same.

Sincerely

SunNYNO1 commented 5 years ago

I found it. In your log, it says,

INFO:tensorflow:Evaluating None

That means the trained checkpoint could not be loaded properly, so your evaluation ran with a randomly initialized network. Is the path to the checkpoint correct?

@Sulince maybe your problem is exactly the same as this one.

Thanks, I solved my problem your way.

ylqi007 commented 5 years ago

@SunNYNO1 Did you do the evaluation on Pascal VOC 2012?

I evaluated using ssd_300_vgg.ckpt and VGG_VOC0712_SSD_300x300_iter_120000.ckpt on the Pascal VOC 2012 dataset, but I got the results below:

[Screenshots, dated 2019-03-31: evaluation results of VGG_VOC0712_SSD_300x300_iter_120000.ckpt and ssd_300_vgg.ckpt on VOC 2012]

Could you give me some help, please?

Sincerely

jixingzheng commented 5 years ago

zhengjixing@amax1:~/SSD-Tensorflow-master$ python eval_ssd_network.py --eval_dir=./logs --dataset_dir=./test --datset_name=pascalvoc_2007 --dataset_split_name=test --model_name=ssd_300_vgg --checkpoint_path=./checkpoints/ssd_300_vgg.ckpt --batch_size=1 /home/fancy/program/anaconda2/lib/python2.7/site-packages/h5py/init.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type. from ._conv import register_converters as _register_converters WARNING:tensorflow:From eval_ssd_network.py:113: get_or_create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version. Instructions for updating: Please switch to tf.train.get_or_create_global_step Traceback (most recent call last): File "eval_ssd_network.py", line 346, in tf.app.run() File "/home/fancy/program/anaconda2/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 126, in run _sys.exit(main(argv)) File "eval_ssd_network.py", line 119, in main FLAGS.dataset_name, FLAGS.dataset_split_name, FLAGS.dataset_dir) File "/home/zhengjixing/SSD-Tensorflow-master/datasets/dataset_factory.py", line 55, in get_dataset reader) File "/home/zhengjixing/SSD-Tensorflow-master/datasets/imagenet.py", line 136, in get_split raise ValueError('split name %s was not recognized.' % split_name) ValueError: split name test was not recognized.

How can I solve this problem? @petit-ami

hust-huangjunhe commented 5 years ago

When I put the two files of VGG_VOC0712_SSD_300x300_iter_120000.ckpt into the directory './checkpoints/VGG_VOC0712_SSD_300x300_iter_120000.ckpt', the result is:

AP_VOC07/mAP[0.67390402123709192] AP_VOC12/mAP[0.69139019683779168]

When I do not put the two files into that directory, the result is:

AP_VOC07/mAP[0.00010226225830017356] AP_VOC12/mAP[2.127145489434078e-05]

So check how you unzipped VGG_VOC0712_SSD_300x300_iter_120000.ckpt; I do not know the underlying reason, though.
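
A quick way to check (a sketch of my own, assuming TensorFlow 1.x; the path below is the checkpoint prefix, i.e. the common name of the two extracted files without the .index / .data suffix) is to try reading the checkpoint directly; a broken extraction fails loudly here instead of silently evaluating random weights:

import tensorflow as tf

ckpt_prefix = "./checkpoints/VGG_VOC0712_SSD_300x300_iter_120000.ckpt/VGG_VOC0712_SSD_300x300_iter_120000.ckpt"

# NewCheckpointReader raises if the .index/.data files are missing or corrupt.
reader = tf.train.NewCheckpointReader(ckpt_prefix)
shape_map = reader.get_variable_to_shape_map()
print("Variables stored in the checkpoint:", len(shape_map))
first = sorted(shape_map)[0]
print("Example variable:", first, shape_map[first])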

navy63 commented 5 years ago

I found it. In your log, it says,

INFO:tensorflow:Evaluating None

That means the trained checkpoint could not be loaded properly, so your evaluation ran with a randomly initialized network. Is the path to the checkpoint correct?

@Sulince maybe your problem is exactly the same as this one.

Thank you!

Ajithbalakrishnan commented 4 years ago

@jixingzheng, try the step below.

  1. Add a 'test' entry in imagenet.py (around line 49), e.g. _SPLITS_TO_SIZES = { 'train': 1281167, 'validation': 50000, 'test': 500, }, as shown in the sketch below. Please note that 500 is an arbitrary placeholder value. I hope this resolves your issue.
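
A sketch of that edit, assuming the split-size dict is named _SPLITS_TO_SIZES as in the TF-slim-style imagenet.py; check the exact name and line in your copy, and set the count to your real number of test records:

# datasets/imagenet.py, around line 49
_SPLITS_TO_SIZES = {
    'train': 1281167,
    'validation': 50000,
    'test': 500,  # placeholder count for the test split
}

That said, judging from the traceback above, the dataset factory probably only fell back to imagenet.py because the command misspells the flag as --datset_name, so the default dataset was used; with --dataset_name=pascalvoc_2007 the Pascal VOC reader should be picked up instead.
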
xiaobumiDM commented 4 years ago

Thanks for the reply! My environment: Ubuntu 18.04 + tensorflow-gpu 1.12.0 + Python 3.6. I changed my eval_ssd_network.py and metrics.py files following #321, and ran eval_ssd_network.py successfully, but the result looks like this:

2019-03-07 09:54:28.724070: W ./tensorflow/core/grappler/optimizers/graph_optimizer_stage.h:237] Failed to run optimizer ArithmeticOptimizer, stage HoistCommonFactor. Error: Node average_precision_voc07/ArithmeticOptimizer/HoistCommonFactor_Add_AddN is missing output properties at position :0 (num_outputs=0) AP_VOC07/mAP[3.1303382901083712e-05] AP_VOC12/mAP[1.4904059586232956e-05]

Please help me! Could you send me your eval_ssd_network.py and metrics.py files if you don't mind? My email: 15215420373@163.com. Thanks a lot! @kemangjaka

Can you tell me your environment (CUDA + TensorFlow + Python versions)?