I may have found the reason why the model can't be trained.

balancap / SSD-Tensorflow

Single Shot MultiBox Detector in TensorFlow


I may have found the reason why the model can't be trained. #331

Open stpraha opened 5 years ago

stpraha commented 5 years ago

In /nets/ssd_common.py, there is a line `mask = tf.logical_and(mask, feat_scores > -0.5)`, which is incorrect. Look at the comments at the top of that code block:

Follow the original SSD paper for that purpose:

assign values when jaccard > 0.5;

only update if beat the score of other bboxes.

However, this line allows the update in any case. I hope this helps.
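To make the two quoted rules concrete, here is a minimal NumPy sketch of what a threshold-respecting update could look like. It is not the repository's code: the function name `update_matches`, the 0.5 default, and the toy arrays are assumptions for illustration, and only `mask` and `feat_scores` come from the line quoted above.

```python
import numpy as np

def update_matches(jaccard, feat_scores, feat_labels, label, threshold=0.5):
    """Update per-anchor matches for one ground-truth box (illustrative only).

    jaccard:     IoU of this ground-truth box with every anchor.
    feat_scores: best IoU found so far for each anchor.
    feat_labels: class currently assigned to each anchor (0 = background).
    """
    # "only update if beat the score of other bboxes"
    mask = jaccard > feat_scores
    # "assign values when jaccard > 0.5"
    mask = np.logical_and(mask, jaccard > threshold)
    new_labels = np.where(mask, label, feat_labels)
    new_scores = np.where(mask, jaccard, feat_scores)
    return new_labels, new_scores

# Toy data (assumed): four anchors, one of them already matched to class 5.
jaccard = np.array([0.1, 0.6, 0.8, 0.3])
scores = np.array([0.0, 0.0, 0.9, 0.0])
labels = np.array([0, 0, 5, 0])
new_labels, new_scores = update_matches(jaccard, scores, labels, label=7)
```

With these inputs only the second anchor changes: it clears the threshold and beats its previous score. The third anchor keeps class 5 because its existing match (0.9) is better, and the first and last stay background because their overlap is below the threshold.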

dereklll commented 5 years ago

I'll give you a sixsixsix, man.

dereklll commented 5 years ago

Oh, no. I want my sixsixsix back.

stpraha commented 5 years ago

@dereklll Do you mean it worked at first and then stopped working?

dereklll commented 5 years ago

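> @dereklll Do you mean it worked at first and then stopped working?

With your change, the loss of every positive sample is 0 because the positives get masked out, so the total loss consists only of the negative-sample loss.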

stpraha commented 5 years ago

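@dereklll balancap's loss seems to consist of more than a localization loss and a classification loss. It looks like the classification loss is split into a positive loss and a negative loss, with hard negative mining applied to the negative samples. I also changed the loss function on my side: it now simply brute-forces the cross-entropy over all positive and negative samples.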
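As a rough illustration of the "brute force" variant, the sketch below computes one softmax cross-entropy per anchor and averages over all of them, positive or negative. It is plain NumPy rather than the TensorFlow code discussed here, and the helper name `softmax_cross_entropy`, the toy logits, and the convention that label 0 means background are assumptions.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Per-anchor cross-entropy; labels are integer class ids (0 = background)."""
    # Subtract the row maximum before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the true class for each anchor.
    return -log_probs[np.arange(len(labels)), labels]

# Toy data (assumed): three anchors, three classes.
logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 3.0, 0.3],
                   [1.5, 0.2, 0.1]])
labels = np.array([0, 1, 0])

per_anchor = softmax_cross_entropy(logits, labels)
# One loss term over every anchor, with no positive/negative split.
class_loss = per_anchor.mean()
```

Because background anchors usually far outnumber matched ones, an unweighted average like this tends to be dominated by easy negatives, which is why some form of negative selection is normally added on top.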

alwaysPKU commented 5 years ago

May I ask how you changed it?

stpraha commented 5 years ago

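@alwaysqi Do you mean the loss function? You can look at other TensorFlow implementations of SSD. The idea is to merge the positive loss and negative loss of balancap's version into a single cross-entropy loss (which is how it should have been in the first place). The downside is that balancap's hard negative mining can no longer be used, so you have to rewrite it yourself; otherwise training becomes extremely slow.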
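A rewritten hard negative mining step could look roughly like the NumPy sketch below: keep every positive anchor, keep only the highest-loss negatives up to a 3:1 negative-to-positive ratio (the ratio used in the SSD paper), and normalize by the number of positives. This is not working code from this thread; the function name `hard_negative_mask`, the `neg_pos_ratio` parameter, and the toy loss values are assumptions.

```python
import numpy as np

def hard_negative_mask(per_anchor_loss, labels, neg_pos_ratio=3):
    """Boolean mask: all positives plus the highest-loss negatives."""
    positives = labels > 0
    num_pos = int(positives.sum())
    # Cap the number of negatives at neg_pos_ratio times the positives.
    num_neg = min(neg_pos_ratio * num_pos, int((~positives).sum()))
    # Positives get -inf so they are never selected as negatives.
    neg_loss = np.where(positives, -np.inf, per_anchor_loss)
    hardest = np.argsort(-neg_loss)[:num_neg]
    negatives = np.zeros_like(positives)
    negatives[hardest] = True
    return positives | negatives

# Toy data (assumed): per-anchor classification losses and labels.
losses = np.array([0.9, 0.1, 2.0, 0.5, 0.05, 1.2])
labels = np.array([3, 0, 0, 0, 0, 0])

keep = hard_negative_mask(losses, labels)
# Normalize by the number of matched anchors; guard against zero positives.
num_pos = max(int((labels > 0).sum()), 1)
class_loss = losses[keep].sum() / num_pos
```

Here one positive allows three negatives, so the anchors with losses 2.0, 1.2 and 0.5 are kept and the two easiest negatives are dropped.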

chenjjjjuu commented 5 years ago

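> @alwaysqi Do you mean the loss function? You can look at other TensorFlow implementations of SSD. The idea is to merge the positive loss and negative loss of balancap's version into a single cross-entropy loss (which is how it should have been in the first place). The downside is that balancap's hard negative mining can no longer be used, so you have to rewrite it yourself; otherwise training becomes extremely slow.

Could you please share your modified code? My thesis proposal is due very soon and I'm running out of time. I'm dying here, dying, dying.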

stpraha commented 5 years ago

@chenjjjjuu I don't have working code either... Sorry about that...

navy63 commented 5 years ago

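I also struggled with this for a long time. The loss in this code is basically untrainable and the mAP stays at roughly 0; it just won't train. I don't know what's wrong with it.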