JDAI-CV / image-captioning

Implementation of 'X-Linear Attention Networks for Image Captioning' [CVPR 2020]

About Ensemble/Fusion model in paper #5

Closed Jordan-5i closed 4 years ago

Jordan-5i commented 4 years ago

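I recently read your paper and have a question about the Ensemble/Fusion model. Which models does the Ensemble/Fusion model use? Do X-Linear and Xtransformer generate the caption together? Could you explain in more detail?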

Panda-Peter commented 4 years ago

X-Linear and Xtransformer are each ensembled separately. For each method, we only changed the random seed and then fused the resulting models.

Jordan-5i commented 4 years ago

So for the X-Linear fusion or the Xtransformer fusion, the model architectures are exactly the same, and only the parameter values differ between the models?

Panda-Peter commented 4 years ago

The training settings are all the same; only the random seed differs.

Jordan-5i commented 4 years ago

By the way, how many models do you fuse?

Panda-Peter commented 4 years ago

Four.
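For readers following along, the setup described so far (one architecture, one training configuration, four runs that differ only in random seed) could be sketched roughly as below. This is only an illustration in plain Python: `init_weights`, the `config` values, and the seed values are hypothetical placeholders, not taken from the repository.

```python
import random

def init_weights(seed, size=4):
    """Hypothetical stand-in for model initialization driven by a seed."""
    rng = random.Random(seed)  # independent generator per run
    return [rng.gauss(0.0, 0.02) for _ in range(size)]

# Hypothetical shared training configuration (placeholder values).
config = {"learning_rate": 5e-4, "batch_size": 10}

# Four ensemble members; the seed values themselves are assumptions.
seeds = [0, 1, 2, 3]
members = [
    {"seed": s, "config": config, "weights": init_weights(s)}
    for s in seeds
]
```

Every member shares the same `config` object, so the only source of variation between the resulting models is the seed.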

ShiZiqiang commented 4 years ago

Amazing work! I wonder how the ensemble is done: before the final softmax or after it? Is the ensemble just the average of the output embedding matrices? Sorry for this stupid question.

Panda-Peter commented 4 years ago

The ensemble is performed by fusing the predicted scores after softmax.
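As a rough illustration of fusing after softmax, the sketch below converts each model's logits for one decoding step into probabilities and then combines them. It assumes the fusion is a plain arithmetic mean, which the reply above does not actually state; the function names and logit values are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_after_softmax(per_model_logits):
    """Average per-model probabilities (assumed fusion rule: simple mean)."""
    probs = [softmax(logits) for logits in per_model_logits]
    n_models = len(probs)
    vocab_size = len(probs[0])
    return [sum(p[i] for p in probs) / n_models for i in range(vocab_size)]

# Hypothetical logits from four models over a 3-word vocabulary at one step.
per_model_logits = [
    [2.0, 1.0, 0.1],
    [1.5, 1.7, 0.2],
    [2.2, 0.9, 0.0],
    [1.9, 1.2, 0.3],
]

fused = fuse_after_softmax(per_model_logits)
# Index of the word with the highest fused probability.
next_token = max(range(len(fused)), key=fused.__getitem__)
```

Averaging probabilities rather than logits keeps the fused scores a valid distribution (they still sum to 1), so they can be fed to greedy or beam decoding unchanged.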

ShiZiqiang commented 4 years ago

Thank you so much.

WangLanxiao commented 2 years ago

The ensemble is performed by fusing the predicted scores after softmax.

Thanks for your contribution! Is this ensemble method similar to beam search? That is, beam search runs within a single model, while the ensemble method runs beam search across multiple models?