whqwill opened this issue 3 years ago
I also tried the same thing on the COCO dataset, and some results have the same problem:
000000000090.jpg
a field with a tree and a cow grazing in
000000000128.jpg
an elephant standing next to a box on a
000000000180.jpg
a black bear walking in the grass next to a
Did you solve this problem?
I have the same problem
I have the same problem. How many ROI feature instances did you use as input? I used 36.
Update: I changed the number of ROIs to 100 and got a complete caption, but it is less accurate. I don't know what the recommended number of ROIs is.
This is the caption when I used 36 ROIs:
'a', 'man', 'is', 'riding', 'a', 'horse', 'in', 'a', ' ', ' ', ' ', ' ', ' '
This is the caption when I used 100 ROIs:
'two', 'people', 'riding', 'horses', 'in', 'a', 'group', 'of', 'people'
Do you use their pretrained model, or did you re-train it on your own? I faced the same problem, but I think their model works correctly only if you use exactly the same model and settings for feature extraction (there are several in the bottom-up-attention repo), since it was trained with those features. If you just increase the number of ROIs you also add noise to the image representation, so I think it's normal that the caption is less accurate.
In the following days I'm planning to train the model with my own feature extraction models and see what happens. I'll keep you updated.
Can you please tell me how to get captions from this image captioning model? I couldn't find any caption output in my folder....
Hello,
I trained a model with the default parameters and also noticed the same issue. The pretrained model available from the link in the repo description also seems to produce incomplete captions. I did some digging, and I believe the issue stems from the implementation of optimization with the self-critical loss. According to the authors of the self-critical loss paper (Appendix section E):
One detail that was crucial to optimizing CIDEr to produce better models was to include the EOS tag as a word. When the EOS word was omitted, trivial sentence fragments such as “with a” and “and a” were dominating the metric gains, despite the `gaming' counter-measures (sentence length and precision clipping) that are included in CIDEr-D [13], which is what we optimized. Including the EOS tag substantially lowers the reward allocated to incomplete sentences, and completely resolved this issue. Another more obvious detail that is important is to associate the reward for the sentence with the first EOS encountered. Omitting the reward from the first EOS fails to reward sentence completion which leads to run-on, and rewarding any words that follow the first EOS token is inconsistent with the decoding procedure.
In my case, I noticed that all incomplete captions were missing a reference to a noun (possibly with an adjective), just like the examples above. From my understanding, the model is reluctant to produce that noun, and the learnt policy indicates that it is better to generate an incomplete caption and receive the adjusted reward than to make a 'risky' prediction. The solution, just as the authors said, was simply to include the EOS token in both candidate and reference captions.
I defined an add_eos boolean variable to distinguish between decoding during RL optimization and decoding during evaluation, and modified the loop inside tokenizer.py:
# create dictionary for tokenized captions
for k, line in zip(image_id, lines):
    if k not in tokenized_corpus:
        tokenized_corpus[k] = []
    tokenized_caption = ' '.join([w for w in line.rstrip().split(' ')
                                  if w not in cls.punctuations])
    if add_eos:
        tokenized_caption += " {}".format(cls.eos_token)
    tokenized_corpus[k].append(tokenized_caption)
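To make the idea concrete, here is a minimal, self-contained sketch of how eos_token and add_eos fit together. This is only an illustration, not the repo's actual class: the real tokenize() does more work (the loop above only post-processes already tokenized lines), and the punctuation list below is shortened.

# Minimal sketch: a stripped-down stand-in for PTBTokenizer that only mimics
# the loop above, so the roles of eos_token and add_eos are clear.
class PTBTokenizer:
    # EOS string appended when add_eos=True; it must match the EOS token the
    # model / train.py uses (reported in this thread as '<eos>').
    eos_token = "<eos>"
    # Punctuation symbols dropped during tokenization (illustrative short list).
    punctuations = ["''", "'", "``", "`", ".", ",", "?", "!", ";", ":"]

    @classmethod
    def tokenize(cls, corpus, add_eos=False):
        # corpus: dict mapping image_id -> list of caption strings
        image_id = [k for k, caps in corpus.items() for _ in caps]
        lines = [cap for caps in corpus.values() for cap in caps]

        tokenized_corpus = {}
        # create dictionary for tokenized captions
        for k, line in zip(image_id, lines):
            if k not in tokenized_corpus:
                tokenized_corpus[k] = []
            tokenized_caption = ' '.join([w for w in line.rstrip().split(' ')
                                          if w not in cls.punctuations])
            if add_eos:
                tokenized_caption += " {}".format(cls.eos_token)
            tokenized_corpus[k].append(tokenized_caption)
        return tokenized_corpus


if __name__ == '__main__':
    refs = {0: ["a man is riding a horse ."]}
    # RL / SCST reward computation: append EOS so CIDEr rewards completed sentences.
    print(PTBTokenizer.tokenize(refs, add_eos=True))   # {0: ['a man is riding a horse <eos>']}
    # Standard metric evaluation: keep the captions unchanged.
    print(PTBTokenizer.tokenize(refs, add_eos=False))  # {0: ['a man is riding a horse']}

In other words, add_eos=True is used when tokenizing candidates and references for the SCST/CIDEr reward, and add_eos=False for ordinary evaluation; that distinction is the whole point of the flag.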
I use https://github.com/peteanderson80/bottom-up-attention/ for feature extraction on my own images and then run the image captioning model, but the resulting captions are incomplete.
e.g.
caption: "a view of a city with a building in the"
caption: "a view of a city with a view of a river and a"
caption: "a woman in a yellow dress walking on a"
It seems the result is truncated.
Hello, how do you get the image together with its caption? I mean, running test.py I only get a series of scores as output. I can find captions in the variable 'gts', but they are not matched with images.
Hello, I would like to ask which Python file contains the code to output the caption for an image with the trained model, and how do I use it?
I got "AttributeError: type object 'PTBTokenizer' has no attribute 'eos_token'". How can i get 'eos_token'? what else needs to be changed?
Hey, you will need to define eos_token in the definition of the PTBTokenizer class. I think the default EOS token used in train.py is '<eos>', so simply add eos_token = "<eos>" and you should be good to go.
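For anyone hitting that AttributeError: the change is just a one-line class attribute. A sketch of what it might look like in tokenizer.py (the rest of the class is unchanged and elided here):

class PTBTokenizer(object):
    # New class attribute, referenced as cls.eos_token in the loop above.
    # It must match the EOS token used by the model / train.py ('<eos>' here).
    eos_token = "<eos>"

    # ... existing punctuations list, tokenize(), etc. remain unchanged ...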
thanks! It's helpful!!!
@gpantaz Hello, I see that this loop is used in the DLCT model, but the description is still incomplete, are your results complete? thanks!
Hello, sadly I am not aware of the DLCT model. I noticed that I had the same issue with incomplete captions after SCST training, but the above fix worked for me. Is the DLCT model using the same evaluation code? Maybe they use a different end of sequence token?
Sorry, I missed some code: add_eos is not used in the DLCT model; it lacks the if statement. Thanks!
if add_eos:
    tokenized_caption += " {}".format(cls.eos_token)
Hello, may I ask how the variable add_eos is actually defined? Also, is the eos_token variable in the picture above the same one? Thanks!
@gpantaz I have a doubt: how do you define add_eos? I also got the error "AttributeError: type object 'PTBTokenizer' has no attribute 'eos_token'", and I don't understand how to solve it from the answers above. Thanks!
@gpantaz Excuse me, I want to reproduce the visualization results, but I cannot find the corresponding code in this repo. Can you please tell me how to achieve it?
"Can you please tell me how to get captions from this image captioning model? I couldn't find any caption output in my folder...." Have you solved it yet?
Great insight! I'm curious about when to set add_eos=True or False. Thanks for answering!