YoadTew / zero-shot-image-to-text

Implementation of Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic
Should I additionally set end_factor to 1.04 in the command and set the variable self.ef_idx to 3 in class CLIPTextGenerator to reproduce image caption results? #12

Open baiyuting opened 1 year ago

baiyuting commented 1 year ago

In readme.md, the command given for image captioning is:

$ python run.py --reset_context_delta --caption_img_path "example_images/captions/COCO_val2014_000000097017.jpg"

However, the paper states that end_factor is 1.04 and the time-step is 3. To reproduce the image captioning results, should I additionally set end_factor to 1.04 in the command and set the variable self.ef_idx to 3 in class CLIPTextGenerator?
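
To make the question concrete, here is a minimal sketch of how the two settings might interact, assuming end_factor scales the end-token logit and ef_idx is the first time-step at which that scaling applies. Only the names end_factor and ef_idx come from the paper and the code; the helper apply_end_factor, the toy logits, and the sign handling are hypothetical and not taken from the repository.

```python
def apply_end_factor(logits, step, end_token, end_factor=1.04, ef_idx=3):
    """Return a copy of `logits` with the end-token logit boosted from step `ef_idx` on.

    Hypothetical helper for illustration only; not the repository's code.
    """
    boosted = list(logits)
    if step >= ef_idx:
        # Assumption: multiply positive logits and divide negative ones,
        # so the end token becomes more likely whatever the logit's sign.
        value = boosted[end_token]
        boosted[end_token] = value * end_factor if value > 0 else value / end_factor
    return boosted


logits = [2.0, -1.0, 0.5]  # toy vocabulary of three tokens; index 2 is the end token
early = apply_end_factor(logits, step=1, end_token=2)  # before ef_idx: unchanged
late = apply_end_factor(logits, step=3, end_token=2)   # from ef_idx on: end token boosted
```

Under this reading, a larger end_factor or a smaller ef_idx would favour shorter captions, which is why both values could matter for matching the reported results.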

shams2023 commented 1 year ago

Did you manage to run it successfully?