arXiv / html_feedback

Supports a student project developing a UI for feedback on arXiv articles rendered as html.
MIT License

Overlapping Texts #1500

Open BibekKoirala opened 1 week ago

BibekKoirala commented 1 week ago


The text is overlapping in this selection. The table and the text should not overlap.

(Optional:) Please add any files, screenshots, or other information here.

No response

(Required) What is this issue most closely related to? Select one.

Choose One

Internal issue ID


Paper URL



Device Type


html-feedback-bot[bot] commented 1 week ago

Location in document: A1.p5.pic1.

Selected HTML:

Please evaluate the given text caption in relation to its corresponding image description. Your goal is to determine if the text caption provides additional semantic information that isn’t readily apparent just from the image itself. For example:

1. If the image description mentions "a man" but the caption elaborates he is a "homeless man" or a "businessman," then the caption is enriching the semantic context.
2. If the caption introduces concepts like the mathematical tangent function, which require in-depth knowledge to deduce, it is imparting external semantics.
3. Captions revealing specific location addresses, festival details, or other nuanced data not easy to infer from the image also provide external semantic information.
4. Directly identifying specific entities in the image such as buildings, people, bird species, animal breeds, car models, engines, etc., in the caption introduces additional insights.
5. Should the image act as a contextual backdrop and the caption describes elements not explicitly showcased in the image, it has semantic depth.
6. Lastly, if the caption depicts relationships between the subjects in the image, which need commonsense knowledge to understand, it should be considered semantically rich.

Please assess and determine the extent of semantic enrichment the caption provides over the image description. Rate the text caption’s semantic depth on a scale from 1 to 100.

Appendix B Examples for Two Prompting Strategies

Example for Chain-of-Thought Reasoning

Please evaluate if the provided text caption accurately represents the main features and objects of the image. The caption doesn’t need to detail every aspect of the image, but it should capture its primary theme. Rate the overall quality of the text caption’s match to the image on a scale of 1-100, considering the criteria mentioned.

Please think step by step to first output your reasons to give such a score. In the subsequent line, please output a single line containing the value indicating the scores.

Example for Rationalization

Please evaluate if the provided text caption accurately represents the main features and objects of the image. The caption doesn’t need to detail every aspect of the image, but it should capture its primary theme. Rate the overall quality of the text caption’s match to the image on a scale of 1-100, considering the criteria mentioned.

Please first output a single line containing the value indicating the scores. In the subsequent line, please provide a comprehensive explanation of your evaluation, avoiding any potential bias.

Table 7: Prompts for zero-shot Chain-of-Thought reasoning and Rationalization reasoning for assessing the image-text matching score.

github-actions[bot] commented 1 week ago

Hello @BibekKoirala, thanks for the issue report! We are reviewing your report and will address it as soon as possible.