Thank you for your kind words about our work. Here are the explanations for the related issues:
Dear Researchers
First of all, I would like to express my deepest gratitude for your quick response to my question.
I would like to ask some additional questions, as a few points are still unclear to me.
In your answer you mentioned "texts (not words)"; could you please elaborate on what you mean by "not words"?
I don't have a Claude account, so I'm having some trouble cloning the repository and reproducing the results. Is there any way to reproduce them directly without using Claude?
Below are examples of the prompts I was able to reproduce and generate. When I submitted the results of calling GPT with these prompts, the score was 0.58, which is a large gap from the score reported in the paper. Is there anything wrong with my reproduction, or is there anything I should pay attention to when reproducing?
Thank you for reading this long question.
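For reference, a minimal sketch of the ANLS metric that the 0.58 refers to, assuming the standard DocVQA definition with a 0.5 threshold (my own illustration, not the official evaluation code):

def levenshtein(a, b):
    # Standard dynamic-programming edit distance between two strings.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def anls(predictions, ground_truths, threshold=0.5):
    # predictions: list of answer strings; ground_truths: list of lists of acceptable answers.
    total = 0.0
    for pred, answers in zip(predictions, ground_truths):
        best = 0.0
        for ans in answers:
            p, g = pred.strip().lower(), ans.strip().lower()
            denom = max(len(p), len(g)) or 1
            best = max(best, 1.0 - levenshtein(p, g) / denom)
        total += best if best >= threshold else 0.0
    return total / max(len(predictions), 1)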
You are asked to answer questions asked on a document image.
The answers to questions are short text spans taken verbatim from the document. This means that the answers comprise a set of contiguous text tokens present in the document.
Document:
AUG 3 0 1977
Mr. Robert B. Choate
Chairman
Council on Children, Media
and Merchandising
1346 Connecticut Avenue, N.W.
Washington, D. C. 20036
Dear Bob:
Thank you for your July 11, 1977 letter which raises several
interesting issues. I will respond to them in the order listed.
Item 1
You correctly point out that, under recent court decisions, the require-
ments of the Federal Advisory Committee Act--including the requirement
that meetings be announced in advance and open to the public--do not
apply to groups such as the Food and Nutrition Board. 'I agree with you
that FDA should be extremely careful about entering into agreements by
which the work product of such groups could automatically become the
basis for a rule or other action by FDA. Our recent practice indicates
that we will review such arrangements with caution. For example, we
have discouraged USP's efforts to become our "sole source" supplier of
patient package inserts for drugs, in part because their processes are
not publicly accessible in the same way ours are. We have made it clear
that while we would welcome their ideas, we would first seek public
comment on any USP-generated draft on which we proposed to reply.
We have also encouraged the Cosmetic, Toiletry, and Fragrance Association
to ensure that the processes--including virtually all meetings--of the
ingredient review program be open to public attendance and participation.
These efforts have met with some success. We also have made it clear
that FDA will not automatically accept the judgments of the panel as the
basis for regulatory action on specific ingredients.
Furthermore, our regulations governing participation of FDA employees
in outside standard-setting activities impose conditions designed to
enhance public access to the workings of the groups on which we can be
represented. See 21 CFR 10.95 (enclosed).
: ..
. ...
Source: https://www.industrydocuments.tesf/.
Question: What is the name of the sender?
Directly extract the answer of the question from the document with as few words as possible.
Answer:
You are asked to answer questions asked on a document image.
The answers to questions are short text spans taken verbatim from the document. This means that the answers comprise a set of contiguous text tokens present in the document.
Document:
THE VISITING NURSE ASSOCIATION OF GREATER ST. LOUIS
STATEMENT OF OPERATIONS
FOR THE MONTH OF FEBRUARY 1977 COMPARED TO FEBRUARY 1976
Budget
Actual Budget Over % of
1976 1977 1977 (Under) Variance
Income
1 Home Visits $243, 022 $342, 830
2 Equipment Rental 18, 659
$329, 365$ 13, 465 4. 09
25, 403 20, 475 4, 928 24. 07
3 Miscellaneous 185 357 335 22
4 Medicare Allowance (25, 983) (61, 249)
6. 51
(35, 771)(25, 478) 71. 23
9 TOTAL OPERATING INCOME $235, 883 $307 , 341$314 , 404$ ( 7, 063)( 2. 25)
Expenses
10 Salaries $147 , 027 $184, 650$184, 207 443 0. 24
11 Payroll Taxes 8, 580 10, 774 12, 360 (12. 83)
12 Employee Benefits 6, 812
( 1, 586)
14, 352 14, 520
17, 280
168)
13 Transportation 14, 096 13 , 538 ( 3, 742)
( 1. 16)
6, 018 3, 415
(21. 66)
14 Supplies-Administrative 6, 940
6, 140 9 , 487 11, 920
( 3, 525)
( 2, 433)
(50. 79)
-Medical (20.41)
16 Patient's Rental Equip.16, 686 25, 302
10, 920
20, 240
10, 409
5, 062 25 . 01
17 Occupancy 8, 692 511 4. 91
18 Telephone 3, 153 2, 918 665)
19 Postage 1 , 698 1, 146
3, 583
1, 458
(18.56)
312) (21.40)
20 Auditing & Professional1, 000 2, 392 2 , 135 257 12.04
21 Data Processing Equipment
22 Conferences, Conventions
2, 933 3, 678 3, 968 290) ( 7.31)
and Meetings 1, 472 2, 859 3, 333 474) (14.22)
23 Depreciation-Furniture
and Fixtures 842 604 1, 550 946) (61.03)
24 Insurance 719 3, 661 4, 166 505 ) (12. 12)
25 Community Education -0- 68
26 Other
( 4, 098) (98. 37)
1 , 219
4, 166
1, 790 1, 249 541 43. 31
27 Organization Dues 392 400 500 100) (20.00)
39 TOTAL OPERATING EXPENSE$227, 479 $291, 954$303, 984 $ (12, 030) ( 3.96)
40 NET INCOME (LOSS)
FROM OPERATING $ 8,404 $ 15, 387$ 10, 420$ 4, 967 47. 67
50 United Way Income $ 27, 917 $ 18, 713$ 27, 917 $ ( 9, 204)
51 Indigent Care 28 , 601
(32.97)
27 , 917
$ (9, 204)
27, 917 -0-
52 Net Difference $ ( 684) $ -0- $ ( 9, 204)
53 NET INCOME OR (LOSS) $ 7, 720 $ 6, 183 $ 10, 420 $ ( 4, 237) (40.66)
Source: https://www.industrydocur
Question: What financial statement is it?
Directly extract the answer of the question from the document with as few words as possible.
Answer:
You are asked to answer questions asked on a document image.
The answers to questions are short text spans taken verbatim from the document. This means that the answers comprise a set of contiguous text tokens present in the document.
Document:
Food Table for "The Way to a Man's Heart"
Food Amount Cal Pro Fat CHO SF MF PF Chol Fe Alc
M, F, P 1 oz 55 8.1 2, 1 0.2 0.8 0.8 0.2 39 1. ]
LF Dairy 1 serv 210 9.4 10 .0 21.0 5.8 2. 6 0.6 33 0.4
Eggs 3/wk 35 2.8 2. 4 0. 2 0. 7 1.0 0.3 108 0.5
Fats+Oils 1 TB 94 0.3 10.3 0.5 1 .7 3.4 4.8 1
Breads, 1 serv 70 2. 7 0 . 7 13.7 0 .1 0 . 2 0:3 tr 1.0
Cereals
Fruits 1 serv 54 0.5 0.2 13.9 0. 2 0.6
Vegetables 1 serv 35 1.6 0.9 5.9 0.2 0. 1 0.6 0. 7
Desserts, 1 serv 124 1.0 2.2 18.0 0 . 4 0.8 1. 2 0.2 5.5
Bev, Swts
Source: https://www.industrydocuments.ucsf.edu/docs/qnmf0227
Question: How much is the iron (Fe) content in one serving of vegetables?
Directly extract the answer of the question from the document with as few words as possible.
Answer:
You are asked to answer questions asked on a document image.
The answers to questions are short text spans taken verbatim from the document. This means that the answers comprise a set of contiguous text tokens present in the document.
Document:
Dale D. Hoskins, Ph.D., Department of Biochemistry,
was advanced at the beginning of the year from Associate
Scientist to Scientist. Dr. Hoskins received his doctorate
from the University of Colorado School of Medicine in
1960 and joined the Center in 1961. He received his M.S.
in biochemistry from Oregon State University in 1955.
E. Rene Casillas, Ph.D., was promoted from postdoctorate
fellow in the Department of Biochemistry to Assistant Sci-
entist in January. Dr. Casillas came to the Center in 1968
from Oregon State University, where he received his Ph.D.,
and where he was assistant in the Department of Bio-
chemistry and Biophysics.
APPOINTMENTS Mary Bell, Ph.D., has been promoted from Assistant
Scientist to Associate Scientist in the Department of Cutane-
1971
ous Biology, effective in May. Dr. Bell joined the Center
staff in 1964 directly from Yale University, where she com-
pleted a U.S. Public Health Service Predoctoral Training
Fellowship in Anatomy.
Marjorie LaSalle, Ph.D., who has been an Assistant
Scientist at the Center since 1963, this month was named
Associate Scientist in Hematology. A graduate of Oregon
State University, she earned her M.S., in Microbiology at
the University of Oregon Medical School and her Ph. D.
from Stanford.
John A. Resko, Ph.D., was promoted to Scientist in the
Department of Reproductive Physiology and Behavior,
effective in May. Before coming to the Center as an Assist-
ant Scientist in 1964, he was a postdoctoral fellow at the
University of Utah. His major interest was steroid bio-
chemistry. He received his doctorate in physiology from the
University of Illinois in 1963.
32
Question: Who joined the Center in 1963?
Directly extract the answer of the question from the document with as few words as possible.
Answer:
You are asked to answer questions asked on a document image.
The answers to questions are short text spans taken verbatim from the document. This means that the answers comprise a set of contiguous text tokens present in the document.
Document:
Mr. Oley M. Cummer
Born June 9, 1889
Attended Fort Collins Grammar School from 1895 to 1905
Colorado A. & M. Prep from 1905 to 1907
Colorado A. & M. from 1907 to 1911 and graduated with a B.S. degree,
majoring in Mechanical Engineering
Permanent Employment date: July 1, 1911
1908-1911 - Machinist during school - Fort Collins
July 1911-1914 - Student Technician -
Sept. 1941-Oct. 1915 - General Foreman -
Oct. 1915 to June 1916 - Asst. Supt. - Windsor
June 1916 to Sept. 1916 -- Worked for Spreckels for G.W.S.Co. checking pulp drier
installation
Sept. 1916 to Feb. 17 - Operated Pulp Drier - Gering, Nebraska
Feb. 1917 to July 1918 - Consultant & operator of pulp engineering & installer
out of Denver
July 1918 to May 1920 - Traveling - Asst. to Schaffer, Gen. Supt.
May 1920 to May 1926 - Supt., Brush - (Pennant 1920)
May 1926 to May 1931 - Supt., Ovid
May 1931 to Aug. 1936 - Asst. Supt., Scottsbluff
Aug. 1936 to June 1940 - Supt. , Wheatland
June 1940 to June 1943 - Supt. , Lyman
Date of transfer to Scottsbluff as Superintendent not listed.
Mr. Cummer was due to retire July 1, 1954, but died June 1, 1954.
Source: https://www.industrydocuments.ucsf.edu/docs/zphk0226
Question: When was Mr. Oley M. Cummer born?
Directly extract the answer of the question from the document with as few words as possible.
Answer:
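For reference, here is a rough reconstruction of how the prompts above are assembled from the task instruction, the layout-recovered document text, and the question. This is my own sketch based on the visible structure of the examples, not necessarily the repository's exact code:

def build_prompt(document_text, question):
    # Rough sketch mirroring the structure of the examples above; the exact
    # wording and separators come from the examples, not from the repository.
    task_instruction = (
        "You are asked to answer questions asked on a document image.\n"
        "The answers to questions are short text spans taken verbatim from the document. "
        "This means that the answers comprise a set of contiguous text tokens present in the document."
    )
    return (
        f"{task_instruction}\n"
        f"Document:\n{document_text}\n\n"
        f"Question: {question}\n"
        "Directly extract the answer of the question from the document with as few words as possible.\n"
        "Answer:"
    )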
Here are the explanations for your questions:
Hope the above content is helpful to you and please feel free to discuss them further.
Thanks for the update! When I run the script below, I get the following error. What is the solution?
And one more question: is there a step where the prompt is sent to GPT for inference and the output is post-processed afterwards?
bash script/claude_eval.sh 0 gpt-35 docvqa task_instruction_space
Traceback (most recent call last):
File "examples/claude_docvqa.py", line 22, in <module>
from utils import claude, space_layout, openai_api
File "./utils/claude.py", line 6, in <module>
c = anthropic.Client(os.environ["ANTHROPIC_API_KEY"])
File "/root/anaconda3/envs/cream/lib/python3.8/os.py", line 673, in __getitem__
raise KeyError(key) from None
KeyError: 'ANTHROPIC_API_KEY'
The reason for this error is that the environment variable ANTHROPIC_API_KEY is not set. We have updated the code. Now, you only need to set the environment variables OPENAI_API_KEY and OPENAI_API_BASE to use GPT-3.5-turbo for inference normally.
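For example, a minimal sketch of setting them from Python before the utils imports (placeholder values; we assume the updated code reads these variables via os.environ):

import os

# Placeholder values -- substitute your own key and, if you use Azure, your endpoint.
os.environ["OPENAI_API_KEY"] = "sk-..."
os.environ["OPENAI_API_BASE"] = "https://api.openai.com/v1"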
Hope the above content is helpful to you and please feel free to discuss them further.
I am experiencing difficulties in reproducing your paper using the official Python OpenAI library. The reproduction process is not functioning correctly. I have a few questions and concerns regarding this matter:
I found that the experiment cannot be reproduced with the official Python OpenAI library. Can you provide any insight into this discrepancy?
According to the OpenAI API Reference, the "completions" endpoint is only applicable to older models such as "text-davinci-003". The model you utilized, "gpt-3.5-turbo", should use "chatCompletions" instead of "completions". Could you clarify why this difference exists?
Additionally, it appears that the correct model name is "gpt-3.5-turbo" (with a hyphen), not "gpt-35-turbo" as mentioned in the paper. Moreover, the argument to use in the chat completion should be "model" instead of "engine," as indicated in the latest version of the library. Can you confirm these changes?
I attempted to reproduce the provided example using "gpt-3.5-turbo" and the official OpenAI library. However, the performance I achieved on the test dataset was 0.58. Could you offer any insights into why my results differ from yours?
Have you attempted to reproduce the experiment using the official OpenAI library? I'm curious to know if the "LATIN-prompt" used in your paper might be overfitted to the MS Azure OpenAI model.
PS.1) Reproducing the execution by running the script in the main branch is challenging due to dependencies like "wandb." It would be greatly appreciated if you could address this aspect as well.
PS.2) My goal is to successfully reproduce your research and achieve a score of 0.81 for submission to RRC.
Your innovative and groundbreaking work has left a strong impression on me. I truly value your prompt responses and kind explanations.
Thanks.
You can implement the openai_completion function based on the official OpenAI implementation. Hope the above content is helpful to you and please feel free to discuss them further.
I understood this to mean using the official OpenAI API with the combination of "gpt-3.5-turbo" and "completions".
However, that combination is no longer supported by the official API.
I've implemented openai_completion and openai_chat_completion as below, referring to LATIN's code.
When using "gpt-3.5-turbo", openai_completion throws an error, and when using openai_chat_completion, a performance score of 0.58 was recorded.
import openai

openai.api_key = "---"

def openai_completion(prompt):
    # Call the legacy completions endpoint; on a TypeError (e.g. the prompt is
    # too long), shorten the document part of the prompt and retry.
    while True:
        try:
            response = openai.Completion.create(
                model="gpt-3.5-turbo",
                prompt=prompt,
                max_tokens=200,
                temperature=0,
                stop="\n\n",
            )
            text = response['choices'][0]['text']
            break
        except TypeError:
            print(f"TypeError, maybe the prompt is too long: {len(prompt)}. Reducing the prompt length.")
            if len(prompt) > 4000:
                # Keep the trailing question block and truncate the document to fit.
                question_idx = prompt.rfind("\n\nQuestion:")
                prompt = prompt[:4000 - len(prompt[question_idx:])] + prompt[question_idx:]
            else:
                question_idx = prompt.rfind("\n\nQuestion:")
                prompt = prompt[:question_idx - 300] + prompt[question_idx:]
            continue
    return text

def openai_chat_completion(prompt):
    # Call the chat completions endpoint with the whole prompt as a single user message.
    messages = [{
        "role": "user",
        "content": prompt,
    }]
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=messages,
        max_tokens=200,
        temperature=0,
        stop="\n\n",
    )
    text = response['choices'][0]['message']['content']
    return text
openai_completion(prompt="Write a tagline for an ice cream shop. ")
InvalidRequestError: This is a chat model and not supported in the v1/completions endpoint. Did you mean to use v1/chat/completions?
openai_chat_completion(prompt="Write a tagline for an ice cream shop. ")
'"Scoops of happiness in every cone!"'
We are trying to adapt LATIN-Prompt to the latest chatCompletions interface and analyze the differences between it and the completions interface. An alternative approach to verifying the effect of LATIN-Prompt is to use older models like text-davinci-003.
Thank you for your support of our project. We will give you feedback as soon as we complete the adaptation.
@ajskdlf64 Hello, we recently tried chatCompletions with the LATIN-Prompt and the result was around 0.58. We also conducted some preliminary experiments and found that the system message has a significant impact on the experimental results. We believe that the difference in performance between chatCompletions and completions is mainly due to the system message, whose impact our method does not take into account. We are analyzing the differences in prompting between the system message and the user content. You can also try some experiments from this perspective.
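For example, one direction to experiment with (a sketch only, not our final adaptation) is to move the task instruction into the system message and keep the layout-recovered document and question in the user content:

import openai

def chat_completion_with_system(task_instruction, document_and_question):
    # Sketch: task instruction as the system message, layout-recovered document
    # plus question as the user content. Parameters mirror the code above.
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": task_instruction},
            {"role": "user", "content": document_and_question},
        ],
        max_tokens=200,
        temperature=0,
        stop="\n\n",
    )
    return response['choices'][0]['message']['content']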
I commend the authors for their dedicated research and passion. I have indeed noticed that the performance can vary depending on the version and usage of OpenAI's API.
I hope your continued research leads to even higher-quality papers.
Thank you!
Hey guys!
Very interesting topic and high quality paper from the research team.
After reading the paper and reproducing it with reference to the repository, I had a few questions that led me to raise an issue.
Is it correct that you used the bounding boxes of lines rather than the bounding boxes of texts from the original OCR provided by the RRC leaderboard? If so, was the performance not good when you used the bounding boxes of texts?
I was wondering whether the ANLS values in your paper were calculated with your own code or taken from the results you submitted to the RRC Leaderboard.
If the answer to 1 is LINE, is the overall process as follows: use the LINE OCR TEXT and TEXT_BOXES to obtain LAYOUT_RECOVER via the SPACE_LAYOUT function, then apply PROMPT_TASK and request the GPT-3.5 API? (A rough sketch of my understanding of this step follows at the end of this post.)
Thanks.
(P.S. There seems to be a typo in the title of the README.md :) Promot )
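Regarding question 3, my rough understanding of the space-layout step is sketched below. This is my own simplification, not the repository's exact implementation; the coordinate-to-column scaling via char_width and the row-grouping heuristic are assumptions.

def space_layout(ocr_lines, char_width=10.0):
    # ocr_lines: list of (text, (x_min, y_min, x_max, y_max)) tuples from line-level OCR.
    # char_width: assumed average character width in pixels (a rough guess).
    items = sorted(ocr_lines, key=lambda item: (item[1][1], item[1][0]))
    rows, current, prev_y = [], [], None
    for text, (x0, y0, x1, y1) in items:
        # Start a new text row when this line sits clearly below the previous one.
        if prev_y is not None and y0 - prev_y > (y1 - y0) / 2:
            rows.append(current)
            current = []
        current.append((x0, text))
        prev_y = y0
    if current:
        rows.append(current)

    # Within each row, indent segments with spaces proportional to their x offset,
    # so the horizontal layout of the page is roughly preserved in plain text.
    out_lines = []
    for row in rows:
        line, cursor = "", 0
        for x0, text in sorted(row):
            column = int(x0 / char_width)
            pad = max(column - cursor, 1) if line else max(column, 0)
            line += " " * pad + text
            cursor = len(line)
        out_lines.append(line.rstrip())
    return "\n".join(out_lines)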