JBGruber opened 1 month ago
I think we essentially just need a RAG system. I've started a Quarto notebook here: https://github.com/JBGruber/opinion-wg2/blob/gllm-annotation/paper-annotation-gllm/llm-annotation.qmd
The notebook already contains code that downloads the validation data. I think it's unlikely we can validate this automatically; it probably makes more sense to check manually whether each answer is correct and note it down. (Obviously we need to set a seed so the answers are reproducible.)
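The RAG idea mentioned above could start very simply: split a paper into chunks, score each chunk against a codebook question, and keep only the most relevant chunks for the model prompt. A minimal sketch, assuming plain term overlap as the relevance score (a real pipeline would use embeddings; all function names here are illustrative, not from the notebook):

```python
def chunk_text(text: str, size: int = 300) -> list[str]:
    """Split text into word-based chunks of roughly `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def score(chunk: str, question: str) -> int:
    """Count question terms that appear in the chunk (crude relevance proxy)."""
    terms = set(question.lower().split())
    return sum(1 for w in chunk.lower().split() if w in terms)

def retrieve(text: str, question: str, k: int = 3) -> list[str]:
    """Return the k chunks most relevant to one codebook question."""
    chunks = chunk_text(text)
    return sorted(chunks, key=lambda c: score(c, question), reverse=True)[:k]
```

The retrieved chunks would then be pasted into the prompt instead of the full paper, which keeps us inside the context window of smaller open models like llama3.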
You can find the Codebook here: https://docs.google.com/document/d/185Q1IuJ0ebIFEb1BepMxkbzt23XrYX_QOJCUGqrKdbw/edit#heading=h.hkgzwqhr5jie
Or in this Notebook that set up the task (it also contains the variable names in the annotated data): https://github.com/JBGruber/opinion-wg2/blob/gllm-annotation/paper-annotation/3._wg2_full_paper_annotation.qmd
You should work in the "gllm-annotation" GitHub branch for this. You don't have to use my approach or the notebook, though (I much prefer Quarto over Jupyter, and you can use it with Python too). And if you have a better idea, let me know.
I would prefer that this is done via an open model, like llama3. But if this does not work well enough, I think we should consider OpenAI. Here are some thoughts:
Bruno is working on a pipeline for processing papers with GPT-4. Right now, the bottleneck is parsing URLs stored in footnotes, which end up far away from where they are referenced in the text.
Once he solves this, we move forward to process ~ 10 papers and estimate the costs before proceeding to test all papers from the test sample.
Depending on the costs, we need to agree on how to pay and which API key to use.
We proceed to test all papers and evaluate the performance.
During this process, we will determine whether this sample is enough or whether we need to annotate more full papers (#14).
- [ ] We need to agree on a slot for the next Zoom meeting after Salamanca.
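For the cost estimate step above, a back-of-the-envelope calculation is probably enough before we run the ~10 papers. A sketch, where the token counts and per-1k-token prices are placeholders to be replaced with the actual paper lengths and current OpenAI pricing:

```python
def estimate_cost(n_papers: int,
                  tokens_per_paper: int = 15_000,   # assumed full-paper length
                  output_tokens: int = 1_000,        # assumed total answer length
                  price_in_per_1k: float = 0.01,     # placeholder $/1k input tokens
                  price_out_per_1k: float = 0.03) -> float:  # placeholder $/1k output tokens
    """Rough cost in $ of annotating n_papers, one full-paper prompt each."""
    per_paper = (tokens_per_paper / 1000) * price_in_per_1k \
              + (output_tokens / 1000) * price_out_per_1k
    return round(n_papers * per_paper, 2)
```

With these placeholder numbers the 10-paper test run comes out at under two dollars, which would also give us a defensible extrapolation to the full test sample before we decide how to pay.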
The question in Q31_mult is the same as Q32. I believe this is a typo.
This has yielded excellent results in our coding in Limassol, where we did this with ChatGPT. Essentially, we simply ask a model to answer the questions we ask coders (outlined here). ChatGPT has a way to handle PDFs on the web interface (maybe also on the API). Ideally, we would try to do this with an open-source model like llama3 (open-webui also supports PDF uploads, though I'm not sure how it handles them). But given the length of the papers, and the fact that we might not be too concerned with reproducibility in this case, GPT-4o might make the most sense.
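One nice property of the "ask the model the coder questions" approach is that the same chat-message format works for both llama3 (e.g. via Ollama's OpenAI-compatible API) and GPT-4o, so we could swap models without rewriting the pipeline. A minimal sketch of the prompt construction; the instruction wording and the truncation limit are my assumptions, not the codebook's:

```python
def build_messages(question: str, paper_text: str, max_chars: int = 12_000) -> list[dict]:
    """Build a chat-style prompt asking the model one codebook question about a paper."""
    return [
        {"role": "system",
         "content": "You are an annotator. Answer strictly based on the paper text provided."},
        {"role": "user",
         "content": (f"Paper text:\n{paper_text[:max_chars]}\n\n"
                     f"Question: {question}\n"
                     "Answer with the codebook category only.")},
    ]
```

This list can be passed as the `messages` argument to either API; for reproducibility we would additionally pin `temperature=0` and, where supported, a fixed `seed`.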