Trying to open fda_510ks_gold_extractions.json to get gold attributes, but the file doesn't exist in the data source

amani-acog commented 1 year ago

When I run the script run.sh, it tries to read some attributes from fda_510ks_gold_extractions.json, which doesn't exist in the data source. Looking at the source code, it appears to be written specifically for the FDA data. If I want to use my own data, how do I do that, and what are the prerequisites?

amani-acog commented 1 year ago

Could you also explain what "evaporate-code", "weak-supervision code", and "manifest" are, and what each of them is used for?

simran-arora commented 1 year ago

Hi! You can find the gold extractions at this link: https://huggingface.co/datasets/hazyresearch/evaporate/blob/main/data/fda_510ks/table.json

Evaporate-code is the system that identifies the attributes and generates functions to extract them (it is the main logic of the system described in the paper).
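For anyone wondering what "generates functions to extract them" looks like once the functions exist, here is a minimal Python sketch of that step. It assumes an LLM has already produced a few candidate extraction functions for one attribute; the function names, the `device_name` attribute, and the sample documents are all hypothetical and are not actual Evaporate output. The sketch applies every candidate to every document, giving the N functions × D documents table of extractions described in the weak-supervision answer.

```python
import re
from typing import Callable, Dict, List, Optional

Extractor = Callable[[str], Optional[str]]


# Hypothetical stand-ins for functions an LLM might generate for the attribute
# "device_name". They are illustrative only, not actual Evaporate output.
def fn_label_regex(text: str) -> Optional[str]:
    m = re.search(r"Device Name:\s*(.+)", text)
    return m.group(1).strip() if m else None


def fn_first_line(text: str) -> Optional[str]:
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    return lines[0] if lines else None


def fn_after_colon(text: str) -> Optional[str]:
    # Raises IndexError when the first line has no colon.
    return text.splitlines()[0].split(":", 1)[1].strip()


def apply_functions(
    fns: Dict[str, Extractor], docs: List[str]
) -> Dict[str, List[Optional[str]]]:
    """Run each of the N candidate functions on each of the D documents."""
    results: Dict[str, List[Optional[str]]] = {}
    for name, fn in fns.items():
        outputs: List[Optional[str]] = []
        for doc in docs:
            try:
                outputs.append(fn(doc))
            except Exception:
                # Generated code can fail on unexpected inputs; record "no extraction".
                outputs.append(None)
        results[name] = outputs
    return results


docs = [
    "Device Name: CardioScan 3000\nApplicant: Acme Medical",
    "Summary\nDevice Name: GlucoTrack",
]
table = apply_functions(
    {"label_regex": fn_label_regex, "first_line": fn_first_line, "after_colon": fn_after_colon},
    docs,
)
```

Catching exceptions per call is a design choice for this sketch: a generated function that crashes on one document simply contributes no extraction there instead of aborting the whole run.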

Weak supervision code takes the extractions produced by N functions on each of the D documents and aggregates them into one attribute extraction per document. With weak supervision, we aggregate the function extractions by estimating the quality of each function. The code base implements this quality estimation and returns the final extraction for each document.
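To make the aggregation step concrete, here is a minimal Python sketch, assuming the extractions are stored as a dict from function name to a list with one value (or `None`) per document. It estimates each function's quality by how often it agrees with the per-document majority, then takes a quality-weighted vote. This agreement heuristic is a simple stand-in chosen for illustration, not the estimator implemented in the code base; `estimate_quality`, `aggregate`, and the toy data are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional

Extractions = Dict[str, List[Optional[str]]]  # function name -> one value per document


def majority_vote(values: List[Optional[str]]) -> Optional[str]:
    """Most common non-None value (ties go to the value seen first)."""
    counts = Counter(v for v in values if v is not None)
    return counts.most_common(1)[0][0] if counts else None


def estimate_quality(ex: Extractions) -> Dict[str, float]:
    """Score each function by how often it agrees with the per-document majority.

    A simple agreement heuristic chosen for illustration; the repository's
    own quality estimator may work differently.
    """
    names = list(ex)
    n_docs = len(ex[names[0]])
    majority = [majority_vote([ex[n][d] for n in names]) for d in range(n_docs)]
    quality: Dict[str, float] = {}
    for n in names:
        # Only documents where the function produced something are scored.
        pairs = [(v, m) for v, m in zip(ex[n], majority) if v is not None]
        quality[n] = sum(v == m for v, m in pairs) / len(pairs) if pairs else 0.0
    return quality


def aggregate(ex: Extractions) -> List[Optional[str]]:
    """Quality-weighted vote: one final extraction per document."""
    quality = estimate_quality(ex)
    n_docs = len(next(iter(ex.values())))
    final: List[Optional[str]] = []
    for d in range(n_docs):
        scores: Dict[str, float] = defaultdict(float)
        for name, values in ex.items():
            if values[d] is not None:
                scores[values[d]] += quality[name]
        final.append(max(scores, key=scores.get) if scores else None)
    return final


extractions: Extractions = {
    "f1": ["A", "B", "C", None],
    "f2": ["A", "X", "Y", "P"],
    "f3": ["A", "B", "C", "Q"],
}
```

On this toy input, f2 disagrees with the other functions on two documents, so it gets a lower estimated quality (0.5) than f3 (0.75). On the last document, where only f2 and f3 answer and they disagree, the weighted vote therefore picks f3's "Q", something a plain count-based majority vote cannot decide.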

Manifest is a tool that caches LLM generations. Because inference is slow with local models and costs money with API models, we want to avoid rerunning inference when we rerun things with the exact same prompt, and manifest provides this caching. Different models/APIs also expose different interfaces (e.g. Anthropic and OpenAI use different syntax), and manifest lets us use a common interface across them.
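The caching idea can be sketched in a few lines of Python. This is a minimal, in-memory illustration that assumes a hypothetical `generate` function standing in for a local model or API call; `PromptCache`, `fake_model`, and `example-model` are made up for the example and are not manifest's API. The key includes the model name and generation parameters as well as the prompt, since the same prompt sent with a different temperature or to a different model should not reuse a cached answer.

```python
import hashlib
import json
from typing import Any, Callable, Dict


class PromptCache:
    """In-memory cache of LLM generations keyed on (model, prompt, params).

    Illustrates the caching idea only; it is not manifest's API.
    """

    def __init__(self, generate: Callable[..., str], model: str) -> None:
        self._generate = generate  # hypothetical backend call (local model or API)
        self._model = model
        self._store: Dict[str, str] = {}
        self.misses = 0  # number of times the backend was actually called

    def _key(self, prompt: str, params: Dict[str, Any]) -> str:
        # Identical model + prompt + params always produce the same key.
        payload = json.dumps(
            {"model": self._model, "prompt": prompt, "params": params}, sort_keys=True
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def run(self, prompt: str, **params: Any) -> str:
        key = self._key(prompt, params)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._generate(prompt, **params)
        return self._store[key]


def fake_model(prompt: str, **params: Any) -> str:
    """Stand-in for a slow or paid model call."""
    return f"answer to: {prompt}"


cache = PromptCache(fake_model, model="example-model")
first = cache.run("Extract the device name.", temperature=0.0)
second = cache.run("Extract the device name.", temperature=0.0)  # served from cache
third = cache.run("Extract the device name.", temperature=0.7)  # different params, new call
```

Because the store is a Python dict, it is lost when the process exits; a persistent store (for example, a file or database) would be needed to skip inference across separate runs.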

Please let me know if you have additional questions.

HazyResearch / evaporate

Trying to open fda_510ks_gold_extractions.json to get gold attributes, but the file doesn't exist in the data source #11