HazyResearch / evaporate

This repo contains data and code for the paper "Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes"

running with OpenAI GPT #18

Open raywgs opened 1 year ago

raywgs commented 1 year ago

Hey,

Do I just replace KEYS in run.sh with my OpenAI API key to run with gpt-4? Or is that configured somewhere else?

Thanks

simran-arora commented 1 year ago

Yes, you should include your key in the bash script.

The keys get pulled in here, when the Manifest objects are being constructed: https://github.com/HazyResearch/evaporate/blob/83204a54dd97fb0f51a01643b4fc16c97fc5e472/evaporate/utils.py#L138

raywgs commented 1 year ago

Hey simran, how do I run evaporate on my own dataset when I don't have ground truth? I just want to test the performance on my own data, which is a set of txt files.

simran-arora commented 1 year ago

Hi! The gold extractions are used to provide feedback in a couple of places in the pipeline.

  1. When the WS label model is being trained. You can just put in a dummy file or modify the code to not perform the scoring. https://github.com/HazyResearch/evaporate/blob/83204a54dd97fb0f51a01643b4fc16c97fc5e472/evaporate/profiler.py#L160
  2. In the run_profiler.py code you can comment out anything that's an evaluation method. For instance: https://github.com/HazyResearch/evaporate/blob/83204a54dd97fb0f51a01643b4fc16c97fc5e472/evaporate/run_profiler.py#L284

Hope that helps!
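For anyone following the two suggestions above, here is a minimal sketch of what a "dummy file" plus skipped scoring could look like. It is not taken from the evaporate codebase: the function names, the path-keyed JSON layout, and `evaluate_fn` are all hypothetical, and the format the loader actually expects should be checked against the linked lines in profiler.py.

```python
import json
from pathlib import Path


def write_placeholder_gold(data_dir, out_path):
    """Write a JSON file mapping each .txt file to an empty dict of gold attributes."""
    files = sorted(Path(data_dir).glob("*.txt"))
    # Assumption: gold extractions are keyed by file path. Verify this
    # against the loader in profiler.py before relying on it.
    placeholder = {str(p): {} for p in files}
    Path(out_path).write_text(json.dumps(placeholder, indent=2))
    return placeholder


def has_gold_labels(gold):
    """Return True only if at least one file carries a non-empty gold entry."""
    return any(bool(v) for v in gold.values())


def maybe_evaluate(gold, evaluate_fn):
    """Run the (hypothetical) evaluate_fn only when real gold labels exist."""
    if not has_gold_labels(gold):
        return None  # nothing to score against, so skip evaluation
    return evaluate_fn(gold)
```

Guarding the evaluation call this way is an alternative to commenting it out by hand: the same script still scores a data lake that does have gold labels.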

raywgs commented 1 year ago

So, I can use an empty JSON file in place of the gold extractions and still have evaporate run normally?