Open matias-biatoz opened 1 year ago
I haven't used the new API for extraction yet, so I don't have a sense of quality. Previously gpt-3.5-turbo seemed significantly worse than text-davinci-003.
If anyone is willing to run some experiments, the things to do are:

- Set up zero- and few-shot scenarios with the new API (with AIMessages that contain the function invocation request payload specified), and see how it performs against text-davinci-003.
- Compare with a JSON encoding, and also with a CSV encoding.
- Confirm that the new chat API does a good job of extracting multiple entities from a long passage of text.
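As a starting point, the few-shot setup above can be sketched in the raw OpenAI chat format, where an assistant turn carries the function invocation payload (the wire-format equivalent of a LangChain AIMessage with a `function_call` in `additional_kwargs`). The function name, schema, and example passage below are all made up for illustration:

```python
import json

def few_shot_messages(passage: str) -> list[dict]:
    """Build a chat prompt where one assistant turn demonstrates the
    expected function-call payload, then ask about a new passage."""
    example_call = {
        "name": "extract_people",  # hypothetical function name
        "arguments": json.dumps({
            "people": [
                {"name": "Alice", "role": "engineer"},
                {"name": "Bob", "role": "manager"},
            ]
        }),
    }
    return [
        {"role": "system",
         "content": "Extract all people mentioned in the text."},
        # One-shot demonstration: user text plus the assistant's call.
        {"role": "user",
         "content": "Alice (engineer) met Bob, the manager."},
        {"role": "assistant", "content": None,
         "function_call": example_call},
        # The actual passage to extract from.
        {"role": "user", "content": passage},
    ]

messages = few_shot_messages("Carol, a designer, joined the call.")
```

A zero-shot variant would just drop the demonstration turns; comparing the two (and JSON vs. CSV argument encodings) against text-davinci-003 is the experiment proposed above.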
Folks should try this: if you try it on your data and see any differences in performance, please let me know!
I have been trying out three different ways of doing data extraction.
I also found that if you need to do "semantic extraction" (basically inferring what the user actually means), Kor gives me better results at the moment. I think combining Kor with functions to force structured results (and prevent hallucinations) might give optimal results.
On 2023-06-13, OpenAI announced API changes that allow passing a `functions` parameter, which supposedly improves the LLM's interpretation of the task. (LangChain already implemented the necessary changes in 0.199/0.200.) Do you see how this could be used to improve Kor data extraction?
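For reference, the new parameter takes a list of function definitions whose `parameters` field is standard JSON Schema. A minimal sketch of such a request, shown as a plain dict rather than a live API call (the function name and schema are illustrative, not part of Kor):

```python
import json

# Hypothetical extraction function definition in JSON Schema form.
extract_people = {
    "name": "extract_people",
    "description": "Extract every person mentioned in the passage.",
    "parameters": {
        "type": "object",
        "properties": {
            "people": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "role": {"type": "string"},
                    },
                    "required": ["name"],
                },
            }
        },
        "required": ["people"],
    },
}

request = {
    "model": "gpt-3.5-turbo-0613",
    "messages": [
        {"role": "user", "content": "Alice (engineer) met Bob."}
    ],
    "functions": [extract_people],
    # Forcing the named function constrains the output to the schema,
    # which is what "forcing structured results" amounts to here.
    "function_call": {"name": "extract_people"},
}
```

In principle, a Kor schema could be translated into a definition like this, letting Kor's prompting drive the semantics while the `functions` mechanism enforces the output structure.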