Closed: hitsense closed this issue 1 year ago
Hi! This is likely the LLM being unable to handle the complexity of the task.
Things to try:
1) Add examples
2) Improve the descriptions in the schema
3) Specify an input formatter of triple_quotes if working with multi-paragraph inputs
4) Try a better model if you're not doing so already (gpt-4, text-davinci-003)
5) Break the schema into a few smaller schemas, run extractions and combine results
7) If it's possible to flatten the object, you can use CSV encoding, which will improve results
8) Add verification / correction steps (ask an LLM to correct or verify the results of the extraction)
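As a rough illustration of splitting the schema and combining results, with a very simple completeness check standing in for a verification step, here is a minimal sketch. The `extract` callable, `fake_extract`, and the field names are all hypothetical placeholders for whatever extraction call you actually use; none of them is part of any library's API.

```python
from typing import Callable, Dict, List

# Hypothetical signature: takes the input text and a list of field names,
# and returns a dict of extracted values for those fields.
Extractor = Callable[[str, List[str]], Dict[str, object]]


def extract_in_parts(
    text: str, field_groups: List[List[str]], extract: Extractor
) -> Dict[str, object]:
    """Run one extraction per group of fields and merge the partial results."""
    merged: Dict[str, object] = {}
    for fields in field_groups:
        partial = extract(text, fields)
        # Keep only the fields this pass was asked for, so one pass
        # cannot overwrite another pass's results.
        merged.update({k: v for k, v in partial.items() if k in fields})
    return merged


def missing_fields(
    record: Dict[str, object], field_groups: List[List[str]]
) -> List[str]:
    """List the expected fields that no extraction pass filled in."""
    expected = [f for group in field_groups for f in group]
    return [f for f in expected if f not in record]


def fake_extract(text: str, fields: List[str]) -> Dict[str, object]:
    """Stand-in for a real LLM call; returns canned values for the demo."""
    canned = {"company_location": "Berlin", "employment_period": "2019-2021"}
    return {f: canned[f] for f in fields if f in canned}


groups = [["company_location"], ["employment_period", "skills_at_job"]]
record = extract_in_parts("some resume text", groups, fake_extract)
todo = missing_fields(record, groups)
```

Fields that end up in `todo` are candidates for a retry, a correction prompt, or a human review.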
Feel free to share your schema if you want me to take a look at it.
If you're extracting information from a single structured source (e.g., LinkedIn), an LLM is not a good approach -- traditional web scraping would be a lot cheaper and much more reliable.
If perfect quality is needed, then even with all the hacks above, you'll need to plan on having a human in the loop as even the best LLMs will make mistakes with complex extraction tasks.
Update -
After changing str to List[str] and csv to json, it was able to correctly assign keys to values.
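The actual model output isn't shown in this thread, so the following is only a guess at one way a key/value mix-up can arise with CSV encoding, and why JSON with list-valued fields is less prone to it. The field names and sample strings are made up for illustration.

```python
import csv
import io
import json

header = ["job_title", "company_location", "employment_period", "skills_at_job"]

# Hypothetical CSV-encoded output: the location contains a comma but is
# not quoted, so the row has one more cell than the header has columns.
csv_output = "Engineer,Berlin, Germany,2019-2021,Python\n"
row = next(csv.reader(io.StringIO(csv_output), skipinitialspace=True))

# zip() pairs header and cells by position, so every cell after the
# stray comma lands under the wrong key and the last cell is dropped.
csv_record = dict(zip(header, row))

# Hypothetical JSON-encoded output: keys travel with their values, and a
# multi-valued field is an explicit list rather than a delimited string.
json_output = (
    '{"job_title": "Engineer", "company_location": "Berlin, Germany", '
    '"employment_period": "2019-2021", "skills_at_job": ["Python", "SQL"]}'
)
json_record = json.loads(json_output)
```

In the CSV case, `employment_period` ends up holding part of the location and `skills_at_job` ends up holding the period; in the JSON case each value stays attached to its key.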
Thanks, @eyurtsev, for your reply.
I am already doing items 1-6 from your list.
However, after making the changes described in my update above, it works.
Thanks again!
Hi @eyurtsev, I think there is something wrong with the structure of the document extraction output. The values for each key seem to be mixed up with those of other keys. Check the attached screenshot.
Here, the value for employment period is mixed with company location. Similarly, the value for employment period should actually go to skills at job.
I think the value assignment should go one step down.