Open xzdandy opened 10 months ago
```
[ ["name", "name of the user profile", "logicx"], ["country", "country the user comes from", "United States"] ],
```
Hi @pchunduri6, very good feedback.
1) The backend will translate it into something similar to what we have in the stargazers app. Optimizations like batching will be applied accordingly. There are also new opportunities, like merging: for example, some columns are extracted in a predicate while others are extracted in a projection.
```sql
GPT35Azure("You are given a block of disorganized text extracted from the GitHub user profile of a user using an automated web scraper. The goal is to get structured results from this data.
Extract the following fields from the text: name, country, city, email, occupation, programming_languages, topics_of_interest, social_media.
If some field is not found, just output fieldname: N/A. Always return all the 8 field names. DO NOT add any additional text to your output.
The topics_of_interest field must list a broad range of technical topics that are mentioned in any portion of the text. This field is the most important, so add as much information as you can. Do not add non-technical interests.
The programming_languages field can contain one or more programming languages out of only the following 4 programming languages - Python, C++, JavaScript, Java. Do not include any other language outside these 4 languages in the output. If the user is not interested in any of these 4 programming languages, output N/A.
If the country is not available, use the city field to fill the country. For example, if the city is New York, fill the country as United States.
If there are social media links, including personal websites, add them to the social media section. Do NOT add social media links that are not present.
Here is an example (use it only for the output format, not for the content):
name: logicx
country: United States
city: Atlanta
email: abc@gatech.edu
occupation: PhD student at Georgia Tech
programming_languages: Python, Java
topics_of_interest: Google Colab, fake data generation, Postgres
social_media: https://www.logicx.io, https://www.twitter.com/logicx, https://www.linkedin.com/in/logicx
", stargazerscrapeddetails.extracted_text
)
```
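The prompt above asks the LLM for one `fieldname: value` line per field, with `N/A` for missing fields. As a minimal sketch of the post-processing such a query implies (this helper is hypothetical and not part of EvaDB), the line-based output could be turned into per-column values like this:

```python
# Hypothetical helper: parse the "fieldname: value" lines that the
# prompt above instructs the LLM to emit. Field names are taken from
# the prompt; everything else here is illustrative.
EXPECTED_FIELDS = [
    "name", "country", "city", "email", "occupation",
    "programming_languages", "topics_of_interest", "social_media",
]

def parse_llm_output(text: str) -> dict:
    """Map each expected field to its extracted value, or None for N/A."""
    result = {field: None for field in EXPECTED_FIELDS}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip lines that do not look like "field: value"
        # Split on the FIRST colon only, so URLs in values stay intact.
        field, _, value = line.partition(":")
        field, value = field.strip(), value.strip()
        if field in result:
            result[field] = None if value == "N/A" else value
    return result
```

Splitting on the first colon only matters for the `social_media` field, whose value contains `https://...` URLs with their own colons.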
2) I have exactly the same thoughts. We could let the user provide a full prompt to the engine, but non-expert users may not know how to write a proper prompt for this purpose. The proposed interface is more user-friendly and simple. I agree it can lose some accuracy, but power users can always write the above fully customized query in EvaDB. For this aspect, I am eager to see more feedback on the design.
3) The feedback on RAG is helpful. Is RAG useful for extracting column information, or when will it be useful, given that the current stargazers app does not use it? It is also easier to implement EXTRACT_COLUMNS without RAG. We need to evaluate the effort versus the gains.
Hey @gaurav274 introduced me to this issue.
Seems interesting. Can I take it up?
Hi @hershd23, thanks for your interest! Yes!
https://github.com/hershd23/eva-structure-gpt
Have something up just as a quick-and-dirty POC. It is mostly for testing the prompt, which I built incrementally. I think this is good enough to start work on the function itself.
Search before asking
Description
EXTRACT_COLUMNS will be similar to EXTRACT_OBJECT for videos, which is not a standard user-defined function. In the optimizer, it will be translated to a valid EvaDB query plan tree with multiple functions and operators.

Example Usage
If this is set to "", then RAG will not be used. In the first release of EXTRACT_COLUMNS, we will not support RAG. If we want to provide more fine-grained controls, for example tuning hyperparameters, we can also introduce a CREATE FUNCTION, which allows us to have a key-value based configuration.

@gaurav274 @jiashenC Please provide feedback. Thanks.
Use case
No response
Are you willing to submit a PR?