Closed tramdas closed 11 months ago
@sonalikapatel84 or whoever else picks this up, please sync up with Sam (@ssoliver on Discord; he's not on the GitHub yet) about ideas on how to approach the task.
Approach: Given the brand and model of the TV, we would apply prompt engineering that accounts for the lexical semantics of the inputs (e.g. partially visible or garbled brand strings).
@samoliverschumacher Adding you here for ideas on this task.
Once the above work is done, the LangChain library can be leveraged for the JSON output. https://python.langchain.com/docs/modules/model_io/output_parsers/pydantic
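Until a LangChain parser is wired up, the intended behaviour can be sketched with the stdlib alone (the field names below are an assumption, not a decided schema):

```python
import json

# Hypothetical raw LLM reply; the field names are illustrative assumptions.
raw_reply = '{"brand": "SAMSUNG", "model": "UN55TU7000"}'

def parse_tv_payload(reply: str) -> dict:
    """Parse the brand/model JSON the LLM is prompted to emit.

    Raises ValueError if a required key is missing, so malformed
    replies fail loudly instead of silently propagating.
    """
    data = json.loads(reply)
    for key in ("brand", "model"):
        if key not in data:
            raise ValueError(f"missing key: {key}")
    return data

print(parse_tv_payload(raw_reply)["model"])  # UN55TU7000
```

A Pydantic model via LangChain's output parser would add type validation on top of this, but the shape of the check is the same.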
@sonalikapatel84 if you start to run out of time, feel free to leave the JSON output for a follow-up ticket. Maybe spend 10-15 mins max on that aspect of it; I'd say it's more valuable to play around with the prompt engineering and do some testing to get consistent and reliable performance.
One approach to prompt engineering that you can use is n-shot (few-shot) prompting: get a few sample images, extract the text, identify the brand/model yourself (e.g. "8" "MSUNG" = "SAMSUNG") and provide those examples within your prompt.
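That suggestion could be sketched roughly like this (the OCR samples, labels, and instruction wording are all invented placeholders):

```python
# Hand-labelled examples in the style suggested above ("8 MSUNG" -> "SAMSUNG");
# both the OCR strings and the labels here are made up for illustration.
FEW_SHOT_EXAMPLES = [
    ("8 MSUNG UN55TU7000", "brand: SAMSUNG, model: UN55TU7000"),
    ("L6 0LED55CX", "brand: LG, model: OLED55CX"),
]

def build_prompt(ocr_text: str) -> str:
    """Assemble an n-shot prompt from labelled OCR samples."""
    lines = [
        "Identify the TV brand and model from noisy OCR text taken from a box photo.",
        "",
    ]
    for raw, label in FEW_SHOT_EXAMPLES:
        lines += [f"OCR: {raw}", f"Answer: {label}", ""]
    # The real (unlabelled) input goes last, in the same layout as the examples.
    lines += [f"OCR: {ocr_text}", "Answer:"]
    return "\n".join(lines)

print(build_prompt("S0NY XR-55A80J"))
```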
Few-shot prompting on TV box images.
In the first prompt in the doc, I asked GPT to act as a manufacturer and to say "not a tv" when the information is not about a TV. Without this instruction, GPT would always return some string for the brand and model fields.
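A rough sketch of that setup as chat messages (the exact persona wording here is invented; only the "not a tv" fallback comes from the prompt described above):

```python
def make_messages(ocr_text: str) -> list[dict]:
    """Build chat messages with a manufacturer persona and the
    "not a tv" fallback, so the model is allowed to decline rather
    than invent brand/model strings for non-TV inputs."""
    system = (
        "You are a TV manufacturer's product expert. Given noisy OCR text "
        "from a product box, return the brand and model strings. If the "
        'text does not describe a TV, reply exactly "not a tv".'
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": ocr_text},
    ]

print(make_messages("toaster 2-slice")[0]["role"])  # system
```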
I will try more images tomorrow and play around more with this technique.
@sonalikapatel84 cool stuff. I suggest we reduce the scope of this task to just the prompt-engineering research; we can make a follow-up task to develop the Python tool around it. Can you please produce your writeup as an md file and save the sample images as well? I think it should be fine to save this within the CSRAssist repo as a work in progress.
Thank you. Will do soon.
I think we can close this issue: https://github.com/A-I-P-A/CSRAssist/pull/5 has been merged with all the research notes, and https://github.com/A-I-P-A/CSRAssist/issues/4 has been filed for the follow-up coding work. @sonalikapatel84 did I miss anything?
All good @tramdas . I will mark the ticket for closure.
This task is related to the "Extract text from images" task, so it would be good to work closely with whoever is working on that. I don't think it is strictly blocked on that ticket, though, because you can start by manually extracting text from sample images, such as the images of the TV from Serge's "sell my TV" task.
Once all the text has been extracted from the image, it will be necessary to filter through it to pick out the relevant information, such as the model string. For now, just focus on obtaining the model string and returning it within a JSON payload.
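One possible shape for that payload (the field name is an assumption; nothing is decided yet):

```json
{"model": "UN55TU7000"}
```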
@sonalikapatel84 as discussed on the Tuesday morning call, you seemed interested in this one, so I've tentatively assigned it to you.
Whoever works on this, I suggest you start off just using OpenAI. You may want to use the LangChain library, because that should theoretically make it relatively painless to switch to an alternative LLM, such as a model that Sam might recommend. If you get to the point where you want to try things with OpenAI, please reach out to Tirath for an API key.