BoxcarsAI / boxcars

Building applications with composability using Boxcars with LLMs. Inspired by LangChain.
MIT License

Improve Google Search by parsing answer box with the engine? #70

Closed itsderek23 closed 9 months ago

itsderek23 commented 1 year ago

I've noticed that quite a few queries executed via SerpApi return a valid answer box, but the GoogleSearch boxcar returns a snippet from an organic result instead. When searching for real-time information (e.g., weather, current time, financial info), this snippet is usually incorrect.

For example, if I run the following:

engine = Boxcars::Openai.new
boxcar = Boxcars::GoogleSearch.new
boxcar.run("what is the current time in America/Denver")

The JSON response contains:

res.dig *[:answer_box]
{:type=>"local_time", :result=>"8:02 AM", :extensions=>["Sunday, May 7, 2023 (MDT)", "Time in Denver, CO"]}

...which is correct. However, none of the ANSWER_LOCATIONS with an answer_box will match this:

%i[answer_box answer],
%i[answer_box snippet],
[:answer_box, :snippet_highlighted_words, 0],

...so it instead grabs a snippet from an organic result (which is cached and incorrect):

res.dig *[:organic_results, 0, :snippet]
"Current Local Time in Denver, Colorado, USA ; Colorado (CO) · 39°45'N / 104°59'W · 1593 m · United States Dollar (USD) · English."

While trying out queries for different types of real-time information, I've found that the answer box format can vary quite a bit. I've actually had my best results by passing the answer_box back to the engine and asking it to parse the JSON.

For example, the engine's answers below are all correct, while the GoogleSearch boxcar fails on the same queries:

search_boxcar = Boxcars::GoogleSearch.new
engine = Boxcars::Openai.new
q = "What is the current time in fort collins?"
search = ::GoogleSearch.new(q: q)
rv = search.get_hash
engine.run("can you answer: #{q} given the JSON below? #{rv.dig(:answer_box).to_json[0..500]}")
 => "The current time in Fort Collins is 8:48 AM on Sunday, May 7, 2023 (MDT)." 
search_boxcar.run(q)
Answer: {:snippet=>"Current Local Time in Fort Collins, Colorado, USA ; Colorado (CO) · 40°35'N / 105°05'W · 1522 m · United States Dollar (USD) · English.", :url=>"https://www.timeanddate.com/worldclock/usa/fort-collins"}

q = "What is the current value of the s & p 500?"
search = ::GoogleSearch.new(q: q)
rv = search.get_hash
engine.run("can you answer: #{q} given the JSON below? #{rv.dig(:answer_box).to_json[0..500]}")
 => "The current value of the S&P 500 is 4136.25." 
search_boxcar.run(q)
=> "The Standard and Poor's 500, or simply the S&P 500, is a stock market index tracking the stock performance of 500 of the largest companies listed on stock exchanges in the United States. It is one of the most commonly followed equity indices." 

q = "What is the current temp in fort collins?"
search = ::GoogleSearch.new(q: q)
rv = search.get_hash
engine.run("can you answer: #{q} given the JSON below? #{rv.dig(:answer_box).to_json[0..1000].gsub("%","%%")}")
 => "The current temperature in Fort Collins is 53 degrees Fahrenheit." 
search_boxcar.run(q)
{:snippet=>"Fort Collins, CO Weather Conditionsstar_ratehome ; Temperature. High. 73 · F · 67 ; Rain/Snow Depth. Precipitation. 0 · in. 4.5 ; Temperature. High. 73 · F · 67 ...",
 :url=>"https://www.wunderground.com/weather/us/co/fort-collins/80521"} 

q = "What is the weather tomorrow in fort collins?"
search = ::GoogleSearch.new(q: q)
rv = search.get_hash
engine.run("can you answer: #{q} given the JSON below? #{rv.dig(:answer_box).to_json[0..1000].gsub("%","%%")}")
 => "The weather tomorrow in Fort Collins is partly cloudy with a high of 73°F and a low of 43°F." 
search_boxcar.run(q)
{:snippet=>"Partly cloudy skies. Low 42F. Winds N at 5 to 10 mph. Humidity55%.",
 :url=>"https://weather.com/weather/tenday/l/Fort+Collins+CO?canonicalCityId=58e8622d9026d4dc25f02c1f57faeb6ee86cc124642fb29098291a78a4062e0f"} 

q = "what is rivian's market cap?"
search = ::GoogleSearch.new(q: q)
rv = search.get_hash
engine.run("can you answer: #{q} given the JSON below? #{rv.dig(:answer_box).to_json[0..1000].gsub("%","%%")}")
 => "Rivian's market cap is $12,430,000,000." 
search_boxcar.run(q)
=> "Rivian Automotive, Inc., is an American electric vehicle manufacturer and automotive technology company founded in 2009. Rivian is building an electric sport utility vehicle and pickup truck on a \"skateboard\" platform that can support future vehicles or be adopted by other companies." 

Curious if you have thoughts on this? I've thought of creating a new Boxcar, for example GoogleAnswerBox, that is dedicated to just this function.
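
As a rough illustration of that idea, here is a minimal sketch of the prompt-building step, assuming the engine is any object that responds to `run`. The helper names (`answer_box_prompt`, `answer_with_answer_box`) and the stub classes are hypothetical and exist only so the sketch runs offline; they are not part of the Boxcars gem. The `%` escaping mirrors the later examples above.

```ruby
require "json"

# Hypothetical helper: build the prompt used in the examples above, or return
# nil when the search response has no answer box.
def answer_box_prompt(question, search_hash, limit: 1000)
  box = search_hash[:answer_box]
  return nil if box.nil?

  # Truncate the JSON and escape "%" as in the later examples above.
  json = box.to_json[0..limit].gsub("%", "%%")
  "can you answer: #{question} given the JSON below? #{json}"
end

# Hypothetical wrapper: ask the engine when an answer box exists, otherwise
# defer to the regular search boxcar.
def answer_with_answer_box(question, search_hash, engine:, fallback:)
  prompt = answer_box_prompt(question, search_hash)
  prompt ? engine.run(prompt) : fallback.run(question)
end

# Stand-ins so the sketch runs without network access (assumptions, not the
# real Boxcars::Openai or Boxcars::GoogleSearch classes).
StubEngine = Struct.new(:last_prompt) do
  def run(prompt)
    self.last_prompt = prompt
    "engine answer"
  end
end

StubSearch = Struct.new(:reply) do
  def run(_question)
    reply
  end
end

engine = StubEngine.new
fallback = StubSearch.new("fallback answer")
sample = { answer_box: { type: "local_time", result: "8:48 AM" } }

puts answer_with_answer_box("What is the current time in fort collins?", sample, engine: engine, fallback: fallback)
puts engine.last_prompt
```

In real use, `engine` and `fallback` would be the `Boxcars::Openai` and `Boxcars::GoogleSearch` instances from the examples above, and `search_hash` would be the result of `search.get_hash`.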

francis commented 1 year ago

@itsderek23 - the logic for the Google Search is pretty simplistic. It just looks for each key pair in turn and returns the first one present. Collecting multiple candidates and then having an LLM pick among them sounds like a good idea. Feel free to put up a PR, or leave this be and I will get to it eventually. It does sound more useful.
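
To make the difference concrete, here is a minimal sketch of both approaches, assuming the lookup walks a list of key paths with `Hash#dig`. The `LOCATIONS` list is a trimmed-down stand-in (the three answer_box paths quoted earlier plus an organic-result fallback), and `first_answer`, `candidate_answers`, and the extra `%i[answer_box result]` path are hypothetical names for illustration, not the gem's actual code.

```ruby
# Hypothetical, trimmed-down key paths: the three answer_box paths quoted
# earlier in this issue plus an organic-result fallback.
LOCATIONS = [
  %i[answer_box answer],
  %i[answer_box snippet],
  [:answer_box, :snippet_highlighted_words, 0],
  [:organic_results, 0, :snippet]
].freeze

# First-present lookup: return the first non-nil value found at any path.
def first_answer(res, locations = LOCATIONS)
  locations.each do |path|
    value = res.dig(*path)
    return value unless value.nil?
  end
  nil
end

# Alternative: collect every value that is present so an LLM can choose.
def candidate_answers(res, locations = LOCATIONS)
  locations.filter_map do |path|
    value = res.dig(*path)
    { path: path, value: value } unless value.nil?
  end
end

# Sample response shaped like the Denver example (organic snippet shortened).
res = {
  answer_box: { type: "local_time", result: "8:02 AM" },
  organic_results: [{ snippet: "Current Local Time in Denver, Colorado, USA" }]
}

# None of the answer_box paths match, so the organic snippet is returned.
puts first_answer(res)

# With one more (hypothetical) path, the answer box value becomes a candidate.
extended = LOCATIONS + [%i[answer_box result]]
candidate_answers(res, extended).each { |c| puts "#{c[:path].inspect}: #{c[:value]}" }
```

The candidate list could then be handed to the engine in a single prompt, leaving the choice of the best answer to the LLM.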

itsderek23 commented 1 year ago

Sounds good - I'll play with this a bit and share what I come up with.