cohere-ai / sandbox-grounded-qa

A sandbox repo for grounded question answering with Cohere and Google Search
MIT License

traceback -- too many tokens in prompt #10

Open saraswat opened 1 year ago

saraswat commented 1 year ago

My question was simple:

What is the GLOBEX Code for Crude Oil Futures on the Chicago Mercantile Exchange?

I get a traceback:

https://serpapi.com/search
Traceback (most recent call last):
  File "/Users/vijaysaraswat/Documents/code/sandbox-grounded-qa/cli_demo.py", line 26, in <module>
    reply = bot.answer(question, verbosity=args.verbosity, n_paragraphs=2)
  File "/Users/vijaysaraswat/Documents/code/sandbox-grounded-qa/qa/bot.py", line 42, in answer
    answer_text, source_urls, source_texts = answer_with_search(question,
  File "/Users/vijaysaraswat/Documents/code/sandbox-grounded-qa/qa/answer.py", line 83, in answer_with_search
    response = answer(question, context, co, chat_history=chat_history, model=model)
  File "/Users/vijaysaraswat/Documents/code/sandbox-grounded-qa/qa/answer.py", line 39, in answer
    prediction = co.generate(model=model,
  File "/Users/vijaysaraswat/anaconda3/envs/py310/lib/python3.9/site-packages/cohere/client.py", line 115, in generate
    response = self.__request(json_body, cohere.GENERATE_URL)
  File "/Users/vijaysaraswat/anaconda3/envs/py310/lib/python3.9/site-packages/cohere/client.py", line 289, in __request
    raise CohereError(message=res['message'], http_status=response.status_code, headers=response.headers)
cohere.error.CohereError: too many tokens: total number of tokens (prompt and prediction) cannot exceed 2048 - received 2954. Try using a shorter prompt or a smaller max_tokens value.
Exception ignored in: <function Pool.__del__ at 0x10d690160>
Traceback (most recent call last):
  File "/Users/vijaysaraswat/anaconda3/envs/py310/lib/python3.9/multiprocessing/pool.py", line 268, in __del__
    self._change_notifier.put(None)
  File "/Users/vijaysaraswat/anaconda3/envs/py310/lib/python3.9/multiprocessing/queues.py", line 378, in put
    self._writer.send_bytes(obj)
  File "/Users/vijaysaraswat/anaconda3/envs/py310/lib/python3.9/multiprocessing/connection.py", line 205, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/Users/vijaysaraswat/anaconda3/envs/py310/lib/python3.9/multiprocessing/connection.py", line 416, in _send_bytes
    self._send(header + buf)
  File "/Users/vijaysaraswat/anaconda3/envs/py310/lib/python3.9/multiprocessing/connection.py", line 373, in _send
    n = write(self._handle, buf)
OSError: [Errno 9] Bad file descriptor
Exception ignored in: <function Pool.__del__ at 0x10d690160>
Traceback (most recent call last):
  File "/Users/vijaysaraswat/anaconda3/envs/py310/lib/python3.9/multiprocessing/pool.py", line 268, in __del__
    self._change_notifier.put(None)
  File "/Users/vijaysaraswat/anaconda3/envs/py310/lib/python3.9/multiprocessing/queues.py", line 378, in put
    self._writer.send_bytes(obj)
  File "/Users/vijaysaraswat/anaconda3/envs/py310/lib/python3.9/multiprocessing/connection.py", line 205, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/Users/vijaysaraswat/anaconda3/envs/py310/lib/python3.9/multiprocessing/connection.py", line 416, in _send_bytes
    self._send(header + buf)
  File "/Users/vijaysaraswat/anaconda3/envs/py310/lib/python3.9/multiprocessing/connection.py", line 373, in _send
    n = write(self._handle, buf)
OSError: [Errno 9] Bad file descriptor

Looks like a bug in extracting the text from the webpage...?

saraswat commented 1 year ago

Interestingly, changing the question to:

What is the GLOBEX Code for Crude Oil Futures and Options on the Chicago Mercantile Exchange?

got back a decent answer:

The GLOBEX code for crude oil futures and options on the Chicago Mercantile Exchange is CL.

michaelwechner commented 1 year ago

It works for me when running my own instance; I receive the following answer:

The GLOBEX code for Crude Oil Futures on the Chicago Mercantile Exchange is CME.

Source:
https://www.schwab.com/futures/crude-oil

Relevant texts:
Crude oil futures are 1,000 barrels per contract, traded from 6:00 p.m. U.S. until 5:00 p.m. U.S. ET, all months of the year. However, you can trade more than just NYMEX crude oil futures online with Schwab. We also offer Brent crude oil futures as well as E-mini crude oil futures, which are just 50% of the size of a standard futures contract. E-mini crude futures trade exclusively on the Chicago Mercantile Exchange's Globex® platform nearly 24 hours per day.  Crude oil futures on the New York Mercantile Exchange (NYMEX) are the world's most actively traded futures contract on a physical commodity. Because of its excellent liquidity and price transparency, the contract is used as a principal international pricing benchmark. The NYMEX also offers trading in heating oil futures and gasoline futures.

although I do get some timeouts along the way:

URL from SerpAPI/Google: https://www.cmegroup.com/markets/energy/crude-oil/light-sweet-crude.html
URL from SerpAPI/Google: https://www.schwab.com/futures/crude-oil
URL from SerpAPI/Google: https://www.interactivebrokers.com/en/trading/cme-wti-futures.php
URL from SerpAPI/Google: https://www.dormantrading.com/TraderTools/en-153_wti_brochure_sr.pdf
Try to load data from 'https://www.cmegroup.com/markets/energy/crude-oil/light-sweet-crude.html' ...
Open link 'https://www.cmegroup.com/markets/energy/crude-oil/light-sweet-crude.html' ...
Try to load data from 'https://www.schwab.com/futures/crude-oil' ...
Open link 'https://www.schwab.com/futures/crude-oil' ...
Try to load data from 'https://www.interactivebrokers.com/en/trading/cme-wti-futures.php' ...
Open link 'https://www.interactivebrokers.com/en/trading/cme-wti-futures.php' ...
Try to load data from 'https://www.dormantrading.com/TraderTools/en-153_wti_brochure_sr.pdf' ...
Open link 'https://www.dormantrading.com/TraderTools/en-153_wti_brochure_sr.pdf' ...
Extract content from HTML '<http.client.HTTPResponse object at 0x110b06400>' ...
Extract content from HTML '<http.client.HTTPResponse object at 0x114bd26a0>' ...
Timeout Error!
Extract content from HTML '<http.client.HTTPResponse object at 0x11dd436a0>' ...
Timeout Error!

Can you reproduce the error?

saraswat commented 1 year ago

Well, now it's been over two minutes, and it seems to be stuck!

question: What is the GLOBEX Code for Crude Oil Futures on the Chicago Mercantile Exchange?
https://serpapi.com/search
answer: CME WTI Crude Oil Futures
Source:
https://www.schwab.com/futures/crude-oil
https://www.theice.com/products/213/WTI-Crude-Futures
question: Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER.

saraswat commented 1 year ago

Five minutes now, definitely stuck.

So there are two problems: the traceback above, which appears to indicate that the size of the prompt is not checked before the query is sent, and some kind of deadlock or infinite loop.
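
For illustration, here is a minimal sketch of the kind of pre-send size check I have in mind. It is not the repo's code: `fit_context` and `approx_token_count` are hypothetical names, the 4-characters-per-token estimate is only a stand-in for a real tokenizer, and the only number taken from the actual run is the 2048-token limit quoted in the error message. It also ignores whatever extra tokens the prompt template itself adds.

```python
from typing import Callable, List

MAX_TOTAL_TOKENS = 2048  # limit quoted in the CohereError message above


def approx_token_count(text: str) -> int:
    """Rough stand-in for a real tokenizer: assume about 4 characters per token."""
    return (len(text) + 3) // 4


def fit_context(question: str,
                paragraphs: List[str],
                max_new_tokens: int,
                count_tokens: Callable[[str], int] = approx_token_count,
                limit: int = MAX_TOTAL_TOKENS) -> List[str]:
    """Keep leading paragraphs while prompt plus prediction stays within the limit."""
    # Tokens left for context after reserving room for the question and the answer.
    budget = limit - max_new_tokens - count_tokens(question)
    kept: List[str] = []
    for paragraph in paragraphs:
        cost = count_tokens(paragraph)
        if cost > budget:
            break  # stop at the first paragraph that no longer fits
        kept.append(paragraph)
        budget -= cost
    return kept
```

The scraped paragraphs would be passed through something like `fit_context(question, paragraphs, max_new_tokens=100)` before the prompt is built, so an oversized page shortens the context instead of raising an exception.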

saraswat commented 1 year ago

I see -- the "stuck" problem is simply an artifact of some kind of buffering bug. The program's output "Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER." appears on the command line after the input prompt (question:), which led me to believe that it was still working.

FWIW, I am running this under emacs and not a shell command line.
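
One possible mitigation, sketched under the assumption that the stray message is sitting in a buffer of the same process when the prompt is printed: flush both output streams before asking for input. `ask` is a hypothetical helper, not something in the repo, and this would not help if the message is written later by a worker process.

```python
import sys


def ask(prompt: str = "question: ") -> str:
    """Flush pending output on both streams, then show the prompt and read a line."""
    sys.stdout.flush()
    sys.stderr.flush()
    return input(prompt)
```

With this, anything already written to stdout or stderr should show up before the `question:` prompt rather than after it.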