Common-SenseMakers / sensemakers

Sensemakers infrastructure for developing AI-based tools for semantic annotation of social posts, plus a cross-poster app to publish your semantic posts on different networks.
GNU General Public License v3.0

Error in batch parsing #79

Closed: ShaRefOh closed this issue 2 months ago

ShaRefOh commented 4 months ago

Got the following error using GPT-3.5 and Anthropic:

```
2024-05-09 21:45:21.312 | DEBUG    | desci_sense.shared_functions.parsers.multi_chain_parser:batch_process_ref_posts:245 - Invoking parallel chain...
 19%|███████████████████▉                                                                                  | 91/467 [00:23<01:34,  3.99it/s]Traceback (most recent call last):
  File "/Users/shaharorielkagan/sensemakers/nlp/desci_sense/evaluation/mulitchain_filter_evaluation.py", line 258, in <module>
    pred_labels(df=df,config=config)
  File "/Users/shaharorielkagan/sensemakers/nlp/desci_sense/evaluation/mulitchain_filter_evaluation.py", line 90, in pred_labels
    results = model.batch_process_ref_posts(inputs=inputs,active_list=["keywords", "topics"],batch_size=10)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/shaharorielkagan/sensemakers/nlp/desci_sense/shared_functions/parsers/multi_chain_parser.py", line 247, in batch_process_ref_posts
    results = asyncio.run(
              ^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/langchain_core/runnables/base.py", line 738, in abatch
    return await gather_with_concurrency(configs[0].get("max_concurrency"), *coros)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/langchain_core/runnables/utils.py", line 67, in gather_with_concurrency
    return await asyncio.gather(*(gated_coro(semaphore, c) for c in coros))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/langchain_core/runnables/utils.py", line 49, in gated_coro
    return await coro
           ^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/langchain_core/runnables/base.py", line 735, in ainvoke
    return await self.ainvoke(input, config, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/langchain_core/runnables/base.py", line 3174, in ainvoke
    results = await asyncio.gather(
              ^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/langchain_core/runnables/base.py", line 2536, in ainvoke
    input = await step.ainvoke(
            ^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/langchain_core/runnables/base.py", line 3174, in ainvoke
    results = await asyncio.gather(
              ^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/langchain_core/runnables/base.py", line 2536, in ainvoke
    input = await step.ainvoke(
            ^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 179, in ainvoke
    llm_result = await self.agenerate_prompt(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 570, in agenerate_prompt
    return await self.agenerate(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 530, in agenerate
    raise exceptions[0]
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 715, in _agenerate_with_cache
    result = await self._agenerate(
             ^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/langchain_openai/chat_models/base.py", line 649, in _agenerate
    response = await self.async_client.create(messages=message_dicts, **params)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/openai/resources/chat/completions.py", line 1161, in create
    return await self._post(
           ^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/openai/_base_client.py", line 1782, in post
    return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/openai/_base_client.py", line 1485, in request
    return await self._request(
           ^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/openai/_base_client.py", line 1578, in _request
    return await self._process_response(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/openai/_base_client.py", line 1670, in _process_response
    return await api_response.parse()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/openai/_response.py", line 417, in parse
    parsed = self._parse(to=to)
             ^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/openai/_response.py", line 251, in _parse
    data = response.json()
           ^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/httpx/_models.py", line 761, in json
    return jsonlib.loads(self.content, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 207 column 1 (char 1133)
```

ShaRefOh commented 4 months ago

Might have something to do with the batch size; it was set to 10 for the other models. It worked with 5.
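
For reference, a minimal sketch of that workaround, reusing the exact call from the traceback above (`model` and `inputs` as in the evaluation script), with only `batch_size` lowered:

```python
# Workaround: lowering batch_size from 10 to 5 avoided the JSONDecodeError
# for the other models (same call as in mulitchain_filter_evaluation.py).
results = model.batch_process_ref_posts(
    inputs=inputs,
    active_list=["keywords", "topics"],
    batch_size=5,
)
```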

ShaRefOh commented 4 months ago

OK, GPT-3.5 still raises the same error.

ronentk commented 4 months ago

@ShaRefOh I pushed a potential fix to nlp-dev. Feel free to try again and let me know if there are more issues. The implication of the fix is that the parser may return a failure for a given input but will still continue with the rest of the batch. Next I'll implement monitoring of how often this actually happens (#82).
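
To illustrate the shape of the fix (a minimal sketch only, not the actual nlp-dev commit; `safe_ainvoke` and the error-dict sentinel are hypothetical names), each per-input coroutine can catch its own exception and return a structured failure, so one malformed response no longer aborts the whole `abatch` call:

```python
import asyncio

async def safe_ainvoke(chain, prompt, config):
    """Run a single input through the chain; on failure, return a
    sentinel dict instead of letting the exception kill the batch."""
    try:
        return await chain.ainvoke(prompt, config=config)
    except Exception as e:  # e.g. JSONDecodeError from a malformed API response
        return {"error": repr(e), "input": prompt}

async def safe_abatch(chain, prompts, config):
    # Like chain.abatch, but per-input failures become sentinel results.
    return await asyncio.gather(
        *(safe_ainvoke(chain, p, config) for p in prompts)
    )
```

Counting the returned error sentinels would also give the failure-rate monitoring described in #82.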

ShaRefOh commented 4 months ago

@ronentk can you also catch these:

```
ValueError                                Traceback (most recent call last)
Cell In[33], line 2
      1 # batch process
----> 2 results = multi_chain_parser.batch_process_ref_posts(inputs=inputs,active_list=["keywords", "topics"],batch_size=10)

File ~/sensemakers/nlp/notebooks/../desci_sense/shared_functions/parsers/multi_chain_parser.py:247, in MultiChainParser.batch_process_ref_posts(self, inputs, batch_size, active_list)
    243 parallel_chain = self.create_parallel_chain(active_list)
    245 logger.debug("Invoking parallel chain...")
--> 247 results = asyncio.run(
    248     parallel_chain.abatch(
    249         inst_prompts,
    250         config=config,
    251     )
    252 )
    253 cb.progress_bar.close()
    255 # post processing results

File ~/Library/Python/3.11/lib/python/site-packages/nest_asyncio.py:35, in _patch_asyncio.<locals>.run(main, debug)
     33 task = asyncio.ensure_future(main)
     34 try:
---> 35     return loop.run_until_complete(task)
     36 finally:
     37     if not task.done():
...
--> 574     raise ValueError(response.get("error"))
    576 for res in response["choices"]:
    577     message = _convert_dict_to_message(res["message"])

ValueError: {'message': 'OpenAI: GPT-3.5 Turbo 16k requires moderation. Your input was flagged for "harassment". No credits were charged.', 'code': 403, 'metadata': {'reasons': ['harassment'], 'flagged_input': 'user: \nYou are an expert annotator tasked with ass...mPost\ntitle: Twitter post\nsummary: None\n\n# Output:'}}
```

And return the error with the full prompt? I changed the prompt, and for some reason I still see the same message with the same metadata.
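
Extending the sketch above, something like this could catch the moderation `ValueError` and log the full prompt (again a hypothetical sketch; the loguru `logger` is assumed here since the repo already logs through it, per the DEBUG line in the first traceback):

```python
from loguru import logger

async def safe_ainvoke(chain, prompt, config):
    try:
        return await chain.ainvoke(prompt, config=config)
    except ValueError as e:
        # Moderation refusals surface as a ValueError carrying a 403 payload
        # (see above); log the full prompt so the flagged input is inspectable.
        logger.error("Moderation error: {} | full prompt: {}", e, prompt)
        return {"error": repr(e), "input": prompt}
    except Exception as e:
        return {"error": repr(e), "input": prompt}
```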

ShaRefOh commented 4 months ago

Anyway, it would be good to catch these cases where LLM moderation refuses to parse a tweet.

ronentk commented 4 months ago

Hmm, those should be caught as well. Are you using the new version?

ronentk commented 4 months ago

BTW, I managed to run GPT-3.5 on the dataset.

ronentk commented 4 months ago

@ShaRefOh can we close this task?