Closed nashugame closed 5 months ago
Please try this configuration:
graph_config = {
    "llm": {
        "model": "groq/llama3-8b-8192",
        "api_key": groq_key,
        "temperature": 0,
        "format": "json"
    },
    "embeddings": {
        "model": "ollama/nomic-embed-text",
        "base_url": base_url,  # set Ollama URL
    },
    "headless": False
}
Hi @VinciGit00, I am still getting the same error with your suggested configuration. I am attaching the logs for your reference.
2024-06-02 17:52:11 - Loaded .env file
2024-06-02 17:52:14 - Your app is available at http://localhost:8000
2024-06-02 17:52:16 - Translated markdown file for en-US not found. Defaulting to chainlit.md.
2024-06-02 17:55:22 - 1 change detected
2024-06-02 17:55:22 - File modified: main.py. Reloading app...
2024-06-02 17:55:24 - Translated markdown file for en-US not found. Defaulting to chainlit.md.
Give me a summary of top 10 advertising agencies
https://www.sortlist.com/
2024-06-02 17:56:12 - Starting scraping...
2024-06-02 17:56:18 - Content scraped
2024-06-02 17:56:27 - Loading faiss.
2024-06-02 17:56:27 - Successfully loaded faiss.
2024-06-02 17:56:37 - HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
2024-06-02 17:56:38 - HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
2024-06-02 17:56:39 - HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
2024-06-02 17:56:39 - HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
2024-06-02 17:56:39 - Invalid json output: Here is the JSON output:
{
"data": [
{
"url": "https://www.sortlist.com/recording",
"category": "recording"
},
{
"url": "https://www.sortlist.com/audio-mastering",
"category": "audio-mastering"
},
{
"url": "https://www.sortlist.com/design",
"category": "design"
},
...
]
}
Note that I've only included the first few items in the list. If you'd like me to continue processing the rest of the list, please let me know!
Traceback (most recent call last):
File "/opt/miniconda3/envs/source-x-ai/lib/python3.10/site-packages/langchain_core/output_parsers/json.py", line 66, in parse_result
return parse_json_markdown(text)
File "/opt/miniconda3/envs/source-x-ai/lib/python3.10/site-packages/langchain_core/utils/json.py", line 147, in parse_json_markdown
return _parse_json(json_str, parser=parser)
File "/opt/miniconda3/envs/source-x-ai/lib/python3.10/site-packages/langchain_core/utils/json.py", line 160, in _parse_json
return parser(json_str)
File "/opt/miniconda3/envs/source-x-ai/lib/python3.10/site-packages/langchain_core/utils/json.py", line 120, in parse_partial_json
return json.loads(s, strict=strict)
File "/opt/miniconda3/envs/source-x-ai/lib/python3.10/json/__init__.py", line 359, in loads
return cls(**kw).decode(s)
File "/opt/miniconda3/envs/source-x-ai/lib/python3.10/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/opt/miniconda3/envs/source-x-ai/lib/python3.10/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 15 column 5 (char 306)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/opt/miniconda3/envs/source-x-ai/lib/python3.10/site-packages/chainlit/utils.py", line 40, in wrapper
return await user_function(**params_values)
File "/Users/satyamkumar/development/pocs/python/webscraper-scrapegraph/test.py", line 64, in main
result = user_scrapper_graph.run()
File "/opt/miniconda3/envs/source-x-ai/lib/python3.10/site-packages/scrapegraphai/graphs/smart_scraper_graph.py", line 118, in run
self.final_state, self.execution_info = self.graph.execute(inputs)
File "/opt/miniconda3/envs/source-x-ai/lib/python3.10/site-packages/scrapegraphai/graphs/base_graph.py", line 171, in execute
return self._execute_standard(initial_state)
File "/opt/miniconda3/envs/source-x-ai/lib/python3.10/site-packages/scrapegraphai/graphs/base_graph.py", line 110, in _execute_standard
result = current_node.execute(state)
File "/opt/miniconda3/envs/source-x-ai/lib/python3.10/site-packages/scrapegraphai/nodes/generate_answer_node.py", line 124, in execute
answer = map_chain.invoke({"question": user_prompt})
File "/opt/miniconda3/envs/source-x-ai/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 3142, in invoke
output = {key: future.result() for key, future in zip(steps, futures)}
File "/opt/miniconda3/envs/source-x-ai/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 3142, in <dictcomp>
output = {key: future.result() for key, future in zip(steps, futures)}
File "/opt/miniconda3/envs/source-x-ai/lib/python3.10/concurrent/futures/_base.py", line 458, in result
return self.__get_result()
File "/opt/miniconda3/envs/source-x-ai/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
File "/opt/miniconda3/envs/source-x-ai/lib/python3.10/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/opt/miniconda3/envs/source-x-ai/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 2499, in invoke
input = step.invoke(
File "/opt/miniconda3/envs/source-x-ai/lib/python3.10/site-packages/langchain_core/output_parsers/base.py", line 169, in invoke
return self._call_with_config(
File "/opt/miniconda3/envs/source-x-ai/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 1626, in _call_with_config
context.run(
File "/opt/miniconda3/envs/source-x-ai/lib/python3.10/site-packages/langchain_core/runnables/config.py", line 347, in call_func_with_variable_args
return func(input, **kwargs) # type: ignore[call-arg]
File "/opt/miniconda3/envs/source-x-ai/lib/python3.10/site-packages/langchain_core/output_parsers/base.py", line 170, in <lambda>
lambda inner_input: self.parse_result(
File "/opt/miniconda3/envs/source-x-ai/lib/python3.10/site-packages/langchain_core/output_parsers/json.py", line 69, in parse_result
raise OutputParserException(msg, llm_output=text) from e
langchain_core.exceptions.OutputParserException: Invalid json output: Here is the JSON output:
{
"data": [
{
"url": "https://www.sortlist.com/recording",
"category": "recording"
},
{
"url": "https://www.sortlist.com/audio-mastering",
"category": "audio-mastering"
},
{
"url": "https://www.sortlist.com/design",
"category": "design"
},
...
]
}
Note that I've only included the first few items in the list. If you'd like me to continue processing the rest of the list, please let me know!
This happens all the time. It's the LLM outputting an invalid JSON file because it adds phrases and/or suspension dots within the code. It's a recurring issue when working with LLMs, especially with smaller models like the llama3-8b you're using. There's not much that can be done.
Let's take a look at the output from your first log.
Here is the JSON output:
{
"data": [
{
"url": "https://www.sortlist.com/recording",
"category": "recording"
},
{
"url": "https://www.sortlist.com/audio-mastering",
"category": "audio-mastering"
},
{
"url": "https://www.sortlist.com/design",
"category": "design"
},
...
]
}
It literally wrote "Here is the JSON output:" ahead of the JSON object and added suspension dots after the last element. You can see something even worse in the second output, too, where it wrote "Note that I've only included the first few items in the list. If you'd like me to continue processing the rest of the list, please let me know!" at the end. This model was clearly trained to be a chatbot and it can't resist the temptation to talk too much, even though the system prompt provided by ScrapeGraph is very clear about outputting only the JSON.
Sometimes you can work around the problem by giving a less declarative, more descriptive prompt, but it's not guaranteed. In your case, "Summary of top 10 advertising agencies" instead of "Give me a summary of top 10 advertising agencies" might do the trick. If this doesn't work either, you might have to use a different LLM.
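To make the two failure modes above concrete, here is a minimal sketch in plain Python. It is not what ScrapeGraphAI or LangChain does internally; `extract_json_object` is a hypothetical helper, and the sample strings are shortened versions of the output in the logs. It shows that the surrounding chatter can be stripped, but an object containing literal suspension dots stays unparseable.

```python
import json


def extract_json_object(text: str):
    """Hypothetical helper: parse the text between the first '{' and the last '}'.

    Returns the parsed object, or None if no valid JSON object is found.
    """
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        return json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None


# Valid JSON wrapped in chatter: the wrapper text can be stripped.
chatty = (
    "Here is the JSON output:\n"
    '{"data": [{"url": "https://www.sortlist.com/design", "category": "design"}]}\n'
    "Note that I've only included the first few items in the list."
)

# Suspension dots inside the array: not recoverable by stripping the wrapper.
truncated = 'Here is the JSON output:\n{"data": [{"category": "design"}, ...]}'

print(extract_json_object(chatty))
print(extract_json_object(truncated))
```

The first call yields a dictionary and the second yields `None`, which is why post-processing alone does not solve the problem when the model elides items with `...`.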
Hi, please try with the new beta
Hey @nashugame, I created a new issue, #332, from this discussion to use Pydantic schema validation. The results will still depend on the size of the model, but feel free to contribute!
Hi @VinciGit00, I am getting this with the new beta:
2024-06-03 16:22:15 - "Groq" object has no field "format"
Traceback (most recent call last):
File "/opt/miniconda3/envs/source-x-ai/lib/python3.10/site-packages/chainlit/utils.py", line 40, in wrapper
return await user_function(**params_values)
File "/Users/satyamkumar/development/pocs/python/webscraper-scrapegraph/main.py", line 47, in on_chat_start
smart_scraper_graph = SmartScraperGraph(
File "/opt/miniconda3/envs/source-x-ai/lib/python3.10/site-packages/scrapegraphai/graphs/smart_scraper_graph.py", line 52, in __init__
super().__init__(prompt, config, source, schema)
File "/opt/miniconda3/envs/source-x-ai/lib/python3.10/site-packages/scrapegraphai/graphs/abstract_graph.py", line 81, in __init__
self.graph = self._create_graph()
File "/opt/miniconda3/envs/source-x-ai/lib/python3.10/site-packages/scrapegraphai/graphs/smart_scraper_graph.py", line 85, in _create_graph
generate_answer_node = GenerateAnswerNode(
File "/opt/miniconda3/envs/source-x-ai/lib/python3.10/site-packages/scrapegraphai/nodes/generate_answer_node.py", line 48, in __init__
self.llm_model.format="json"
File "/opt/miniconda3/envs/source-x-ai/lib/python3.10/site-packages/pydantic/v1/main.py", line 357, in __setattr__
raise ValueError(f'"{self.__class__.__name__}" object has no field "{name}"')
ValueError: "Groq" object has no field "format"
Hi, the main problem is the model you are using. Please use another one, maybe with Ollama.
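For reference, a configuration along those lines might look like the sketch below. It reuses the keys from the configuration earlier in this thread; the model name `ollama/llama3` and the URL `http://localhost:11434` (Ollama's default local address) are assumptions, so adjust them to whatever model you have pulled and wherever your Ollama server runs.

```python
# Assumed local Ollama endpoint; change it if your server runs elsewhere.
base_url = "http://localhost:11434"

graph_config = {
    "llm": {
        "model": "ollama/llama3",  # assumed model name, any pulled Ollama model
        "temperature": 0,
        "format": "json",
        "base_url": base_url,  # set Ollama URL
    },
    "embeddings": {
        "model": "ollama/nomic-embed-text",
        "base_url": base_url,  # set Ollama URL
    },
    "headless": False,
}
```

A larger model may follow the JSON-only instruction more reliably, but as noted above this is not guaranteed.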
Describe the bug: I am using the SmartScraper graph to scrape data from a website. It is giving me an "Invalid json output" error.
To Reproduce: This is my graph_config; for the rest of the code I am following the tutorial. I am using the latest release of ScrapeGraphAI. Website source: https://www.sortlist.com/ Prompt: Give me a summary of top 10 advertising agencies