ScrapeGraphAI / Scrapegraph-ai

Python scraper based on AI
https://scrapegraphai.com
MIT License

smart_scraper_graph.run(): 'FetchNode' object has no attribute 'update_state' #703

Closed. tokumotion closed this issue 1 month ago

tokumotion commented 1 month ago

Describe the bug: I use a script to extract keywords from a document. Today it started to fail.

To Reproduce: Run this code.

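from scrapegraphai.graphs import SmartScraperGraph

# OPENAI_API_KEY and role (the source document) are assumed to be defined earlier in the script.
graph_config = {
    "llm": {
        "api_key": OPENAI_API_KEY,
        "model": "openai/gpt-4o",
    },
}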

smart_scraper_graph = SmartScraperGraph(
    prompt="List me all the theme keywords in this text",
    source=role,
    config=graph_config,
)

themes = smart_scraper_graph.run()

Expected behavior: themes should be a list of keywords extracted from the text.

Screenshots: This is the error I get.

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-6-c7e0a4169c75> in <cell line: 49>()
     47 )
     48 
---> 49 themes = smart_scraper_graph.run()
     50 
     51 query = '''Talk about your experience in similar roles and mention why you're a good fit for this (3rd person)

5 frames
/usr/local/lib/python3.10/dist-packages/scrapegraphai/graphs/smart_scraper_graph.py in run(self)
    112 
    113         inputs = {"user_prompt": self.prompt, self.input_key: self.source}
--> 114         self.final_state, self.execution_info = self.graph.execute(inputs)
    115 
    116         return self.final_state.get("answer", "No answer found.")

/usr/local/lib/python3.10/dist-packages/scrapegraphai/graphs/base_graph.py in execute(self, initial_state)
    256             return (result["_state"], [])
    257         else:
--> 258             return self._execute_standard(initial_state)
    259 
    260     def append_node(self, node):

/usr/local/lib/python3.10/dist-packages/scrapegraphai/graphs/base_graph.py in _execute_standard(self, initial_state)
    177                         exception=str(e)
    178                     )
--> 179                     raise e
    180                 node_exec_time = time.time() - curr_time
    181                 total_exec_time += node_exec_time

/usr/local/lib/python3.10/dist-packages/scrapegraphai/graphs/base_graph.py in _execute_standard(self, initial_state)
    161             with self.callback_manager.exclusive_get_callback(llm_model, llm_model_name) as cb:
    162                 try:
--> 163                     result = current_node.execute(state)
    164                 except Exception as e:
    165                     error_node = current_node.node_name

/usr/local/lib/python3.10/dist-packages/scrapegraphai/nodes/fetch_node.py in execute(self, state)
    126             return state
    127         elif not source.startswith("http"):
--> 128             return self.handle_local_source(state, source)
    129         else:
    130             return self.handle_web_source(state, source)

/usr/local/lib/python3.10/dist-packages/scrapegraphai/nodes/fetch_node.py in handle_local_source(self, state, source)
    229         ]
    230 
--> 231         return self.update_state(state, compressed_document)
    232 
    233     def handle_web_source(self, state, source):

AttributeError: 'FetchNode' object has no attribute 'update_state'

Desktop: Google Colab
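
The last two frames of the traceback suggest the failure mode: a source that does not start with "http" is routed to the local-source handler, which then calls a method the instance does not have. The following is a minimal stand-in, assuming nothing about the real FetchNode beyond what the traceback shows. FakeFetchNode, classify_source, and reproduce are hypothetical names used only for illustration.

```python
def classify_source(source: str) -> str:
    """Mirror the branch visible in the traceback: non-http sources count as local."""
    return "web" if source.startswith("http") else "local"


class FakeFetchNode:
    """Hypothetical stand-in for FetchNode; this is not the library's class."""

    def execute(self, state: dict) -> dict:
        source = state["source"]
        if classify_source(source) == "local":
            return self.handle_local_source(state, source)
        # The web branch is left out of this sketch and returns the state unchanged.
        return state

    def handle_local_source(self, state: dict, source: str) -> dict:
        # update_state is deliberately not defined on this class,
        # which is the situation the reported error describes.
        return self.update_state(state, [source])


def reproduce(source: str) -> str:
    """Return the AttributeError message, or 'ok' if no error is raised."""
    try:
        FakeFetchNode().execute({"source": source})
    except AttributeError as exc:
        return str(exc)
    return "ok"
```

Calling reproduce("some plain text") returns a message of the same form as the one in the report, while an http URL never reaches the failing call. On a real install, hasattr(FetchNode, "update_state") would show whether the installed version defines the method.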

VinciGit00 commented 1 month ago

Please take a look at the Colab notebook linked in the README.

VinciGit00 commented 1 month ago

Colab needs a different config.

tokumotion commented 1 month ago

Hi @VinciGit00, I used the Colab setup you pointed to but got the same error message.

JoshuaTruscott commented 1 month ago

I'm getting the same error running locally using SmartScraperMultiGraph. Fresh environment, following the README.

Edit: Got it working. My bad, I was using a string for source instead of a list.

mrBlackHat777 commented 1 month ago

Hi @JoshuaTruscott,

The source parameter needs to be a str and not a list:

def __init__(self, prompt: str, source: str, config: dict, schema: Optional[BaseModel] = None):

I am facing the same issue.
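
The quoted signature takes a single str, whereas JoshuaTruscott's fix for SmartScraperMultiGraph was to pass a list, so the two comments may concern different classes. Below is a small guard, sketched under the assumption that a single-source graph wants a str and a multi-source graph wants a list of str. normalize_source is a hypothetical helper, not part of the library.

```python
from typing import List, Union


def normalize_source(
    source: Union[str, List[str]], multi: bool
) -> Union[str, List[str]]:
    """Coerce source to the shape the chosen graph is assumed to expect."""
    if multi:
        # Multi-source graph (assumption): wrap a lone string in a list.
        return [source] if isinstance(source, str) else list(source)
    if isinstance(source, str):
        return source
    if len(source) == 1:
        # Single-source graph (assumption): unwrap a one-element list.
        return source[0]
    raise TypeError("a single-source graph needs exactly one source string")
```

Passing the result as source= to the graph constructor would rule out a type mismatch. It would not address an install whose FetchNode lacks update_state.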