ScrapeGraphAI / Scrapegraph-ai

Python scraper based on AI
https://scrapegraphai.com
MIT License
15.65k stars 1.27k forks

Not getting extraction results after upgrading from 1.6.1 to 1.18.1 #647

Closed: anirudh1800 closed this issue 2 months ago

anirudh1800 commented 2 months ago

Describe the bug

Not getting extraction results after upgrading from 1.6.1 to 1.18.1.

To Reproduce

1.6.1 config

smart_scraper_graph_config = {
    "llm": {
        "api_key": "xxxxxx",
        "model": "gpt-3.5-turbo",
    },
    "headless": True,
    "verbose": True
}

INFO: 127.0.0.1:41338 - "POST /scrape HTTP/1.1" 200 OK
--- Executing Fetch Node ---
--- Executing Parse Node ---
--- Executing RAG Node ---
--- (updated chunks metadata) ---
--- (tokens compressed and vector stored) ---
--- Executing GenerateAnswer Node ---
Processing chunks: 100%|███████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 6084.58it/s]
INFO: 127.0.0.1:40700 - "POST /scrape HTTP/1.1" 200 OK

1.18.1 config

smart_scraper_graph_config = {
    "llm": {
        "api_key": "xxxxxx",
        "model": "openai/gpt-3.5-turbo",
    },
    "headless": True,
    "verbose": True
}

INFO: Application startup complete.
--- Executing Fetch Node ---
--- (Fetching HTML from: https://www.jcrew.com/p/mens/categories/clothing/t-shirts/vintage/long-sleeve-vintage-wash-cotton-pocket-t-shirt/CB486?display=standard&fit=Classic&color_name=white&colorProductCode=CB486) ---
--- Executing Parse Node ---
--- Executing GenerateAnswer Node ---
INFO: 127.0.0.1:58388 - "POST /scrape HTTP/1.1" 200 OK

With 1.18.1, the price of the item at the input URL is not extracted.
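One way to compare the two runs is to list which nodes each verbose log reports. The snippet below is only a rough sketch: `executed_nodes` is a hypothetical helper, and the log strings are shortened copies of the node markers from the two logs above, not complete output.

```python
import re

# Matches markers such as "--- Executing Fetch Node ---" in the verbose output.
NODE_RE = re.compile(r"--- Executing (\w+) Node ---")


def executed_nodes(log_text):
    """Return the node names in the order they appear in the log text."""
    return NODE_RE.findall(log_text)


# Node markers copied from the 1.6.1 and 1.18.1 logs in this issue.
log_1_6_1 = (
    "--- Executing Fetch Node --- --- Executing Parse Node --- "
    "--- Executing RAG Node --- --- Executing GenerateAnswer Node ---"
)
log_1_18_1 = (
    "--- Executing Fetch Node --- --- Executing Parse Node --- "
    "--- Executing GenerateAnswer Node ---"
)

nodes_old = executed_nodes(log_1_6_1)
nodes_new = executed_nodes(log_1_18_1)

# Nodes that ran under 1.6.1 but are not reported under 1.18.1.
missing = [name for name in nodes_old if name not in nodes_new]
print(missing)
```

For these two logs the only difference is the RAG node, which is reported under 1.6.1 and not under 1.18.1. That by itself does not show why the price is missing, but it narrows down what changed between the two runs.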

goasleep commented 2 months ago

I guess the request to this website is timing out. I opened the site and it loads slowly. You can try disabling the timeout and running it again. @anirudh1800

smart_scraper_graph_config = {
    "llm": {
        "api_key": "xxxxxx",
        "model": "gpt-3.5-turbo",
    },
    "headless": True,
    "verbose": True,
    "timeout": 0, # disable timeout
}
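
The two differences discussed in this thread (the `openai/` prefix on the model name in the 1.18.1 config and the suggested `timeout` key) can be applied to an old config in one step. This is a minimal sketch, not part of the library: `migrate_config` and its parameters are hypothetical names, and whether `"timeout": 0` actually disables the timeout is taken from the suggestion above rather than verified here.

```python
import copy


def migrate_config(old_config, provider="openai", timeout=None):
    """Return a copy of a 1.6.1-style config adjusted to the 1.18.1 style.

    Hypothetical helper for illustration only.
    """
    new_config = copy.deepcopy(old_config)  # leave the caller's dict untouched

    # The 1.18.1 config in this issue uses "openai/gpt-3.5-turbo",
    # so add the provider prefix when the model name has none.
    model = new_config["llm"]["model"]
    if "/" not in model:
        new_config["llm"]["model"] = f"{provider}/{model}"

    # Assumption from the comment above: 0 is meant to disable the timeout.
    if timeout is not None:
        new_config["timeout"] = timeout

    return new_config


old_config = {
    "llm": {"api_key": "xxxxxx", "model": "gpt-3.5-turbo"},
    "headless": True,
    "verbose": True,
}

new_config = migrate_config(old_config, timeout=0)
print(new_config)
```

The returned dict would then be passed as the graph's config in place of `smart_scraper_graph_config`. Copying the input first keeps the original config available for comparing the two runs.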