ScrapeGraphAI / Scrapegraph-ai

Python scraper based on AI
https://scrapegraphai.com
MIT License
13.05k stars, 994 forks

'SmartScraperGraph' object has no attribute 'model_token' #422

Status: Open. dhruv-scogo opened this issue 1 week ago.

dhruv-scogo commented 1 week ago

I used an Azure OpenAI GPT-4 instance and got this error. I initialized my model like this:

from langchain_openai import AzureChatOpenAI

import config  # the reporter's own settings module

azure_model = AzureChatOpenAI(
    openai_api_base=config.OPENAI_API_BASE,
    openai_api_version="2023-05-15",
    openai_api_key=config.OPENAI_API_KEY,
    openai_api_type="azure",
    model="gpt-4",
    temperature=0.0
)

graph_config = {
    "llm": {
        "model_instance": azure_model,
        "temperature": 0,
        "streaming": False,
    },
    "embeddings": {
        "model": "ollama/nomic-embed-text",
        "base_url": "http://localhost:11434",
    },
}

Can you please confirm whether "model_instance" is supported by SmartScraperGraph?
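For context on the error message itself: Python raises AttributeError when code reads an instance attribute that was never assigned. The toy class below is a hypothetical sketch, not ScrapeGraphAI's actual source. It only shows how this exact message can appear if one configuration branch (here, the "model_instance" path) returns early without setting self.model_token while later code assumes the attribute exists. The class name ToyGraph and the 8192 token value are made up for illustration.

```python
# Hypothetical toy class, NOT ScrapeGraphAI's source: it only illustrates how
# an attribute set on one config branch can be missing on another.
class ToyGraph:
    def __init__(self, config: dict):
        self.llm = self._create_llm(config["llm"])

    def _create_llm(self, llm_config: dict):
        if "model_instance" in llm_config:
            # Early return: self.model_token is never assigned on this path.
            return llm_config["model_instance"]
        # Provider-string path: a token limit is recorded before returning.
        self.model_token = 8192  # hypothetical value
        return llm_config["model"]

    def run(self) -> int:
        # Later code assumes the attribute always exists.
        return self.model_token


ok = ToyGraph({"llm": {"model": "azure/gpt-4"}})
print(ok.run())  # 8192

broken = ToyGraph({"llm": {"model_instance": object()}})
try:
    broken.run()
except AttributeError as err:
    print(err)  # 'ToyGraph' object has no attribute 'model_token'
```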

boerninator commented 1 week ago

I also get this error with Azure.

mingjun1120 commented 6 days ago

I am also facing this error now. Below is my code:

import os
import json
from typing import List
from pydantic import BaseModel, Field
from dotenv import load_dotenv
from langchain_openai import AzureChatOpenAI
from langchain_openai import AzureOpenAIEmbeddings
from scrapegraphai.graphs import SmartScraperGraph

load_dotenv()

# Define the output schema for the graph
class FAQLink(BaseModel):
    text: str = Field(description="The text of the link")
    url: str = Field(description="The URL of the link")

class FAQCategory(BaseModel):
    header: str = Field(description="The header of the FAQ category")
    links: List[FAQLink] = Field(description="The list of links in this category")

class FAQStructure(BaseModel):
    categories: List[FAQCategory] = Field(description="The list of FAQ categories")

# Initialize the model instances
llm_model_instance = AzureChatOpenAI(
    openai_api_key = os.environ["AZURE_OPENAI_API_KEY"],
    azure_endpoint = os.environ["AZURE_OPENAI_ENDPOINT"],
    openai_api_version = os.environ["AZURE_OPENAI_API_VERSION"],
    azure_deployment = os.environ["AZURE_OPENAI_CHAT_DEPLOYMENT_NAME"],
)

embedder_model_instance = AzureOpenAIEmbeddings(
    openai_api_key = os.environ["AZURE_OPENAI_API_KEY"],
    azure_endpoint = os.environ["AZURE_OPENAI_ENDPOINT"],
    openai_api_version = os.environ["AZURE_OPENAI_API_VERSION"],
    azure_deployment = os.environ["AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME"],
)

graph_config = {
    "llm": {"model_instance": llm_model_instance},
    "embeddings": {"model_instance": embedder_model_instance}
}

# Create the SmartScraperGraph instance and run it
smart_scraper_graph = SmartScraperGraph(
    prompt="Extract all FAQ categories, their headers, and the links (text and URL) within each category from the CIMB bank FAQ page",
    source="https://www.cimb.com.my/en/personal/help-support/faq.html",
    schema=FAQStructure,
    config=graph_config
)

result = smart_scraper_graph.run()
print(json.dumps(result, indent=4))

marcantoinefortier commented 5 days ago

Using the code provided in the Azure example from the documentation results in this error with versions between 1.7.0 and 1.8.0.
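To check whether an installed copy falls inside the range reported here, a small sketch like the following can help. It assumes the PyPI distribution name scrapegraphai, treats both bounds as inclusive, and ignores pre-release suffixes (so 1.8.0b1 counts as 1.8.0); the helper names are hypothetical.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Range reported in this thread; treating both ends as inclusive is an assumption.
REPORTED_LOW = (1, 7, 0)
REPORTED_HIGH = (1, 8, 0)


def release_tuple(v: str) -> tuple:
    """Turn '1.7.3' (or '1.7', '1.8.0b1') into a comparable (major, minor, patch)."""
    m = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", v)
    if m is None:
        raise ValueError(f"unrecognized version string: {v!r}")
    # Missing components default to 0; anything after the numbers is ignored.
    return tuple(int(g or 0) for g in m.groups())


def in_reported_range(v: str) -> bool:
    """True if v falls between the versions reported to show the error."""
    return REPORTED_LOW <= release_tuple(v) <= REPORTED_HIGH


if __name__ == "__main__":
    try:
        installed = version("scrapegraphai")
    except PackageNotFoundError:
        print("scrapegraphai is not installed")
    else:
        print(installed, "in reported range:", in_reported_range(installed))
```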

mingjun1120 commented 5 days ago

> Using the code provided in the Azure example from the documentation results in this error with versions between 1.7.0 and 1.8.0.

So should we install a version below 1.7.0?

marcantoinefortier commented 5 days ago

@mingjun1120 Unfortunately, I haven't tried versions below 1.7.0, so I can't suggest a workaround for now. 😅

f-aguzzi commented 4 days ago

The official Azure example looks off to me. It's completely different from the SmartScraper examples for the other API providers: it's strange that it accesses the LangChain classes directly, skipping part of the usual Scrapegraph workflow.
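The two configuration styles being contrasted can be told apart mechanically: one passes a ready-made LangChain object under "model_instance", the other passes a "provider/model" string such as "azure/gpt-3.5-turbo" that, in the usual workflow, Scrapegraph resolves on its own. The helper below is a hypothetical debugging aid, not part of ScrapeGraphAI; it only inspects the "llm" dict and does not check whether either style actually works.

```python
from typing import Any


def describe_llm_config(llm_config: dict[str, Any]) -> str:
    """Report which config style an "llm" dict uses (hypothetical helper)."""
    if "model_instance" in llm_config:
        # Style 1: a ready-made LangChain object is handed over directly.
        return f"model_instance: {type(llm_config['model_instance']).__name__}"
    model = llm_config.get("model")
    if isinstance(model, str) and "/" in model:
        # Style 2: "provider/model-name" string, split on the first slash only.
        provider, name = model.split("/", 1)
        return f"provider string: provider={provider}, model={name}"
    return "unrecognized llm config"


print(describe_llm_config({"model": "azure/gpt-3.5-turbo", "api_key": "..."}))
# provider string: provider=azure, model=gpt-3.5-turbo
```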

Try building from this example (I edited the official example to give it the usual Scrapegraph-style structure, but I have no idea if it will work, because I don't have access to Azure to test it):

"""
Basic example of a SmartScraper scraping pipeline using an Azure OpenAI key
"""

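import os
from dotenv import load_dotenv
from scrapegraphai.graphs import SmartScraperGraph
from scrapegraphai.utils import prettify_exec_info

# Required environment variable in .env:
# AZURE_OPENAI_KEY
load_dotenv()  # load .env into os.environ so the lookup below can find the key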

graph_config = {
    "llm": {
        "api_key": os.environ["AZURE_OPENAI_KEY"],
        "model": "azure/gpt-3.5-turbo",
    },
    "verbose": True,
    "headless": False
}

# ************************************************
# Create the SmartScraperGraph instance and run it
# ************************************************

smart_scraper_graph = SmartScraperGraph(
    prompt="""List me all the events, with the following fields: company_name, event_name, event_start_date, event_start_time, 
    event_end_date, event_end_time, location, event_mode, event_category, 
    third_party_redirect, no_of_days, 
    time_in_hours, hosted_or_attending, refreshments_type, 
    registration_available, registration_link""",
    # also accepts a string with the already downloaded HTML code
    source="https://www.hmhco.com/event",
    config=graph_config
)

result = smart_scraper_graph.run()
print(result)

# ************************************************
# Get graph execution info
# ************************************************

graph_exec_info = smart_scraper_graph.get_execution_info()
print(prettify_exec_info(graph_exec_info))

This could be completely wrong, but if it works, we'll use it to replace the current example.