Open dhruv-scogo opened 1 week ago
I also get this error with Azure
I am also facing this error now. Below is my code:
import os
import json
from typing import List

from pydantic import BaseModel, Field
from dotenv import load_dotenv
from langchain_openai import AzureChatOpenAI
from langchain_openai import AzureOpenAIEmbeddings
from scrapegraphai.graphs import SmartScraperGraph

load_dotenv()

# Define the output schema for the graph
class FAQLink(BaseModel):
    text: str = Field(description="The text of the link")
    url: str = Field(description="The URL of the link")

class FAQCategory(BaseModel):
    header: str = Field(description="The header of the FAQ category")
    links: List[FAQLink] = Field(description="The list of links in this category")

class FAQStructure(BaseModel):
    categories: List[FAQCategory] = Field(description="The list of FAQ categories")

# Initialize the model instances
llm_model_instance = AzureChatOpenAI(
    openai_api_key=os.environ["AZURE_OPENAI_API_KEY"],
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    openai_api_version=os.environ["AZURE_OPENAI_API_VERSION"],
    azure_deployment=os.environ["AZURE_OPENAI_CHAT_DEPLOYMENT_NAME"],
)

embedder_model_instance = AzureOpenAIEmbeddings(
    openai_api_key=os.environ["AZURE_OPENAI_API_KEY"],
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    openai_api_version=os.environ["AZURE_OPENAI_API_VERSION"],
    azure_deployment=os.environ["AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME"],
)

graph_config = {
    "llm": {"model_instance": llm_model_instance},
    "embeddings": {"model_instance": embedder_model_instance},
}

# Create the SmartScraperGraph instance and run it
smart_scraper_graph = SmartScraperGraph(
    prompt="Extract all FAQ categories, their headers, and the links (text and URL) within each category from the CIMB bank FAQ page",
    source="https://www.cimb.com.my/en/personal/help-support/faq.html",
    schema=FAQStructure,
    config=graph_config,
)

result = smart_scraper_graph.run()
print(json.dumps(result, indent=4))
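Before blaming the library version, it may be worth ruling out a missing environment variable, since each `os.environ[...]` lookup in the script above raises `KeyError` when the variable is unset. The following is only a minimal sketch: the variable names are copied from the script, while `REQUIRED_VARS` and `missing_env_vars` are hypothetical helper names, not part of Scrapegraph or LangChain.

```python
import os

# Variable names copied from the script above.
REQUIRED_VARS = [
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_VERSION",
    "AZURE_OPENAI_CHAT_DEPLOYMENT_NAME",
    "AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME",
]


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given mapping.

    `env` defaults to os.environ; passing a plain dict makes this easy to test.
    """
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars(REQUIRED_VARS)
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

If this reports nothing missing, the error discussed in this thread is probably not a configuration problem on the Azure side.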
Using the code provided in the Azure example from the documentation results in this error with versions between 1.7.0 and 1.8.0.
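To check whether an environment falls in that range, something like the sketch below might help. It assumes the package is installed under the distribution name `scrapegraphai` and reads "between 1.7.0 and 1.8.0" as inclusive on both ends; `parse_version`, `in_reported_range`, and `installed_version` are hypothetical helper names used only for illustration.

```python
import re
from importlib import metadata


def parse_version(version: str) -> tuple:
    """Turn '1.7.3' (or '1.8.0b2') into a 3-tuple of ints, ignoring suffixes."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def in_reported_range(version: str) -> bool:
    # Assumption: the range reported in this thread is inclusive on both ends.
    return (1, 7, 0) <= parse_version(version) <= (1, 8, 0)


def installed_version(package: str = "scrapegraphai"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("scrapegraphai is not installed")
    else:
        print(found, "-> in reported range:", in_reported_range(found))
```

Posting the printed version along with the traceback would make it easier to confirm which releases are affected.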
So, should we install a version below 1.7.0?
@mingjun1120 I unfortunately haven't tried with versions below 1.7.0. I can't suggest any workaround for now. 😅
The official example for Azure seems very off to me for some reason. It's completely different from any other SmartScraper example for the other API providers. It's weird that the Langchain classes are accessed directly, skipping a piece of the usual Scrapegraph workflow.
Try building from this example (I edited the official example to give it the usual Scrapegraph-style structure, but I have no idea if it will work, because I don't have access to Azure to test it):
"""
Basic example of scraping pipeline using SmartScraper using Azure OpenAI Key
"""
import os
from scrapegraphai.graphs import SmartScraperGraph
from scrapegraphai.utils import prettify_exec_info

# required environment variable in .env
# AZURE_OPENAI_KEY
graph_config = {
    "llm": {
        "api_key": os.environ["AZURE_OPENAI_KEY"],
        "model": "azure/gpt-3.5-turbo",
    },
    "verbose": True,
    "headless": False
}

# ************************************************
# Create the SmartScraperGraph instance and run it
# ************************************************
smart_scraper_graph = SmartScraperGraph(
    prompt="List me all the titles",
    source="https://sport.sky.it/nba?gr=www",
    config=graph_config
)

smart_scraper_graph = SmartScraperGraph(
    prompt="""List me all the events, with the following fields: company_name, event_name, event_start_date, event_start_time,
    event_end_date, event_end_time, location, event_mode, event_category,
    third_party_redirect, no_of_days,
    time_in_hours, hosted_or_attending, refreshments_type,
    registration_available, registration_link""",
    # also accepts a string with the already downloaded HTML code
    source="https://www.hmhco.com/event",
    config=graph_config
)

result = smart_scraper_graph.run()
print(result)

# ************************************************
# Get graph execution info
# ************************************************
graph_exec_info = smart_scraper_graph.get_execution_info()
print(prettify_exec_info(graph_exec_info))
Could be completely wrong, but if it works, we'll put it in place instead of the current example.
I used an Azure AI GPT-4 instance and got the same error. I initialized my model like this:
Could you please verify whether "model_instance" is supported by SmartScraperGraph?