ScrapeGraphAI / Scrapegraph-ai

Python scraper based on AI
https://scrapegraphai.com
MIT License
14.36k stars, 1.17k forks

Ability to add headers to source (better integration with Jina AI) #538

Open: angelotc opened this issue 1 month ago

angelotc commented 1 month ago

Is your feature request related to a problem? Please describe. We can assign a URL to the source, but it would be nice if we could also pass in headers, such as an API key, so we can use our Jina AI credits.

Describe the solution you'd like: perhaps a source_config parameter?

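    from scrapegraphai.graphs import SmartScraperGraph

    # Proposed API: source_config does not exist yet; it is what this issue asks for.
    # url_encoded_query, JINA_API_KEY, graph_config and Contractor (the output
    # schema) are assumed to be defined earlier.
    smart_scraper_graph = SmartScraperGraph(
        prompt="Find the yelp link, name, website, number of average yelp reviews, summary of yelp_reviews, specialties, phone, and their website",
        source=f"https://s.jina.ai/{url_encoded_query}",
        source_config={
            "headers": {
                "Authorization": f"Bearer {JINA_API_KEY}"
            }
        },
        config=graph_config,
        schema=Contractor,
    )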
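The request boils down to attaching extra HTTP headers to the request that fetches source. A minimal sketch of what that could look like with only Python's standard library, assuming the source_config shape proposed here (a dict with a "headers" key). build_request and fetch are hypothetical helpers for illustration, not part of ScrapeGraphAI:

```python
from typing import Any, Optional
from urllib.parse import quote
from urllib.request import Request, urlopen


# Hypothetical helpers illustrating the proposed source_config; not ScrapeGraphAI API.
def build_request(source: str, source_config: Optional[dict[str, Any]] = None) -> Request:
    """Build a GET request for `source`, attaching any headers found under
    source_config["headers"] (the shape proposed in this issue)."""
    headers = dict((source_config or {}).get("headers", {}))
    return Request(source, headers=headers, method="GET")


def fetch(source: str, source_config: Optional[dict[str, Any]] = None, timeout: float = 30.0) -> str:
    """Send the request built above and return the response body as text."""
    with urlopen(build_request(source, source_config), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)


if __name__ == "__main__":
    # Example only: builds (but does not send) a Jina search request.
    query = quote("plumbers in San Diego")
    req = build_request(
        f"https://s.jina.ai/{query}",
        {"headers": {"Authorization": "Bearer <JINA_API_KEY>"}},
    )
    print(req.full_url, req.header_items())
```

build_request only constructs the request, so it can be inspected without network access; fetch performs the actual call. If your ScrapeGraphAI version accepts raw page content as source (worth checking in its documentation), the text returned by fetch might serve as an interim workaround until headers are supported natively.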

Describe alternatives you've considered: n/a

VinciGit00 commented 1 month ago

Why do you want embeddings? I hadn't come across that, and as far as I can see from the website, Jina is used for embeddings.

angelotc commented 1 month ago


It's less about embeddings and more about the fact that they offer free or low-cost web search and LLM-friendly site data.

https://jina.ai/reader/
https://www.youtube.com/watch?v=QxHE4af5BQE&t=1035s
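For readers unfamiliar with the Jina service referenced in this thread, a minimal sketch of how its two endpoints are addressed. The r.jina.ai prefix is Jina Reader (the first link), which returns an LLM-friendly version of a single page; s.jina.ai is the search form used in the issue's example. The helper names are hypothetical, and exact endpoint behavior should be checked against Jina's own documentation:

```python
from urllib.parse import quote

# Jina endpoints: r.jina.ai renders one page in an LLM-friendly form,
# s.jina.ai runs a web search (the form used in this issue).
JINA_READER_PREFIX = "https://r.jina.ai/"
JINA_SEARCH_PREFIX = "https://s.jina.ai/"


def reader_url(page_url: str) -> str:
    """Hypothetical helper: URL asking Jina Reader to render `page_url`."""
    return JINA_READER_PREFIX + page_url


def search_url(query: str) -> str:
    """Hypothetical helper: URL asking Jina's search endpoint to search for `query`."""
    # Encode everything (safe="") so spaces and slashes in the query stay in one path segment.
    return JINA_SEARCH_PREFIX + quote(query, safe="")
```

Either URL could then be used as source, together with the Authorization header this issue asks to be able to pass.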