ScrapeGraphAI / Scrapegraph-ai

Python scraper based on AI
https://scrapegraphai.com
MIT License

Asynchronous Support in Scrapegraph-ai #711

Closed: jasonshenj closed this issue 1 month ago.

jasonshenj commented 1 month ago

Hello Scrapegraph-ai developers,

I'm currently using Scrapegraph-ai within a FastAPI application and have encountered some issues with asynchronous operations. I'm unable to utilize the asynchronous capabilities effectively, and I'm seeking guidance on how to resolve this.

I've tried to implement asynchronous patterns, but I'm facing challenges.

VinciGit00 commented 1 month ago

Look at #694

goasleep commented 1 month ago

@jasonshenj you could run the synchronous call in a separate thread, e.g. with asyncio.to_thread or loop.run_in_executor, so it doesn't block the event loop.
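A minimal sketch of both options, assuming the blocking work is a synchronous scraping call (such as a graph's `run()`). `blocking_scrape` and its return value are hypothetical stand-ins; only `asyncio.to_thread` (Python 3.9+) and `loop.run_in_executor` are real standard-library APIs.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_scrape(url: str) -> dict:
    """Hypothetical stand-in for a synchronous scraping call."""
    time.sleep(0.1)  # simulate blocking network / LLM work
    return {"url": url, "status": "done"}


async def scrape_with_to_thread(url: str) -> dict:
    # asyncio.to_thread runs the blocking call in the loop's default thread
    # pool, so the event loop stays free to serve other requests meanwhile.
    return await asyncio.to_thread(blocking_scrape, url)


async def scrape_with_executor(url: str, executor: ThreadPoolExecutor) -> dict:
    # Lower-level equivalent: hand the call to an explicitly managed executor.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(executor, blocking_scrape, url)


async def main() -> list[dict]:
    urls = ["https://example.com/a", "https://example.com/b"]
    with ThreadPoolExecutor(max_workers=4) as executor:
        # Both calls run concurrently in worker threads.
        return await asyncio.gather(
            scrape_with_to_thread(urls[0]),
            scrape_with_executor(urls[1], executor),
        )


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Inside a FastAPI `async def` endpoint the same `await asyncio.to_thread(...)` line applies; FastAPI also runs endpoints declared with a plain `def` in a threadpool on its own.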

VinciGit00 commented 1 month ago

Hi @goasleep, do you have any idea how to modify the asyncloader to make it synchronous?

DiTo97 commented 1 month ago

@jasonshenj could you provide an example?

It would help with reproducibility and help us track down the root cause of your issue.

goasleep commented 1 month ago

> Hi @goasleep, do you have any idea how to modify the asyncloader to make it synchronous?

Why does this need to change? If you do that, you will need to support both async and sync methods.

VinciGit00 commented 1 month ago

@goasleep I would like to have support for both

goasleep commented 1 month ago

> @goasleep I would like to have support for both

I'm not sure what you want. Could you give me an example?

Here is my understanding so far:


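```python
import asyncio
import inspect
from functools import wraps


def run_async_or_sync(func):
    """Let callers invoke `func` synchronously, whether it is async or not."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        if inspect.iscoroutinefunction(func):
            # asyncio.run() starts a fresh event loop; it raises RuntimeError
            # if a loop is already running in this thread (e.g. in FastAPI).
            return asyncio.run(func(*args, **kwargs))
        return func(*args, **kwargs)
    return wrapper
```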
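Another way to support both is to write the logic once as a coroutine and add a thin synchronous entry point that refuses to nest event loops. A minimal sketch under that assumption; `HypotheticalGraph`, `arun` and `run` are illustrative names, not ScrapeGraphAI's actual API.

```python
import asyncio


class HypotheticalGraph:
    """Hypothetical class exposing one async implementation via two entry points."""

    def __init__(self, source: str):
        self.source = source

    async def arun(self) -> str:
        # The real work is written once, as a coroutine.
        await asyncio.sleep(0)  # placeholder for awaited I/O (fetching, LLM calls)
        return f"scraped {self.source}"

    def run(self) -> str:
        # Synchronous entry point for scripts. asyncio.run() cannot be nested,
        # so fail with a clear message when a loop is already running here.
        try:
            asyncio.get_running_loop()
        except RuntimeError:
            # No running loop in this thread: safe to start one.
            return asyncio.run(self.arun())
        raise RuntimeError("run() called inside an event loop; use 'await arun()' instead")
```

Async callers such as a FastAPI endpoint would `await graph.arun()`, while plain scripts call `graph.run()`.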

jasonshenj commented 1 month ago

Thank you very much for your support and assistance. I have managed to resolve the issue. Initially, I ran into problems with Playwright: it was too slow and could not capture the content I needed. To address this, I moved data retrieval into a separate step, added some anti-detection features, and optimized its speed. However, this introduced some asynchronous issues.

After carefully reviewing your source code, I switched the synchronous LangChain interfaces you were using to their asynchronous counterparts. This change proved quite effective.

Once again, thank you for your help. Your project is truly impressive and has been a great learning experience for me.

jasonshenj commented 1 month ago

For example, changing `answer = chain.invoke({"question": PROMPT_STR})` to `answer = await chain.ainvoke({"question": PROMPT_STR})`.
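LangChain runnables (chains) expose both `invoke` and `ainvoke`. The sketch below shows why the swap matters inside an async app: blocking calls stall the event loop, while awaited ones can overlap. `StubChain` is a hypothetical stand-in that only mimics that method pair, so the example runs without LangChain or an LLM; the timings are simulated.

```python
import asyncio
import time


class StubChain:
    """Hypothetical stand-in mimicking a LangChain Runnable's invoke/ainvoke pair."""

    def invoke(self, inputs: dict) -> str:
        time.sleep(0.2)  # blocking: stalls the event loop if called from async code
        return f"answer to {inputs['question']}"

    async def ainvoke(self, inputs: dict) -> str:
        await asyncio.sleep(0.2)  # non-blocking: other tasks run while this waits
        return f"answer to {inputs['question']}"


async def answer_all(chain: StubChain, questions: list[str]) -> list[str]:
    # With ainvoke the calls overlap, so the total time is roughly one call
    # rather than len(questions) calls back to back.
    return await asyncio.gather(*(chain.ainvoke({"question": q}) for q in questions))


if __name__ == "__main__":
    start = time.perf_counter()
    answers = asyncio.run(answer_all(StubChain(), ["q1", "q2", "q3"]))
    print(answers, f"{time.perf_counter() - start:.2f}s")
```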

VinciGit00 commented 1 month ago

Thank you. I've seen that we still use `chain.invoke`, and I'm going to switch to the async version. Have a nice day!