ClericPy / ichrome

Chrome controller for Humans, based on Chrome Devtools Protocol(CDP) and python3.7+.
https://pypi.org/project/ichrome/
MIT License
228 stars, 29 forks

How do you take a screen shot using ichrome... #39

Closed: welcomemat-services closed this issue 3 years ago

welcomemat-services commented 3 years ago

I want to take a screenshot of whatever is on the page using this module. Currently, I am using pyppeteer. My plan is to use this from FastAPI.

Any help is appreciated. Thank you.

ClericPy commented 3 years ago

Hi, @welcomemat-services

I planned to use ichrome with FastAPI, but that code is not ready yet.

So you can do it like this, but the Chrome CDP websocket is not very stable, so you need to restart the Chrome daemon every 8 minutes.

Here is the demo.

from base64 import b64decode

from fastapi import FastAPI
from ichrome import AsyncChromeDaemon
from starlette.responses import Response

app = FastAPI()

@app.on_event("startup")
async def startup_event():
    # Start one Chrome daemon when the app starts and keep it on the app object.
    app.chrome = await AsyncChromeDaemon().__aenter__()

@app.on_event("shutdown")
async def shutdown_event():
    # Stop the Chrome daemon when the app shuts down.
    await app.chrome.__aexit__()

@app.get('/screenshot')
async def screenshot(url):
    # Use a separate tab per request; auto_close=True closes it on exit.
    async with app.chrome.connect_tab(index=None, auto_close=True) as tab:
        await tab.set_url(url, timeout=10)
        # The screenshot arrives base64-encoded and is decoded into raw bytes.
        image = await tab.screenshot()
        return Response(b64decode(image or b''))

if __name__ == "__main__":
    from uvicorn import run
    run(app)

Open your browser and go to http://127.0.0.1:8000/screenshot?url=https://www.quora.com
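
The handler above passes the decoded bytes straight to Response, and the `image or b''` fallback suggests the screenshot can come back empty. A minimal sketch of decoding the payload and guessing a content type from the file signature, using only the standard library. The helper name decode_screenshot is hypothetical and not part of ichrome or FastAPI:

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
JPEG_SIGNATURE = b"\xff\xd8\xff"


def decode_screenshot(image_b64):
    """Decode a base64 screenshot and guess its media type.

    Returns (raw_bytes, media_type), or (b"", None) when there is no data.
    """
    if not image_b64:
        return b"", None
    raw = base64.b64decode(image_b64)
    if raw.startswith(PNG_SIGNATURE):
        return raw, "image/png"
    if raw.startswith(JPEG_SIGNATURE):
        return raw, "image/jpeg"
    # Unknown signature: let the client treat it as opaque bytes.
    return raw, "application/octet-stream"
```

In the handler, the result could be returned as Response(raw, media_type=media_type), and an empty result could be turned into an error response instead of an empty body.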

welcomemat-services commented 3 years ago

@ClericPy Thank you so much for the code.

"but the Chrome CDP websocket is not very stable, so you need to restart the Chrome daemon every 8 minutes."

This code is going to be on a website on Heroku, so how do we deal with restarting the daemon?

Thank you.

ClericPy commented 3 years ago

from fastapi import FastAPI
from ichrome import AsyncChromeDaemon
from starlette.responses import Response
from base64 import b64decode
import asyncio
app = FastAPI()

async def restart_job():
    # Replace the Chrome daemon with a fresh one every 8 minutes.
    while True:
        if getattr(app, 'chrome', None):
            await app.chrome.shutdown()
        app.chrome = await AsyncChromeDaemon().__aenter__()
        await asyncio.sleep(8 * 60)

@app.on_event("startup")
async def startup_event():
    asyncio.ensure_future(restart_job())

@app.get('/screenshot')
async def screenshot(url):
    async with app.chrome.connect_tab(index=None, auto_close=True) as tab:
        await tab.set_url(url, timeout=10)
        image = await tab.screenshot()
        return Response(b64decode(image or b''))

if __name__ == "__main__":
    from uvicorn import run
    run(app)
# http://127.0.0.1:8000/screenshot?url=https://bing.com

The above is a new demo. Running a Chrome process for a long time causes many issues, such as memory leaks, dropped websocket connections, cache overflow, and other problems, so what I have given here is only a simple demo.

PS: AsyncChromeDaemon() should be created with headless=True on a Linux server.
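
One gap in a periodic restart like this is that a request arriving during the restart may find the daemon missing or already shut down. A minimal sketch of one way to guard against that with an asyncio.Lock. FakeDaemon and DaemonHolder are hypothetical names, and FakeDaemon only stands in for AsyncChromeDaemon so the sketch runs without Chrome:

```python
import asyncio


class FakeDaemon:
    """Hypothetical stand-in for AsyncChromeDaemon; it only tracks liveness."""

    def __init__(self):
        self.alive = True

    async def shutdown(self):
        self.alive = False


class DaemonHolder:
    """Keeps one daemon and swaps it for a fresh one under a lock."""

    def __init__(self, factory):
        self._factory = factory
        self._lock = asyncio.Lock()
        self.daemon = None
        self.restarts = 0

    async def restart(self):
        # Hold the lock so no request uses the daemon in the middle of a restart.
        async with self._lock:
            if self.daemon is not None:
                await self.daemon.shutdown()
            self.daemon = self._factory()
            self.restarts += 1

    async def use(self, work):
        # Requests take the same lock, so they wait until a restart has finished.
        async with self._lock:
            return await work(self.daemon)

    async def restart_forever(self, interval):
        # Background job: restart, then sleep for the given number of seconds.
        while True:
            await self.restart()
            await asyncio.sleep(interval)


async def demo():
    holder = DaemonHolder(FakeDaemon)
    await holder.restart()
    first = holder.daemon
    await holder.restart()

    async def check(daemon):
        return daemon.alive

    # The first daemon was shut down; the current one is alive.
    return first.alive, await holder.use(check), holder.restarts


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The trade-off of this simple design is that the single lock also serializes requests, so only one screenshot runs at a time. A real service might prefer a scheme that blocks requests only while a restart is in progress.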

welcomemat-services commented 3 years ago

@ClericPy Thank you for taking time to help me with this.

I am trying to take a screenshot of a Facebook ad preview URL, but it gives me a blank screen. Other websites load fine with the same code, but the following URL does not:

https://www.facebook.com/ads/api/preview_iframe.php?d=AQJEfPAy4afAB60wOmfaUWylbt8x1HH0G5-d1GBmoVf8tSLkRTRZiU1qJQFtbm0OC3pCB_MrYyC9Pube0sfAiGaK5g2cpbYgd1-QCPTQrJyDGG6pVIZW06ETtjy76xuF3x9S62Flq4HM4z8o8t2ubqZwjNp7IibhPnce3riTqj4_x7UwUjkDFvkIRr0TfhNStRiM-KC8z5YF3jfLvohCWXeH5Ncv6-1_KxmDUo_nBBHMfKqzPt7kP0dz5cGX6OsLWvtNQklJTOBcVQ8rm4XXVBe8lGO9iq_9RogAXuuA8ePlqaZ-1Rt7uCmbTFWR_6l0PNzzTD8YL87DRLmjmkg9UecY&t=AQKo6Fq5kHbz5Mf6WNA

Any insights will help me greatly.

Thanks.

ClericPy commented 3 years ago

I didn't find any errors for that url.

http://127.0.0.1:8000/screenshot?url=https%3A%2F%2Fwww.facebook.com%2Fads%2Fapi%2Fpreview_iframe.php%3Fd%3DAQJEfPAy4afAB60wOmfaUWylbt8x1HH0G5-d1GBmoVf8tSLkRTRZiU1qJQFtbm0OC3pCB_MrYyC9Pube0sfAiGaK5g2cpbYgd1-QCPTQrJyDGG6pVIZW06ETtjy76xuF3x9S62Flq4HM4z8o8t2ubqZwjNp7IibhPnce3riTqj4_x7UwUjkDFvkIRr0TfhNStRiM-KC8z5YF3jfLvohCWXeH5Ncv6-1_KxmDUo_nBBHMfKqzPt7kP0dz5cGX6OsLWvtNQklJTOBcVQ8rm4XXVBe8lGO9iq_9RogAXuuA8ePlqaZ-1Rt7uCmbTFWR_6l0PNzzTD8YL87DRLmjmkg9UecY%26t%3DAQKo6Fq5kHbz5Mf6WNA
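
The request URL above is percent-encoded because the ad preview link carries its own query string. Sent unencoded, its `&t=...` part would be read as a second parameter of /screenshot and the target URL would be cut short. A small standard-library sketch of building and checking such a request URL. The function names and the example.com target are illustrative only:

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_screenshot_url(target, base="http://127.0.0.1:8000/screenshot"):
    """Return the service URL with `target` percent-encoded as the url parameter."""
    return base + "?" + urlencode({"url": target})


def extract_target(service_url):
    """Recover the url parameter the way a query-string parser would see it."""
    return parse_qs(urlsplit(service_url).query)["url"][0]


if __name__ == "__main__":
    target = "https://example.com/preview.php?d=abc&t=xyz"
    encoded = build_screenshot_url(target)
    print(encoded)
    # Encoded: the full target survives the round trip.
    print(extract_target(encoded))
    # Unencoded: everything from "&t=" onwards is lost from the url parameter.
    print(extract_target("http://127.0.0.1:8000/screenshot?url=" + target))
```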

welcomemat-services commented 3 years ago

@ClericPy Thank you for taking the time to look into this. I think it is something to do with the timeout. I tried various timeouts but still could not produce the actual preview, only a blank image with the outline.

When I run the above code, I am getting the below screen: image

Whereas the actual preview looks like this: image

ClericPy commented 3 years ago

I know what happened in your case.

The content you want is loaded separately from the main page's load event, so you can use ichrome's wait_tag method to wait for it.

from fastapi import FastAPI
from ichrome import AsyncChromeDaemon
from starlette.responses import Response
from base64 import b64decode
import asyncio
app = FastAPI()

async def restart_job():
    # Replace the headless Chrome daemon with a fresh one every 8 minutes.
    while True:
        if getattr(app, 'chrome', None):
            await app.chrome.shutdown()
        app.chrome = await AsyncChromeDaemon(headless=True).__aenter__()
        await asyncio.sleep(8 * 60)

@app.on_event("startup")
async def startup_event():
    asyncio.ensure_future(restart_job())

@app.get('/screenshot')
async def screenshot(url):
    async with app.chrome.connect_tab(index=None, auto_close=True) as tab:
        await tab.set_url(url)
        # Wait until the ad preview element exists before taking the screenshot.
        await tab.wait_tag('[data-testid="Keycommand_wrapper_feed_story"]')
        # await asyncio.sleep(5)
        image = await tab.screenshot()
        return Response(b64decode(image or b''))

if __name__ == "__main__":
    from uvicorn import run
    run(app)
# http://127.0.0.1:8000/screenshot?url=https%3A%2F%2Fwww.facebook.com%2Fads%2Fapi%2Fpreview_iframe.php%3Fd%3DAQJEfPAy4afAB60wOmfaUWylbt8x1HH0G5-d1GBmoVf8tSLkRTRZiU1qJQFtbm0OC3pCB_MrYyC9Pube0sfAiGaK5g2cpbYgd1-QCPTQrJyDGG6pVIZW06ETtjy76xuF3x9S62Flq4HM4z8o8t2ubqZwjNp7IibhPnce3riTqj4_x7UwUjkDFvkIRr0TfhNStRiM-KC8z5YF3jfLvohCWXeH5Ncv6-1_KxmDUo_nBBHMfKqzPt7kP0dz5cGX6OsLWvtNQklJTOBcVQ8rm4XXVBe8lGO9iq_9RogAXuuA8ePlqaZ-1Rt7uCmbTFWR_6l0PNzzTD8YL87DRLmjmkg9UecY%26t%3DAQKo6Fq5kHbz5Mf6WNA
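
The idea behind waiting for an element, as opposed to sleeping for a fixed time, can be sketched as polling a check until it succeeds or a timeout expires. This is a generic illustration and not ichrome's implementation of wait_tag. The names wait_until and FakePage are hypothetical:

```python
import asyncio
import time


async def wait_until(check, timeout=5.0, interval=0.1):
    """Poll an async check() until it returns a truthy value or time runs out.

    Returns the truthy value, or None if the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = await check()
        if result:
            return result
        if time.monotonic() >= deadline:
            return None
        await asyncio.sleep(interval)


class FakePage:
    """Hypothetical page whose target element appears only after a few queries."""

    def __init__(self, ready_after=3):
        self.queries = 0
        self.ready_after = ready_after

    async def query(self):
        self.queries += 1
        return "element" if self.queries >= self.ready_after else None


if __name__ == "__main__":
    page = FakePage()
    print(asyncio.run(wait_until(page.query, timeout=1, interval=0.01)))
```

Compared with a fixed sleep, polling returns as soon as the element shows up and still gives up after a bounded time when it never does.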

welcomemat-services commented 3 years ago

@ClericPy,

It worked!

Thank you for spending your valuable time on this. It is fast as well. I cannot wait to test it on the Heroku platform.

Thank you so much.

ClericPy commented 3 years ago

@welcomemat-services You're welcome. I am glad to pick up more odd bits of knowledge through exchanges like this.

You can close the issue once this is fixed for you.

Best wishes.

PS: the screenshot method can be replaced with screenshot_element to capture an exact region.

welcomemat-services commented 3 years ago

Thank you,

I will experiment with screenshot_element as well as clipping the image to a certain size.
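
For clipping, the Chrome DevTools Protocol's Page.captureScreenshot takes a clip rectangle with x, y, width, height and scale fields. A small sketch of deriving such a rectangle from an element's bounding box with optional padding. The helper name clip_from_box and the sample numbers are made up for illustration:

```python
def clip_from_box(left, top, width, height, padding=0, scale=1):
    """Build a clip rectangle around a bounding box, padded on every side.

    The top-left corner is clamped at 0 so the clip never starts off-page.
    """
    x = max(left - padding, 0)
    y = max(top - padding, 0)
    return {
        "x": x,
        "y": y,
        # Grow by what was actually added on the left/top plus the far-side padding.
        "width": width + (left - x) + padding,
        "height": height + (top - y) + padding,
        "scale": scale,
    }


if __name__ == "__main__":
    print(clip_from_box(10, 20, 100, 50, padding=5))
```

A dict of this shape could then be passed as the clip argument when taking the screenshot.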

welcomemat-services commented 3 years ago

@ClericPy,

It looks like Facebook requires users to log in before the ad preview can be viewed. The link I posted above is for an old ad. The link below is a preview for a newer ad, and it does not load unless I log in before accessing it.

Now how do I log in to Facebook using ichrome?

https://www.facebook.com/ads/api/preview_iframe.php?d=AQK29HUTQSXcmhUUQHfLOjKjNoJv0gbKO86fKnLRo-H7Pd84r5tB0KLI3-m78fSX6fY4oINytIfg32nM0UplbTKT6dD_hntnk66lzJbHxJ5ox7w7eoiWOd1AjOrirbLyT5twbagBULrimnVwz-pHxzQH2U1iiTqahArqwZZS0-fu6171OoxI2WL4iVmKTxv1OLP5yKomSAXdRrutHY9FtLFrHkJhlt6iiHNJwwSYLac927a9eOteqLY3BfcZzO1g4qnvQ4zO8iq-4bQMEHPzheec1UgOAAheRXlBsLnytqk0uS9pAH_5cRcXY9Xgm2zZ0d-XtO2OAEDRhsurCPtq1lgQ&t=AQKU0nE_04_kR7phIzo

ClericPy commented 3 years ago

I do not have access to this page:

Errors while executing operation "CometDesktopFeedAdPreviewWrapperQuery": At Query.ad_preview: Field implementation threw an exception. Check your server logs for more information.

You can use tab.set_cookies to insert your Facebook cookies and log in that way.

welcomemat-services commented 3 years ago

@ClericPy,

How do you set fields such as email, password and click a button using iChrome?

Is there any documentation that I can look into to see what is available to achieve this? I am trying to login to Facebook using email and password.

Pyppeteer has page.type() method which allows us to set form fields based on Id #email #pass and click on the #loginbutton on facebook.com/login page.

Once I log in, I can stash the cookies and reuse them the next time I log in to Facebook.

I am looking for a similar feature in iChrome.

Thank you.

ClericPy commented 3 years ago

I log in to Facebook with cookies, because that way I do not have to deal with security checks such as Google's reCAPTCHA.

If you want to input some text, you can look at the tab.click and tab.keyboard_send usage here: https://github.com/ClericPy/ichrome/blob/master/examples_async.py

Using raw JavaScript (the tab.js method) is often an even better choice.

ClericPy commented 3 years ago

JavaScript is always Chrome's ace in the hole, just as VBA is for Excel.
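
As a rough illustration of the raw JavaScript route, the snippets to run can be assembled in Python. The sketch below only builds the JavaScript strings; the helper names are hypothetical, and the selectors #email, #pass and #loginbutton are the ones mentioned in the question above. json.dumps is used because its output is a valid JavaScript string literal, so quotes inside the text cannot break the snippet:

```python
import json


def fill_field_js(selector, text):
    """Return JavaScript that sets the value of the element matching selector."""
    return "document.querySelector({}).value = {};".format(
        json.dumps(selector), json.dumps(text)
    )


def click_js(selector):
    """Return JavaScript that clicks the element matching selector."""
    return "document.querySelector({}).click();".format(json.dumps(selector))


if __name__ == "__main__":
    # Placeholder credentials, for illustration only.
    for snippet in (
        fill_field_js("#email", "me@example.com"),
        fill_field_js("#pass", 'pa"ss'),
        click_js("#loginbutton"),
    ):
        print(snippet)
```

Each string could then be handed to tab.js. Pages built with front-end frameworks may also need input events dispatched before they notice a value set this way, so treat this as a starting point.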

welcomemat-services commented 3 years ago

@ClericPy,

I got the cookies and am loading them as shown below. Facebook still does not load the image preview. I hope I am setting the cookies correctly.

Also I tried with headless=False.

async with browser.connect_tab(index=None, auto_close=True) as tab:
    with open('./cookies.json', 'r') as cookies:
        cookies_dict = json.loads(cookies.read())
        for cookie in cookies_dict:
            await tab.set_cookie(*cookie)

    await tab.set_url(ad_preview_url)
    print(await tab.get_html())
    await tab.wait_tag('[data-testid="Keycommand_wrapper_feed_story"]')
    # await asyncio.sleep(5)
    image = await tab.screenshot(clip={'x':0, 'y':0, 'height': height, 'width': width, 'scale': 1})
    await tab.close()
    b64 = b64decode(image or b'')

ClericPy commented 3 years ago

I use await tab.set_cookie(**cookie) for a cookie dict, so make sure your cookie format is right, and note that the cookie domains may cover more than just .facebook.com.
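
The difference between `*cookie` and `**cookie` can be shown without a browser. In the sketch below, set_cookie is a hypothetical stand-in that only records its arguments (it is not ichrome's method), and the cookie values are made up:

```python
import json

ALLOWED_KEYS = ("name", "value", "url", "domain", "path")


def set_cookie(name, value, url="", domain="", path="/"):
    """Hypothetical stand-in for a cookie setter; it only records its arguments."""
    return {"name": name, "value": value, "url": url, "domain": domain, "path": path}


def load_cookies(text):
    """Parse a JSON list of cookie dicts, keeping only keys set_cookie accepts."""
    return [
        {key: cookie[key] for key in ALLOWED_KEYS if key in cookie}
        for cookie in json.loads(text)
    ]


cookies_json = (
    '[{"name": "example", "value": "abc", '
    '"domain": ".facebook.com", "session": true}]'
)
cookie = load_cookies(cookies_json)[0]

# *cookie unpacks the dict KEYS as positional arguments, so the call receives
# the strings "name", "value" and "domain" instead of the cookie's contents.
positional = set_cookie(*cookie)

# **cookie unpacks key/value pairs as keyword arguments, which is what is wanted.
keyword = set_cookie(**cookie)

if __name__ == "__main__":
    print(positional)
    print(keyword)
```

Exported cookie files often carry extra keys, such as "session" here, so filtering each dict down to the accepted keys before unpacking avoids unexpected-argument errors.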