HENNGE / arsenic

Async WebDriver implementation for asyncio and asyncio-compatible frameworks

Auth proxy with user and password #137

Closed DiMiTriFrog closed 2 years ago

DiMiTriFrog commented 2 years ago

Right now I'm using proxy auth via IP whitelist, but some proxy providers only allow authentication with a username and password.

How could I use arsenic with a proxy that requires a username and password? Selenium has selenium-wire for this case.

dimaqq commented 2 years ago

🤷🏿 🤔 would that be proxy between arsenic and browser, or browser and the website?

DiMiTriFrog commented 2 years ago

The truth is that I don't really know exactly what would be needed, but the end result would be that, using arsenic, you can access any website through a proxy connection, like the "--proxy-server" option but allowing a username and password.
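For an unauthenticated proxy, Chromium's --proxy-server flag can already be passed through arsenic's capabilities. A minimal sketch, where chrome_proxy_capabilities is a hypothetical helper (not part of arsenic's API) that only assembles the dict you would assign to browser.capabilities:

```python
def chrome_proxy_capabilities(proxy_url: str, headless: bool = True) -> dict:
    """Build a W3C capabilities dict that points Chromium at a proxy.

    Note: --proxy-server accepts scheme://host:port but carries no
    credentials, which is exactly the limitation discussed in this issue.
    """
    args = [f"--proxy-server={proxy_url}"]
    if headless:
        args += ["--headless", "--disable-gpu", "--no-sandbox"]
    return {"goog:chromeOptions": {"args": args}}
```

The result would be assigned to browser.capabilities the same way as in the snippets later in this thread.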

DiMiTriFrog commented 2 years ago

At this moment I have a Docker image with the arsenic script. I configured the container to use the proxy, but arsenic doesn't load any page. I made a test with python requests and the requests library does use the proxy; I don't know why the arsenic library isn't loading anything.

DiMiTriFrog commented 2 years ago

Arsenic returns a blank page -> <html><head></head><body></body></html>

dimaqq commented 2 years ago

I think this issue needs a MRE https://stackoverflow.com/help/minimal-reproducible-example

DiMiTriFrog commented 2 years ago

Well, I have a Dockerfile with the proxy configuration:

FROM public.ecr.aws/lambda/python:3.8
WORKDIR /app

COPY requirements.txt  .
RUN  pip3 install -r requirements.txt
COPY headless-chromium /opt/headless-chromium
RUN chmod 777 /opt/headless-chromium

ENV http_proxy http://urltoproxy:port
ENV https_proxy https://urltoproxy:port

CMD ["/app/app.handler"]

My Docker image is for Lambda use. I have two code samples (using the same Dockerfile); one works and the other doesn't. The sample that works is a simple python request:

Code that works (requests)

import requests

ip = requests.get('https://api.ipify.org').text
print(ip)  # Prints an IP of the proxy.

Code that doesn't work (arsenic)

import asyncio
import json
import os

from arsenic import get_session
from arsenic.browsers import Chrome
from arsenic.services import Chromedriver

async def arsenic_simple():
    results_json = {}
    try:
        browser = Chrome()
        browser.capabilities = {
            "goog:chromeOptions": {
                "binary": "/opt/headless-chromium",
                "args": [
                    "--headless",
                    "--disable-gpu",
                    "--no-sandbox",
                    "--allow-running-insecure-content",
                    "--ignore-certificate-errors",
                ],
            }
        }
        # get_session closes the session on exit from the async with block.
        async with get_session(Chromedriver(log_file=os.devnull), browser) as session:
            await session.get('https://api.ipify.org')

            source_ip = await session.get_page_source()
            results_json.update({'source_ip': source_ip})
            results_json.update({'result': 'Done!'})
            return results_json

    except Exception as e:
        print(f"Error {e}")
        return f"General error {str(e)} \n {results_json}"

def handler(event, context):
    resp = asyncio.run(arsenic_simple())
    print(resp)
    return {
        'statusCode': 200,
        'body': json.dumps(resp)
    }

The response of get_page_source() is -> <html><head></head><body></body></html>, as if there were no internet connection. The container has a working proxy connection that other request libraries can use, but through arsenic I can't use the proxy.

Any idea?

I'm trying to use Docker-level proxy configuration because I can't use a proxy with username/password auth in arsenic itself.

dimaqq commented 2 years ago

It seems you want to configure headless chromium to use a proxy. I'd start with https://blog.apify.com/how-to-make-headless-chrome-and-puppeteer-use-a-proxy-server-with-authentication-249a21a79212/ or something

DiMiTriFrog commented 2 years ago


I'm investigating how I could implement it. I'm trying the selenium-wire approach, I mean implementing a local proxy with mitmproxy in upstream mode and connecting arsenic to the resulting local proxy.
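The local-upstream-proxy idea can be sketched as follows: mitmdump listens locally without auth and forwards to the authenticated upstream proxy, so Chrome only needs a plain --proxy-server flag. upstream_proxy_setup is a hypothetical helper, and the mitmdump option names (--mode upstream:..., upstream_auth) should be verified against your installed mitmproxy version:

```python
def upstream_proxy_setup(upstream: str, user: str, password: str,
                         local_port: int = 8080):
    """Return (mitmdump command, Chrome flag) for the local-proxy trick.

    The command starts a local proxy that forwards to the authenticated
    upstream; the flag points Chromium at that local proxy.
    """
    mitmdump_cmd = [
        "mitmdump",
        "--mode", f"upstream:{upstream}",
        "--set", f"upstream_auth={user}:{password}",
        "-p", str(local_port),
    ]
    chrome_flag = f"--proxy-server=http://127.0.0.1:{local_port}"
    return mitmdump_cmd, chrome_flag
```

The command would be started (e.g. with subprocess.Popen) before launching the arsenic session, and the flag added to the goog:chromeOptions args.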

dimaqq commented 2 years ago

My 2c: this issue can be closed. Rationale: it's a corner case in chromium configuration; this library can't cover all of browser config, only pertinent/common flags.

DiMiTriFrog commented 2 years ago

I fixed the problem using my own extension for HTTP/HTTPS proxy with auth. First you need to add this flag -> "--load-extension=/path/folder_extension"

And inside folder_extension I have two files. The background script contains:

chrome.proxy.settings.set({value: config, scope: "regular"}, function() {});

function callbackFn(details) {
  return {authCredentials: {username: "USER", password: "PASSWORD"}};
}

chrome.webRequest.onAuthRequired.addListener(callbackFn, {urls: ["<all_urls>"]}, ["blocking"]);
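The extension above can also be generated programmatically, which is convenient in Docker when the credentials vary per run. A minimal sketch, assuming a Manifest V2 extension; write_proxy_extension and all names/values are illustrative, not part of any library:

```python
import json
import os
import tempfile

# Template for the extension's background script; the %(...)s slots are
# filled in below. The chrome.* calls only run inside Chrome itself.
BACKGROUND_JS = """\
var config = {
  mode: "fixed_servers",
  rules: {
    singleProxy: {scheme: "http", host: "%(host)s", port: %(port)d},
    bypassList: []
  }
};
chrome.proxy.settings.set({value: config, scope: "regular"}, function() {});

function callbackFn(details) {
  return {authCredentials: {username: "%(user)s", password: "%(password)s"}};
}
chrome.webRequest.onAuthRequired.addListener(
    callbackFn, {urls: ["<all_urls>"]}, ["blocking"]);
"""

MANIFEST = {
    "manifest_version": 2,
    "name": "proxy-auth",
    "version": "1.0",
    "permissions": ["proxy", "webRequest", "webRequestBlocking", "<all_urls>"],
    "background": {"scripts": ["background.js"]},
}


def write_proxy_extension(host, port, user, password, directory=None):
    """Write manifest.json and background.js into a directory and return
    its path, suitable for --load-extension=<path>."""
    directory = directory or tempfile.mkdtemp(prefix="proxy_ext_")
    with open(os.path.join(directory, "manifest.json"), "w") as f:
        json.dump(MANIFEST, f, indent=2)
    with open(os.path.join(directory, "background.js"), "w") as f:
        f.write(BACKGROUND_JS % {"host": host, "port": port,
                                 "user": user, "password": password})
    return directory
```

Note the headless caveat raised later in this thread: classic headless Chrome does not load extensions, so this approach needs a headed Chrome (e.g. under Xvfb) or a Chrome version whose newer headless mode supports extensions.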

bcastane commented 2 years ago

Hi @DiMiTriFrog, where did you add the load-extension flag? I have been trying here, but with no luck.

browser.capabilities = { "goog:chromeOptions": {"args": ["--headless","--load-extension=C:/Path/folder/extension/"]} }

Thanks!

DiMiTriFrog commented 2 years ago


Hi, extensions don't work in headless mode.