DiMiTriFrog closed this issue 2 years ago
🤷🏿 🤔 Would that be a proxy between arsenic and the browser, or between the browser and the website?
The truth is that I don't really know what would be needed, but the end result would be that, using arsenic, you can access any website through a proxy connection, like the "--proxy-server=" option but also allowing a username and password.
At the moment I have a Docker image with the arsenic script. I configured the container to use the proxy, but arsenic doesn't load any page. I ran a test with Python requests and the requests library does use the proxy; I don't know why the arsenic library isn't loading anything.
Arsenic returns a blank page -> <html><head></head><body></body></html>
I think this issue needs an MRE: https://stackoverflow.com/help/minimal-reproducible-example
Well, I have a Dockerfile with the proxy config:
FROM public.ecr.aws/lambda/python:3.8
WORKDIR /app
COPY requirements.txt .
RUN pip3 install -r requirements.txt
COPY headless-chromium /opt/headless-chromium
RUN chmod 777 /opt/headless-chromium
ENV http_proxy http://urltoproxy:port
ENV https_proxy https://urltoproxy:port
CMD ["/app/app.handler"]
My Docker image is for Lambda use. I have two code samples (using the same Dockerfile); one works and the other doesn't. The sample that works is a simple Python request:
Code that works (requests)
import requests
ip = requests.get('https://api.ipify.org').text
print(ip)  # Returns the proxy's IP.
Code that doesn't work (arsenic)
import asyncio
import json
import os

from arsenic import get_session
from arsenic.browsers import Chrome
from arsenic.services import Chromedriver

async def arsenic_simple():
    results_json = {}
    try:
        browser = Chrome()
        browser.capabilities = {
            "goog:chromeOptions": {
                "binary": "/opt/headless-chromium",
                "args": [
                    "--headless",
                    "--disable-gpu",
                    "--no-sandbox",
                    "--allow-running-insecure-content",
                    "--ignore-certificate-errors",
                ],
            }
        }
        async with get_session(Chromedriver(log_file=os.devnull), browser) as session:
            await session.get('https://api.ipify.org')
            source_ip = await session.get_page_source()
            results_json.update({'source_ip': source_ip})
            results_json.update({'result': 'Done!'})
            await session.close()
            return results_json
    except Exception as e:
        print(f"Error {e}")
        return f"General error {str(e)} \n {results_json}"

def handler(event, context):
    resp = asyncio.run(arsenic_simple())
    print(resp)
    return {
        'statusCode': 200,
        'body': json.dumps(resp)
    }
The response of get_page_source() is -> <html><head></head><body></body></html>
as if there were no internet connection. The container has a working proxy connection that other request libraries use, but through Arsenic I can't use the proxy.
Any idea?
I'm trying to use a Docker container with the proxy configuration because I can't use a proxy with username/password auth directly with arsenic.
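One thing worth checking: the environment variables in the Dockerfile only cover libraries that honor http_proxy/https_proxy (like requests); Chromium takes its proxy endpoint through its own --proxy-server switch, which accepts host:port but has no way to carry credentials. A minimal sketch of building the capabilities with that flag (the proxy address is a placeholder):

```python
def proxy_chrome_capabilities(proxy="http://urltoproxy:3128"):
    # Placeholder proxy address: --proxy-server sets host:port only;
    # it cannot carry a username/password, so an authenticating proxy
    # will still answer 407 unless auth is handled separately.
    return {
        "goog:chromeOptions": {
            "binary": "/opt/headless-chromium",
            "args": [
                "--headless",
                "--disable-gpu",
                "--no-sandbox",
                f"--proxy-server={proxy}",
            ],
        }
    }

caps = proxy_chrome_capabilities()
```

Assign the result to `browser.capabilities` before opening the session; this alone won't solve the username/password part, but it rules out the env-var mismatch.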
It seems you want to configure headless chromium to use a proxy. I'd start with https://blog.apify.com/how-to-make-headless-chrome-and-puppeteer-use-a-proxy-server-with-authentication-249a21a79212/ or something
I'm investigating how I could implement it. I'm trying the selenium-wire approach: running a local mitmproxy in upstream mode and connecting arsenic to the resulting local proxy.
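That upstream idea can be sketched as follows (host, port and credentials are placeholders, and the flags assume a recent mitmproxy): mitmdump authenticates against the upstream proxy itself, so the browser only ever sees a credential-free local endpoint.

```python
def mitm_upstream_setup(upstream="http://urltoproxy:3128",
                        user="USER", password="PASSWORD", local_port=8080):
    # mitmdump forwards all traffic to the authenticated upstream proxy,
    # adding the Proxy-Authorization header itself; the browser then
    # only needs the credential-free local port.
    mitm_cmd = [
        "mitmdump",
        "--mode", f"upstream:{upstream}",
        "--upstream-auth", f"{user}:{password}",
        "--listen-port", str(local_port),
    ]
    chrome_arg = f"--proxy-server=http://127.0.0.1:{local_port}"
    return mitm_cmd, chrome_arg

cmd, arg = mitm_upstream_setup()
```

Run the first command (e.g. via subprocess.Popen) before starting the arsenic session, and add the returned --proxy-server argument to the chromeOptions args. Since mitmproxy re-signs TLS, Chromium would also need --ignore-certificate-errors or mitmproxy's CA installed.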
My 2c: this issue can be closed. Rationale: it's a corner case of chromium configuration; this library can't cover all browser configuration, only pertinent/common flags.
I fixed the problem using my own extension for an http/https proxy with auth. First you need to add this flag -> "--load-extension=/path/folder_extension"
And inside folder_extension I have two files:
background.js:
// Proxy endpoint and credentials are placeholders; replace with your own.
var config = {
  mode: "fixed_servers",
  rules: {
    singleProxy: {scheme: "http", host: "urltoproxy", port: 3128},
    bypassList: []
  }
};
chrome.proxy.settings.set({value: config, scope: "regular"}, function() {});
function callbackFn(details) {
  return {authCredentials: {username: "USER", password: "PASSWORD"}};
}
chrome.webRequest.onAuthRequired.addListener(
  callbackFn,
  {urls: ["<all_urls>"]},
  ["blocking"]
);
manifest.json:
{
  "version": "1.0.0",
  "manifest_version": 2,
  "name": "Chrome Proxy",
  "permissions": [
    "proxy", "tabs", "unlimitedStorage", "storage",
    "<all_urls>", "webRequest", "webRequestBlocking"
  ],
  "background": {"scripts": ["background.js"]},
  "minimum_chrome_version": "22.0.0"
}
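Since the credentials end up hard-coded in background.js, a variant of this approach is to generate the extension folder from Python at container startup, injecting the credentials from the environment. A sketch, assuming the same two files as above (all proxy values are placeholders):

```python
import json
import os
import tempfile

def build_proxy_extension(host, port, user, password, scheme="http"):
    """Write a minimal MV2 proxy-auth extension and return its folder,
    suitable for Chromium's --load-extension flag."""
    folder = tempfile.mkdtemp(prefix="proxy_ext_")
    manifest = {
        "version": "1.0.0",
        "manifest_version": 2,
        "name": "Chrome Proxy",
        "permissions": ["proxy", "tabs", "unlimitedStorage", "storage",
                        "<all_urls>", "webRequest", "webRequestBlocking"],
        "background": {"scripts": ["background.js"]},
        "minimum_chrome_version": "22.0.0",
    }
    background = f"""
var config = {{
  mode: "fixed_servers",
  rules: {{
    singleProxy: {{scheme: "{scheme}", host: "{host}", port: {port}}},
    bypassList: []
  }}
}};
chrome.proxy.settings.set({{value: config, scope: "regular"}}, function() {{}});
chrome.webRequest.onAuthRequired.addListener(
  function(details) {{
    return {{authCredentials: {{username: "{user}", password: "{password}"}}}};
  }},
  {{urls: ["<all_urls>"]}},
  ["blocking"]
);
"""
    with open(os.path.join(folder, "manifest.json"), "w") as f:
        json.dump(manifest, f, indent=2)
    with open(os.path.join(folder, "background.js"), "w") as f:
        f.write(background)
    return folder
```

Then pass f"--load-extension={folder}" in the chromeOptions args.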
Hi @DiMiTriFrog, where did you add the load-extension flag? I have been trying here, but with no luck.
browser.capabilities = { "goog:chromeOptions": {"args": ["--headless","--load-extension=C:/Path/folder/extension/"]} }
Thanks!
Hi, extensions don't work in headless mode..
For now I'm using proxy auth via IP whitelist, but some proxy providers only allow username and password.
How could I use arsenic with a proxy using a username and password? Selenium has selenium-wire for this case.