We are using Node.js 20.9.0 and proxy-chain 2.3.0 within crawlee 3.11.2.
Our proxy uses HTTP Basic authentication, and we pass the proxy URL, with the credentials embedded in it, to crawlee's PlaywrightCrawler.
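For reference, here is a minimal sketch of how a proxy URL with embedded credentials can be assembled. The helper name `buildProxyUrl`, the host and the credentials are hypothetical placeholders. The username and password are percent-encoded so that characters such as `@` or `:` in a password cannot be misread as URL delimiters.

```typescript
// Hypothetical helper: builds an upstream proxy URL with Basic-auth
// credentials in the userinfo part. Credentials are percent-encoded so
// reserved characters ('@', ':', '/') cannot break the URL structure.
function buildProxyUrl(
  host: string,
  port: number,
  username: string,
  password: string,
): string {
  const user = encodeURIComponent(username);
  const pass = encodeURIComponent(password);
  return `http://${user}:${pass}@${host}:${port}`;
}

// Placeholder values for illustration only.
const proxyUrl = buildProxyUrl("proxy.internal", 8888, "crawler", "p@ss:word");
console.log(proxyUrl); // → http://crawler:p%40ss%3Aword@proxy.internal:8888
```

With crawlee 3, a string like this would typically be supplied as `new ProxyConfiguration({ proxyUrls: [proxyUrl] })` and passed to `PlaywrightCrawler` through its `proxyConfiguration` option.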
While proxy-chain works without any issues on my local machine, it fails in our staging and production environments. All environments use Linux x64 containers.
To troubleshoot, I patched proxy-chain inside the container to enable verbose mode and added extra log lines to verify that the correct credentials were being sent.
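Checking the credentials in extra log lines comes down to comparing the Proxy-Authorization value sent upstream. The sketch below builds that value from a proxy URL following the Basic scheme (RFC 7617) and decodes a captured value back to `username:password`. The helpers `proxyAuthorizationFor` and `decodeBasic` are hypothetical and are not proxy-chain's internal functions. The sketch assumes a runtime with the global `URL`, `TextEncoder`, `TextDecoder`, `btoa` and `atob`, which Node.js 20 provides.

```typescript
// Hypothetical helper (not proxy-chain's internal code): derives a
// Proxy-Authorization value ("Basic <base64>") from a proxy URL.
function proxyAuthorizationFor(proxyUrl: string): string {
  const url = new URL(proxyUrl);
  // URL keeps userinfo percent-encoded, so decode it before Base64-encoding.
  const credentials =
    decodeURIComponent(url.username) + ":" + decodeURIComponent(url.password);
  // btoa() only accepts Latin-1 input, so convert to UTF-8 bytes first.
  const bytes = new TextEncoder().encode(credentials);
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return "Basic " + btoa(binary);
}

// Inverse: recovers "username:password" from a captured header value.
function decodeBasic(headerValue: string): string {
  const binary = atob(headerValue.replace(/^Basic\s+/i, ""));
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return new TextDecoder().decode(bytes);
}

// RFC 7617 example credentials; the proxy host is a placeholder.
const header = proxyAuthorizationFor("http://Aladdin:open%20sesame@proxy.example:8888");
console.log(header); // → Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
console.log(decodeBasic(header)); // → Aladdin:open sesame
```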
Here’s a sample log output:
INFO PlaywrightCrawler: Starting the crawler.
ProxyServer[32941]: Listening...
ProxyServer[32941]: !!! Handling CONNECT example.com:443 HTTP/1.1
ProxyServer[32941]: Using upstream proxy http://<redacted>:<redacted>@51.X.X.X:8888/
ProxyServer[32941]: Using chain() => example.com:443
ProxyServer[32941]: Failed to authenticate upstream proxy: 407 host,example.com:443,proxy-authorization,Basic Ym4..<redacted>
ERROR PlaywrightCrawler: Request failed and reached maximum retries. Error: Detected a session error, rotating session...
page.goto: net::ERR_TUNNEL_CONNECTION_FAILED at https://example.com/
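The headers in the 407 log line read as `host,example.com:443,proxy-authorization,Basic ...`. If the added log line interpolates the headers value directly into a string (an assumption about that log line), this is exactly how a flat name/value array prints after `Array.prototype.toString()`. An object would print as `[object Object]`. The values below are placeholders.

```typescript
// The two header shapes discussed in this post; values are placeholders.
const flatHeaders: string[] = ["host", "example.com:443", "proxy-authorization", "Basic <token>"];
const objectHeaders: Record<string, string> = {
  host: "example.com:443",
  "proxy-authorization": "Basic <token>",
};

// Template-string interpolation calls toString() on each value.
console.log(`407 ${flatHeaders}`);   // → 407 host,example.com:443,proxy-authorization,Basic <token>
console.log(`407 ${objectHeaders}`); // → 407 [object Object]

// JSON.stringify keeps the structure visible, which makes such logs unambiguous.
console.log(JSON.stringify(flatHeaders)); // → ["host","example.com:443","proxy-authorization","Basic <token>"]
```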
I confirmed that the correct credentials were sent. We used tcpdump for packet inspection on the proxy machine, and here’s what we found:
As you can see, the HTTP headers in the captured request are malformed.
I reviewed the options accepted by Node.js's http.request() function. My reading of the documentation is that headers should be passed as an object (a dictionary), but in this case they were being passed as an array. I am still unsure why proxy-chain behaves differently across environments.
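The array-to-object change can be sketched roughly as follows. `flatHeadersToObject` is a hypothetical helper, not proxy-chain's code and not the patch in this post. It assumes the headers arrive as a flat list of name/value pairs, the same layout Node.js uses for `message.rawHeaders`.

```typescript
// Hypothetical helper: converts a flat [name, value, name, value, ...]
// header list into a plain header object.
function flatHeadersToObject(flat: string[]): Record<string, string | string[]> {
  if (flat.length % 2 !== 0) {
    throw new Error("Header list must contain name/value pairs");
  }
  const result: Record<string, string | string[]> = {};
  for (let i = 0; i < flat.length; i += 2) {
    // Header names are case-insensitive, so normalise them to lowercase.
    const name = flat[i].toLowerCase();
    const value = flat[i + 1];
    if (!Object.prototype.hasOwnProperty.call(result, name)) {
      result[name] = value;
    } else {
      // Repeated names are collected into an array of values.
      const existing = result[name];
      result[name] = Array.isArray(existing) ? existing.concat(value) : [existing, value];
    }
  }
  return result;
}

// Options for a CONNECT request to the upstream proxy; values are placeholders.
const connectOptions = {
  method: "CONNECT",
  path: "example.com:443",
  headers: flatHeadersToObject([
    "host", "example.com:443",
    "proxy-authorization", "Basic <token>",
  ]),
};
console.log(JSON.stringify(connectOptions.headers));
// → {"host":"example.com:443","proxy-authorization":"Basic <token>"}
```

In practice these options would also carry the upstream proxy's `host` and `port` and be passed to `http.request()`. For a CONNECT request, the proxy's reply is delivered through the request's `'connect'` event.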
To resolve the issue, I patched proxy-chain on the container. Below are the changes:
Original code:
Modified code:
After applying this modification, the issue was resolved. Any comments?