Open RajatThukral-Draup opened 9 months ago
Most IPv6 proxy providers won't give you a direct IPv6 endpoint. Instead, they use some sort of 6to4 method: you connect to the proxy via IPv4 and get an outgoing IPv6 address from the provider.
So I don't see a need to add full IPv6 support yet, but I guess it can be done in the future as Cloud Services are moving away from free IPv4 addresses.
Hi, There are 2 features:
- allowing IPv6
- allow many outbound IP addresses on VM for cloud providers (which can support that).
Can you detail the use cases? Most users only need IPv4.
To address the growing costs associated with public IPv4 IP addresses, as cloud providers have started charging for them, we are in the process of transitioning to IPv6 for our AWS cloud infrastructure. During local testing, we deactivated outbound IPv4 traffic on ports 80 and 443 for our Scrapoxy instances. However, we've encountered an issue where requests are halting during the TLS handshake phase. This problem arises despite configuring our proxy agent and master to utilize IPv6 addresses as hostnames. Below are the logs from a test run:
curl -v --proxy http://localhost:8888 --proxy-user **user***:******* https://www.google.com
* Trying [::1]:8888...
* Connected to localhost (::1) port 8888
* CONNECT tunnel: HTTP/1.1 negotiated
* allocate connect buffer
* Proxy auth using Basic with user '******'
* Establish HTTP proxy tunnel to www.google.com:443
> CONNECT www.google.com:443 HTTP/1.1
> Host: www.google.com:443
> Proxy-Authorization: Basic ****
> User-Agent: curl/8.4.0
> Proxy-Connection: Keep-Alive
>
< HTTP/1.1 200 Connection established
<
* CONNECT phase completed
* CONNECT tunnel established, response 200
* ALPN: curl offers h2,http/1.1
* (304) (OUT), TLS handshake, Client hello (1):
* CAfile: /etc/ssl/cert.pem
* CApath: none
The curl output above shows a successful connection to the proxy, but the request stalls at the TLS handshake phase. Re-enabling outbound IPv4 traffic on the aforementioned ports resolves the issue, allowing requests to complete successfully.
We would appreciate any insights or suggestions on troubleshooting this issue further.
Hi, Can you detail your environment @RajatThukral-Draup ?
- Scrapoxy version: X.X.X
- Is it a custom version? Yes/No
- Which deployment method do you use? (Docker/Docker Compose/Kubernetes/NPM/Other)
- Which OS does Scrapoxy run on? (I assume Linux)
- Which kind of storage do you use? (file?)
Please find the details below -
Here are the modifications we've implemented:
Following these adjustments, we've successfully enabled the use of IPv6 addresses for Scrapoxy workers during requests. Although we verified that IPv6 was set in the proxyOptions, requests directed to google.com defaulted to Google's public IPv4 address, even though both the requesting host and the target have IPv6 addresses.
When we manually logged in to one of our Scrapoxy workers, we observed that traffic from the worker to the target still went through the IPv4 interface, even though the request from the master to the worker was confirmed to travel over the IPv6 interface. We established this by analyzing the packet exchanges with tcpdump.
@fabienvauchelles We greatly appreciate your assistance with this issue, as we're struggling to find a solution. Addressing this is a top priority for us, especially since the cost of AWS IPv4 addresses is significantly impacting our budget.
Hi @RajatThukral-Draup , Thanks for your answer.
As I understand it, you wrote a lot of custom code on top of 3.1.1. I need to understand it in order to integrate the feature properly into V4.
Can you share the custom code with me? (can be a private repository).
Thanks.
Hi @fabienvauchelles
Certainly! I've shared the link to our custom Scrapoxy code repository below.
An invitation link has also been sent to you. Here's the link: https://github.com/Draup/scrapoxy.
Incorporating this feature into Scrapoxy version 3.1.1 would be greatly beneficial for us, as upgrading to Scrapoxy version 4 is expected to require a considerable amount of time for us.
Hey @fabienvauchelles
Just wanted to check in and see if you got a chance to peek at the Scrapoxy code repo link I sent over. Here it is again just in case: https://github.com/Draup/scrapoxy.
We're really keen on getting that feature rolled into Scrapoxy v3.1.1 since jumping to v4 is a bit of a stretch for us right now.
Let me know your thoughts, or if there's anything you're wondering about it. Hope to catch up soon!
Thanks
Hi @RajatThukral-Draup ,
I fetched the repository, thanks. I need to explore it now.
Give me some time to explore it and understand how it can be integrated cleanly. I will have some further questions for you so that I can write the requirements correctly.
Hey @fabienvauchelles
Awesome, glad to hear you've got the repo! Take all the time you need to dive into it. I'm here to help answer any questions or clarify anything that might help you in understanding how we can best integrate this feature.
Just hit me up whenever you're ready or need some info. Looking forward to your insights and the questions you'll have!
Thanks
Hi @RajatThukral-Draup ,
I've reviewed the code, and it is excellent work! I truly appreciate the enhancements you've made to version 3, particularly regarding spot instances, Prometheus integration, and the introduction of new metrics.
I have some initial inquiries:
- How do you go about building the image and ensuring AWS employs IPv6? (I couldn't find any references to IPv6 during instance creation)
- Have you made any updates to the proxy.js file (located at tools/install/proxy.js)? If so, would you mind sharing the code with me?
- Can you confirm whether you utilize a subnet to prevent IPs from being publicly accessible on the internet?
- How many instances/proxies do you use on AWS? By region?
- What's the purpose behind the "multi-region" settings?
- Additionally, I'm interested in integrating the newly added metrics. Could you highlight which ones you consider most important and provide insight into how you utilize them?
Hi @fabienvauchelles
How do you go about building the image and ensuring AWS employs IPv6? (I couldn't find any references to IPv6 during instance creation)
- Yes, we've configured our instances to automatically assign an IPv6 address upon creation.
Have you made any updates to the proxy.js file (located at tools/install/proxy.js)? If so, would you mind sharing the code with me?
- Actually, I haven't made any alterations to the tools package. The original code is accessible at: https://github.com/Draup/scrapoxy/blob/main/tools/install/proxy.js
Can you confirm whether you utilize a subnet to prevent IPs from being publicly accessible on the internet?
- We haven't implemented subnet-based restrictions. However, we've limited access to port 3128 exclusively to scrapoxy workers from the scrapoxy master.
How many instances/proxies do you use on AWS? By region?
- Currently, we manage around 150-200 instances across four distinct AWS regions.
What's the purpose behind the "multi-region" settings?
- The strategy aims to enhance request success rates and circumvent regional limitations, creating a diversified proxy pool.
Additionally, I'm interested in integrating the newly added metrics. Could you highlight which ones you consider most important and provide insight into how you utilize them?
- Implementing key metrics has significantly enhanced our system's transparency.
- Key metrics: Throughput - monitoring the rate of requests per minute. Latency - measuring the overall request delay. Current IP count - tracking the number of IPs available at any moment.
Thanks
Hi @fabienvauchelles
I would appreciate hearing from you on this matter.
We've upgraded our code to support IPv6, yet our requests continue to default to IPv4. Is there a preference for IPv4 over IPv6 in the Node.js library? Any advice or insights you could provide on this issue would be very helpful.
Additionally, when executing a curl request to google.com from the same system, it appears to utilize an IPv6 address. I'm uncertain about the precise cause of this behavior and how to resolve it.
Our Node version: v18.18.0
A reference on this: https://stackoverflow.com/questions/76844182/node-js-prefers-ipv4-over-ipv6
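One way to rule out address-family selection as the cause is to request IPv6 explicitly rather than relying on the default resolution order. The following is only a sketch, assuming a plain Node.js `https` request and not the actual Scrapoxy code path; `ipv6RequestOptions` and `lookupIPv6` are hypothetical helper names introduced for illustration.

```javascript
const dns = require('dns');
const https = require('https');

// Hypothetical helper: build request options that restrict name resolution
// to IPv6 by setting the `family` option accepted by http(s).request.
function ipv6RequestOptions(hostname, path = '/') {
  return { hostname, path, port: 443, method: 'HEAD', family: 6 };
}

// Hypothetical helper: resolve a hostname to an IPv6 address only.
// The promise rejects if no IPv6 address is available.
function lookupIPv6(hostname) {
  return dns.promises.lookup(hostname, { family: 6 });
}

// Example usage (performs network I/O, so it is left commented out):
// lookupIPv6('www.google.com').then(({ address }) => console.log(address));
// const req = https.request(ipv6RequestOptions('www.google.com'), (res) => {
//   console.log(res.statusCode, res.socket.remoteAddress);
// });
// req.on('error', console.error);
// req.end();
```

If the explicit IPv6 request succeeds while the default one goes out over IPv4, the difference is likely in how the address family is chosen rather than in the network setup.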
Thanks
Hi, If you have an IPv6 network interface on the VPC, you can force Node.js to use this specific interface:
In proxy.js (Scrapoxy V3 => https://github.com/fabienvauchelles/scrapoxy/blob/scrapoxy3/tools/install/proxy.js#L28C22-L28C29), it is possible to force Node.js to use a specific network interface.
Add the localAddress option on the connect method to specify the IPv6 address of the network interface (check documentation here).
To get the list of network interfaces, you can use require('os').networkInterfaces() and filter on IPv6.
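The suggestion above could be sketched roughly as follows. This is a minimal illustration, not the actual proxy.js change: `listIPv6Addresses` and `connectOverIPv6` are hypothetical names, and skipping link-local addresses is an assumption about what is wanted for outbound traffic.

```javascript
const os = require('os');
const net = require('net');

// Hypothetical helper: return the non-internal, non-link-local IPv6
// addresses of this host. `interfaces` is injectable for testing.
function listIPv6Addresses(interfaces = os.networkInterfaces()) {
  const addresses = [];
  for (const entries of Object.values(interfaces)) {
    for (const entry of entries || []) {
      // `family` is normally the string 'IPv6'; a few Node.js 18 releases
      // reported the number 6 instead, so both forms are accepted.
      const isIPv6 = entry.family === 'IPv6' || entry.family === 6;
      // fe80::/10 is link-local and cannot be used to reach the internet.
      const isLinkLocal = /^fe[89ab]/i.test(entry.address);
      if (isIPv6 && !entry.internal && !isLinkLocal) {
        addresses.push(entry.address);
      }
    }
  }
  return addresses;
}

// Hypothetical helper: open a TCP connection bound to a given local IPv6
// address, resolving the target host to IPv6 only.
function connectOverIPv6(host, port, localAddress) {
  return net.connect({ host, port, localAddress, family: 6 });
}

// Example usage (performs network I/O, so it is left commented out):
// const [localAddress] = listIPv6Addresses();
// const socket = connectOverIPv6('www.google.com', 443, localAddress);
// socket.on('connect', () => console.log(socket.localAddress));
```

Setting `family: 6` alongside `localAddress` keeps the resolved target address in the same family as the bound source address.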
Can you let me know whether this change works?
Hi @RajatThukral-Draup,
Any luck with your transition to IPv6? Do you have your version available anywhere? I can't access https://github.com/Draup/scrapoxy anymore.
Best regards
Hello,
Thank you for the fantastic work on this project.
Given that major cloud providers are now charging substantially for public IPv4 addresses, it would be highly beneficial to incorporate IPv6 Pool support into this project. This would entail modifications to the provider API SDK code and the sections where requests are actually initiated.