Open carlvaneijk opened 2 years ago
Dev can't really fix this. Maybe you can use TOR for this. Or a VPN service.
Dev could provide a way to implement proxies if he wanted. That'd solve the problem, doubt most of yall know how to use those though.
I wrote a very similar Jupyter notebook and I use Mullvad VPN for changing IPs before each new application. It comes with a command-line tool, so within the notebook I just need to use ! to execute bash commands. Here's the simple function:
def change_vpn_location():
    !mullvad disconnect
    !mullvad connect
Ideally it would use IPs local to the postings, but Mullvad doesn't provide IPs for the four states, so I just opted for a random US IP.
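For anyone running this as a plain script rather than a notebook, the ! shell escape isn't available, so here is a rough sketch of the same idea using subprocess. It assumes the mullvad CLI is installed and on PATH. The `run` and `settle_seconds` parameters are my own additions for illustration, not part of the original function, and whether reconnecting actually yields a new IP depends on your relay settings.

```python
import subprocess
import time


def change_vpn_location(run=subprocess.run, settle_seconds=0.0):
    """Drop and re-establish the Mullvad tunnel, like the notebook version.

    `run` is injectable (hypothetical, so this can be tried without the CLI).
    """
    commands = [["mullvad", "disconnect"], ["mullvad", "connect"]]
    for command in commands:
        # check=True raises CalledProcessError if the CLI exits non-zero.
        run(command, check=True)
    # Optional pause, since the tunnel does not come up instantly.
    time.sleep(settle_seconds)
    return commands
```

Calling `change_vpn_location(settle_seconds=5)` before each submission would mirror the notebook workflow; the right pause length is a guess and would need tuning.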
We could use an array of proxies and pick a random one each time.
If there's a way to configure socks5 you could use it with https://github.com/kpcyrd/laundry5
If we used proxies would we just use ones in the states the jobs are advertised in or would it not matter?
Also, from my experience, public proxy servers are typically very slow so it would reduce the rate we can send forms dramatically.
@kpcyrd if we did, we'd have to either find a Python version of that or write it ourselves, since that's in Rust
the program acts as a proxy server and binds to a local port (e.g. 127.0.0.1:1337), you'd then need to configure headless chrome to use 127.0.0.1:1337 as a socks5 proxy.
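A minimal sketch of what that could look like from the Python side, assuming a SOCKS5 proxy is already listening on 127.0.0.1:1337 and that the project drives Chrome through Selenium. The function names here are hypothetical; the `--proxy-server=socks5://host:port` flag is Chrome's own.

```python
def socks5_chrome_args(host="127.0.0.1", port=1337):
    """Build Chrome flags that route traffic through a local SOCKS5 proxy."""
    return ["--headless", f"--proxy-server=socks5://{host}:{port}"]


def start_driver_via_socks5(host="127.0.0.1", port=1337):
    # Imported lazily so the helper above works without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in socks5_chrome_args(host, port):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)
```

The proxy program itself still has to be started separately; this only points the browser at it.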
@kpcyrd oh sorry, it would still be a pain to also have to install rustc for this though.
Python's Selenium implementation supports the use of proxies natively; finding good proxies that we can use is what will be a challenge. Tutorial: https://www.tutorialspoint.com/running-selenium-webdriver-with-a-proxy-in-python
@pws1453 if we were only using proxy servers in the states the jobs were advertised it would be pretty much impossible to get a list of good proxies.
The issue isn't the proxy location, or even whether it's a good proxy. The only issue is using the same IP repeatedly to do this. If there are multiple submissions from the same IP then they can figure that out. The goal is just to use a different IP every time, so a proxy from anywhere would work fine. If they actually took the time to check where the IP address is from then yeah, like you said, we would have a problem lol.
My suggestion would be to add a new file, constants/proxies.py, and add an array with a list of proxy servers to rotate:
PROXY_SERVERS = ['127.0.0.1:8080',
                 '1.2.3.4:6969',
                 …
                 '4.3.2.1:1234']
Then in main.py, in start_driver, we could add:
options.add_argument(f'--proxy-server={random.choice(PROXY_SERVERS)}')
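Here is a rough sketch of that rotation as a standalone helper. The three addresses are just the placeholder values from the suggestion, not working proxies. The `next_proxy_argument` name and the "avoid the previous proxy" behaviour are my own assumptions; a plain random.choice can pick the same proxy twice in a row, which works against using a different IP each time.

```python
import random

# Placeholder entries only; real proxies would live in constants/proxies.py.
PROXY_SERVERS = ["127.0.0.1:8080", "1.2.3.4:6969", "4.3.2.1:1234"]


def next_proxy_argument(previous=None, proxies=PROXY_SERVERS, rng=random):
    """Return (proxy, chrome_flag), avoiding `previous` when possible."""
    # Fall back to the full list if excluding `previous` leaves nothing.
    candidates = [p for p in proxies if p != previous] or list(proxies)
    proxy = rng.choice(candidates)
    return proxy, f"--proxy-server={proxy}"
```

Inside start_driver this could be used as `proxy, flag = next_proxy_argument(last_proxy)` followed by `options.add_argument(flag)`, where `last_proxy` is whatever was used for the previous submission.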
This isn't an issue per se, but if there are multiple submissions from the same ip address, it's likely they can just filter for non-unique on their database and drop those submissions.
Good work so far!
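To make the concern concrete, here is a small sketch of the kind of filter a recipient could run. It is purely hypothetical: we don't know how submissions are actually stored, and the record layout and the `drop_repeated_ips` name are made up for illustration. The IPs are from the reserved documentation ranges.

```python
from collections import Counter


def drop_repeated_ips(submissions):
    """Keep only submissions whose IP address appears exactly once."""
    counts = Counter(s["ip"] for s in submissions)
    return [s for s in submissions if counts[s["ip"]] == 1]


# Hypothetical records: two share an IP, one is unique.
submissions = [
    {"id": 1, "ip": "203.0.113.5"},
    {"id": 2, "ip": "203.0.113.5"},
    {"id": 3, "ip": "198.51.100.7"},
]
kept = drop_repeated_ips(submissions)
```

With a filter like this, every submission sent from a shared IP is discarded, which is why a fresh IP per submission matters more than where the IP is located.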