godfriedmeesters / scraper

As part of DiffScraper, one or more bots can be deployed. Ready-to-use bots are provided that can extract offers from mobile applications, mobile websites and desktop websites.
GNU General Public License v3.0

Gain residential IP connectivity #16

Open bkrumnow opened 3 years ago

bkrumnow commented 3 years ago

Status:

bkrumnow commented 3 years ago

I set up the mobile phone VPN today, and it works fine. I could do the same for the Windows machine, but I would prefer to use a different residential IP for that one. I cannot do it before Friday (the IP is in use for a different study).

bkrumnow commented 3 years ago

I stumbled upon this one here. It seems that we can simply forward the traffic to our local networks without any additional clients: https://linuxize.com/post/how-to-setup-ssh-socks-tunnel-for-private-browsing/ I am going to test this tomorrow for the Windows machine.

bkrumnow commented 3 years ago

The TH server can use a proxy now as well. To activate the proxy connection, run the following command in a terminal:

putty.exe -load "study_price_diff"

Afterwards, you can use the proxy with the following setup: localhost:33344, SOCKS5

Please make sure that our clients do not reveal their real identity, e.g. [Screenshot 2021-05-06 at 12 02 34]: this should never be something with 139.6..., which would be the university's IP address.

You can also use the graphical interface if needed: [Screenshot 2021-05-06 at 12 02 11]
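
The rule above (never show up with a university address) could be checked automatically before a scraper run. The following is only a sketch: `leaksUniversityIp` is a hypothetical helper, the `139.6.` prefix is copied from the truncated address in this comment and should be replaced with the real range, and the two sample addresses are made up for illustration. The observed IP would come from whatever IP echo page the client opens through the proxy.

```javascript
// Hypothetical helper: true if the observed egress IP starts with one of the
// forbidden prefixes (for example the university's range).
function leaksUniversityIp(observedIp, forbiddenPrefixes) {
  const ip = observedIp.trim();
  return forbiddenPrefixes.some((prefix) => ip.startsWith(prefix));
}

// Assumption: prefix taken from the truncated "139.6..." in the thread.
// The trailing dot keeps addresses such as 139.60.x.x from matching.
const FORBIDDEN_PREFIXES = ['139.6.'];

// Made-up sample addresses, for illustration only.
console.log(leaksUniversityIp('139.6.1.10', FORBIDDEN_PREFIXES));
console.log(leaksUniversityIp('84.12.33.7', FORBIDDEN_PREFIXES));
```

A scraper could refuse to start when the check returns `true`, instead of relying on someone looking at a screenshot.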

bkrumnow commented 3 years ago

@godfriedmeesters Is your Desktop scraper connected to your local network via a proxy now?

Btw. have you tested if puppeteer works with proxies?
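
On the Puppeteer question: Chromium accepts a `--proxy-server` switch, and Puppeteer forwards extra switches through the `args` launch option. A minimal sketch, assuming the `puppeteer` package is installed; `proxyLaunchOptions` is a hypothetical helper name, not something from the scraper code base.

```javascript
// Hypothetical helper: build Puppeteer launch options that route the browser
// through a proxy via Chromium's --proxy-server switch.
function proxyLaunchOptions(proxy) {
  // Accept "host:port" (treated as an HTTP proxy) or a full "scheme://host:port".
  const server = proxy.includes('://') ? proxy : `http://${proxy}`;
  return { args: [`--proxy-server=${server}`] };
}

// The local SOCKS5 tunnel described in the previous comment.
const options = proxyLaunchOptions('socks5://localhost:33344');
console.log(options.args[0]);

// Usage sketch (needs the puppeteer package, so it is left commented out):
// const puppeteer = require('puppeteer');
// const browser = await puppeteer.launch(options);
```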

godfriedmeesters commented 3 years ago

Good news!

Should I also use my residential IP? I understand that residential IPs are better than cloud IPs, but my IP is from Belgium and your IP is from Germany, and if I remember correctly location is a factor that can influence prices?

bkrumnow commented 3 years ago

Will you pass me your public key?

godfriedmeesters commented 3 years ago

already passed on skype

bkrumnow commented 3 years ago

I do not see it in skype. Feel free to just post it here.

godfriedmeesters commented 3 years ago

My proxy server is godfriedscloud.ddns.net:80 with user godfried; I can send you the password by Skype.

Of course the speed is not the same as a direct connection: roughly 200 kB/s.

bkrumnow commented 3 years ago

Starting the VPN connection on a phone via ADB is possible. However, I believe we should build this interaction through Appium.

Via adb, you can fire:

adb shell am start -n 'com.android.settings/.Settings\$VpnSettingsActivity'
adb shell input tap 100 300
adb shell input tap 900 1650

And of course, it needs timeouts in between the commands. I was a bit surprised that there is no built-in functionality for that. @godfriedmeesters: Would you build a script to make that interaction happen?
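
The three adb calls with pauses in between could be scripted roughly as follows. This is a sketch, not a tested device script: the pause lengths are placeholder assumptions, the tap coordinates are the device-specific ones from this comment, and `startVpn` and `VPN_STEPS` are hypothetical names. The command runner is injectable so the sequence can be dry-run without a phone attached.

```javascript
const { execFile } = require('node:child_process');
const { promisify } = require('node:util');

const execFileAsync = promisify(execFile);
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// The adb calls from the comment above, each followed by a pause.
// Assumption: waitMs values are placeholders and need tuning per device.
// The "\$" reaches the device shell, which would otherwise expand the "$".
const VPN_STEPS = [
  { args: ['shell', 'am', 'start', '-n', 'com.android.settings/.Settings\\$VpnSettingsActivity'], waitMs: 2000 },
  { args: ['shell', 'input', 'tap', '100', '300'], waitMs: 1500 },
  { args: ['shell', 'input', 'tap', '900', '1650'], waitMs: 1500 },
];

// Runs the steps in order, waiting after each one.
async function startVpn(run = (args) => execFileAsync('adb', args)) {
  for (const step of VPN_STEPS) {
    await run(step.args);
    await sleep(step.waitMs);
  }
}

// Dry run that prints the commands instead of executing them:
// startVpn(async (args) => console.log('adb', args.join(' ')));
```

Fixed tap coordinates break as soon as the screen layout changes, which is one reason driving the same screen through Appium element selectors may be more robust.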

godfriedmeesters commented 3 years ago

OK, this looks like a good solution, and it seems it can be integrated in Appium: https://appiumpro.com/editions/3-running-arbitrary-adb-commands-via-appium

bkrumnow commented 3 years ago

The proxy for the TH Köln server is ready now: http://paddyscloud.ddns.net:43984. I would still like to have authentication for both proxies, but I haven't got that running.

@godfriedmeesters Could you run tests on all involved machines and close the ticket if it works? (Including automatic connection to proxies and VPNs.)

godfriedmeesters commented 3 years ago

For websites, you can now specify a "proxy" parameter that works only for website scrapers.

I noticed that scraping via my proxy server is much slower than over a direct connection.

godfriedmeesters commented 3 years ago

To open the VPN settings on the phone, you can use the following script. However, I did not see any VPNs on your phone:

const wdio = require('webdriverio');

const opts = {
  path: '/wd/hub',
  port: 4723,
  capabilities: {
    deviceName: "emulator-5554",
    platformName: "Android",
    appActivity: 'com.android.settings.Settings$VpnSettingsActivity',
    appPackage: 'com.android.settings',
    automationName: "UiAutomator2",
  }
};

async function main () {
  // Starts an Appium session that opens the VPN settings activity.
  const client = await wdio.remote(opts);

  // await client.deleteSession();
}

main();

godfriedmeesters commented 3 years ago

So for websites we have http://paddyscloud.ddns.net:43984 and godfriedscloud.ddns.net:80. You can specify them as follows:

{
  "scrapers": [
    {
      "params": { "proxy": "godfriedscloud.ddns.net:80" },
      "scraperClass": "ExpediaWebScraper"
    }
  ],
  "inputData": {
    "origin": "BRU",
    "destination": "AMS",
    "departureDate": "2021-07-01"
  }
}
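
To run the same query through both proxies, the request body above could be generated instead of written by hand. A small sketch: `buildJob` is a hypothetical helper, and only the field names, scraper class, and proxy addresses are taken from this thread.

```javascript
// Hypothetical helper: wrap one scraper class and one proxy into the job
// format shown above. Without a proxy, params stays empty.
function buildJob(scraperClass, proxy, inputData) {
  const params = proxy ? { proxy } : {};
  return { scrapers: [{ params, scraperClass }], inputData };
}

const inputData = { origin: 'BRU', destination: 'AMS', departureDate: '2021-07-01' };
const proxies = ['paddyscloud.ddns.net:43984', 'godfriedscloud.ddns.net:80'];

// One job per proxy, so the same search goes out through both residential IPs.
const jobs = proxies.map((proxy) => buildJob('ExpediaWebScraper', proxy, inputData));
console.log(JSON.stringify(jobs[1], null, 2));
```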

godfriedmeesters commented 3 years ago

If you want, for the AppScrapers we can specify something like "VPN": "on" or "off".

godfriedmeesters commented 3 years ago

My web proxy seems to work well (Expedia Web runs OK).

godfriedmeesters commented 3 years ago

For the web scrapers on my Kubernetes cluster, I am now only using our residential IPs.

Still scraping at the moment. It should be possible to find out which IP address was used for every scraped offer by joining scraperRunResult with the params field of scraperRun.
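
The join could look roughly like this once the rows are loaded. Everything about the row shape is an assumption: only the names scraperRunResult, scraperRun, and params come from this thread, while the `id`, `scraperRunId`, and `price` columns and the sample rows are made up for illustration.

```javascript
// Made-up sample rows; the real schema may differ.
const scraperRuns = [
  { id: 1, params: { proxy: 'godfriedscloud.ddns.net:80' } },
  { id: 2, params: { proxy: 'paddyscloud.ddns.net:43984' } },
  { id: 3, params: { language: 'fr' } }, // run without a proxy
];
const scraperRunResults = [
  { id: 10, scraperRunId: 1, price: 120 },
  { id: 11, scraperRunId: 2, price: 135 },
  { id: 12, scraperRunId: 3, price: 128 },
];

// Attach the proxy of the owning run to every result (null if none was set).
function attachProxy(results, runs) {
  const runsById = new Map(runs.map((run) => [run.id, run]));
  return results.map((result) => {
    const run = runsById.get(result.scraperRunId);
    return { ...result, proxy: run?.params?.proxy ?? null };
  });
}

const offers = attachProxy(scraperRunResults, scraperRuns);
console.log(offers.map((offer) => `${offer.id}: ${offer.proxy}`).join('\n'));
```

Note that this yields the proxy address, not the exit IP itself; with dynamic DNS names the two can drift apart over time.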

bkrumnow commented 3 years ago

Do not forget to give these providers a break. It would be good to stop all queues while you are still developing

godfriedmeesters commented 3 years ago

I gave them a break of about one day. I am now testing with residential IPs only. These proxy servers will slow down scraping, but hopefully not to the point that we get errors.

godfriedmeesters commented 3 years ago

paddyscloud.ddns.net:43984 does not work: Error: net::ERR_INVALID_AUTH_CREDENTIALS
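
One possible cause, stated as an assumption rather than a diagnosis: the proxy may be asking for credentials that the browser never sends. Chromium's `--proxy-server` switch takes only the server address, and Puppeteer supplies proxy credentials separately through `page.authenticate()`. The sketch below splits a proxy string into those two parts; `splitProxyCredentials` is a hypothetical helper and `user`/`secret` are placeholder values.

```javascript
// Hypothetical helper: separate the server address from optional credentials.
function splitProxyCredentials(proxy) {
  const url = new URL(proxy.includes('://') ? proxy : `http://${proxy}`);
  const credentials = url.username
    ? {
        username: decodeURIComponent(url.username),
        password: decodeURIComponent(url.password),
      }
    : null;
  return { server: `${url.protocol}//${url.host}`, credentials };
}

// Placeholder credentials, for illustration only.
const parsed = splitProxyCredentials('user:secret@paddyscloud.ddns.net:43984');
console.log(parsed.server);

// Usage sketch (needs the puppeteer package, so it is left commented out):
// const browser = await puppeteer.launch({ args: [`--proxy-server=${parsed.server}`] });
// const page = await browser.newPage();
// if (parsed.credentials) await page.authenticate(parsed.credentials);
```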

godfriedmeesters commented 3 years ago

[error] webscraper-deployment-5477f8899c-lvvn4: {"id":"3144","name":"default","data":{"comparisonRunId":5513,"comparisonSize":2,"comparisonId":74,"params":{"language":"fr","proxy":"paddyscloud.ddns.net:43984"},"scraperClass":"OpodoWebScraper","inputData":{"origin":"FRA","destination":"CDG","departureDate":"2021-07-01"}},"opts":{"attempts":1,"delay":0,"timestamp":1621452150048},"progress":0,"delay":0,"timestamp":1621452150048,"attemptsMade":0,"stacktrace":[],"returnvalue":null,"finishedOn":null,"processedOn":1621452150050}: Error when scraping OpodoWebScraper on webscraper-deployment-5477f8899c-lvvn4: Error: net::ERR_INVALID_AUTH_CREDENTIALS at

bkrumnow commented 3 years ago

could you try again?

godfriedmeesters commented 3 years ago

I removed my Belgian residential proxy after several connection timeouts.