AlphaReign / scraper

AlphaReigns DHT Scraper, includes peer updater and categorizer
MIT License

Scraping speed #42

Closed by ghost 5 years ago

ghost commented 5 years ago

Just after some advice. I don't seem to be collecting as many torrents as I did with the old scraper. It's been 2-3 days and I've collected under 1 million. With the old scraper it was possible to reach 1 million in 24 hours.

Does scraping speed depend on location or host etc? I'm currently testing in Amsterdam on Digital Ocean.

Would there be a better location/host?

Kind regards.

Raxvis commented 5 years ago

If you want to compare the two, make sure they are in the same region. Different regions have more or fewer peers, which affects how fast torrents come in.

On Thu, Jan 31, 2019 at 7:00 AM ash121121 notifications@github.com wrote:

Just after some advice. I don't seem to be collecting as many torrents as I did with the old scraper. It's been 2-3 days and I've collected under 1 million. With the old scraper it was possible to reach 1 million in 24 hours.

Does scraping speed depend on location or host etc? I'm currently testing in Amsterdam on Digital Ocean.

Would there be a better location/host?

Kind regards.

milezzz commented 5 years ago

Huge thanks to @ash121121 for helping me get my instance working 100% today after some hiccups. Awesome code @Prefinem !

uptime │ 14m | 1,880 TORRENTS

...Just chiming in to say it is quite a bit slower than I remember it from the last time I tested. Traffic graph: https://i.gyazo.com/d8ffbdf16f7117637e46dcfe4d258c64.png

Raxvis commented 5 years ago

Give it some time to spin up. When I tested the old code against the new code, the new code was slightly slower, but it is maintainable.

ghost commented 5 years ago

A lot slower for me. The code is much better, as you say :) Is there any way you can put some more horsepower into the scraper? :D

Raxvis commented 5 years ago

Not sure. Maybe. Need to look at why it’s slower.

On Thu, Jan 31, 2019 at 7:32 PM ash121121 notifications@github.com wrote:

A lot slower for me. The code is much better, as you say :) Is there any way you can put some more horsepower into the scraper? :D

ghost commented 5 years ago

One thing I don't think any of us have taken into account: with the previous scraper, the torrents we wanted to filter out were just marked as inactive but were still counted in the database, whereas with this scraper the nasty torrents don't even enter the database to be counted. Maybe the old one looked so fast because of this.
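
To make the difference concrete, here is a small sketch of how the same batch of torrents would be counted under each approach. The sample data, the `filtered` flag, and both function names are invented for illustration; this is not the scraper's actual storage code.

```javascript
// Invented sample data: `filtered` marks torrents the filters would reject.
const scraped = [
	{ name: 'ubuntu.iso', filtered: false },
	{ name: 'unwanted-1', filtered: true },
	{ name: 'debian.iso', filtered: false },
	{ name: 'unwanted-2', filtered: true },
];

// Old behaviour as described above: every torrent is stored,
// and filtered ones are only flagged as inactive.
function storeOldStyle(torrents) {
	return torrents.map((t) => ({ name: t.name, active: !t.filtered }));
}

// New behaviour as described above: filtered torrents never reach the database.
function storeNewStyle(torrents) {
	return torrents.filter((t) => !t.filtered).map((t) => ({ name: t.name, active: true }));
}

const oldRows = storeOldStyle(scraped);
const newRows = storeNewStyle(scraped);

console.log(oldRows.length); // 4
console.log(newRows.length); // 2
console.log(oldRows.filter((r) => r.active).length); // 2
```

Comparing raw row counts between the two scrapers would therefore overstate the old one's speed; counting only active rows on the old database gives a fairer comparison.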

Raxvis commented 5 years ago

That is a good point. I didn't think about that. We could always drop all the filters and check the speed.

ghost commented 5 years ago

So I spun up 3 servers at Digital Ocean, 2 in Germany and 1 in NYC, all pointing to the same MySQL and Elasticsearch databases, and I seem to be getting around 1k torrents every 4 minutes.

ghost commented 5 years ago

I'll test without filters

Raxvis commented 5 years ago

That is about 350K a day.

Seeing as there are only 30 million torrents generally active on public trackers, that would take about 3 months to max out.
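
As a rough check on the numbers above, the arithmetic can be written out. This is a minimal sketch and not part of the scraper: the function names are made up for illustration, and the 30 million figure is the estimate quoted in this thread.

```javascript
// Hypothetical helpers, not part of the scraper: back-of-envelope throughput math.

// Convert "N torrents every M minutes" into torrents per day.
function torrentsPerDay(torrents, minutes) {
	const minutesPerDay = 24 * 60;
	return (torrents / minutes) * minutesPerDay;
}

// Whole days needed to index `total` torrents at a constant daily rate.
function daysToIndex(total, perDay) {
	return Math.ceil(total / perDay);
}

const perDay = torrentsPerDay(1000, 4); // 1k torrents every 4 minutes
const days = daysToIndex(30e6, perDay); // 30 million is the estimate from this thread

console.log(perDay); // 360000
console.log(days); // 84
```

That works out to 360K a day and roughly 84 days, in line with the figures above. It assumes a constant rate and ignores duplicates and filtered torrents.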

ghost commented 5 years ago

Tested with no filters: same result, if not slower :/

Raxvis commented 5 years ago

Remember to let it run for a bit. It has to get a lot of peers before torrents really start coming in.

On Fri, Feb 1, 2019 at 5:19 PM ash121121 notifications@github.com wrote:

Tested with no filters: same result, if not slower :/

ghost commented 5 years ago

Will do :) I found that adding these to the config boosted speed at least twofold.

{ address: 'router.utorrent.com', port: 6881 },
{ address: 'router.bitcomet.net', port: 554 },
{ address: 'dht.aelitis.com', port: 6881 },
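
For anyone trying this, here is a minimal sketch of appending those nodes to an existing bootstrap list without duplicating entries. The `bootstrapNodes` key and the two default entries are assumptions about the config layout, so check them against your own config file. `mergeNodes` is a hypothetical helper, not something the scraper provides.

```javascript
// Assumed defaults; your config may list different bootstrap nodes.
const defaultNodes = [
	{ address: 'router.bittorrent.com', port: 6881 },
	{ address: 'dht.transmissionbt.com', port: 6881 },
];

// The extra nodes posted above.
const extraNodes = [
	{ address: 'router.utorrent.com', port: 6881 },
	{ address: 'router.bitcomet.net', port: 554 },
	{ address: 'dht.aelitis.com', port: 6881 },
];

// Hypothetical helper: concatenate lists, keeping the first entry per address:port.
function mergeNodes(...lists) {
	const seen = new Set();
	const merged = [];
	for (const node of lists.flat()) {
		const key = `${node.address}:${node.port}`;
		if (!seen.has(key)) {
			seen.add(key);
			merged.push(node);
		}
	}
	return merged;
}

// `bootstrapNodes` is an assumed key name.
const config = { bootstrapNodes: mergeNodes(defaultNodes, extraNodes) };

console.log(config.bootstrapNodes.length); // 5
```

Deduplicating by address and port means the snippet can be re-applied without the list growing each time.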

milezzz commented 5 years ago

adding these to the config boosted speed by at least 2 fold

Yes, definitely a bit faster here, thanks @ash121121. Editing formats/tags really slows it down, I think. I added 6 new formats and a couple of tags and got around 20k over the last hour, with inbound traffic reaching around 3.5 Mbit/s. This is an 8 GB VM with 4 CPU cores and an SSD, on an Intel(R) Xeon(R) CPU E3-1231 v3 @ 3.40GHz.