AlphaReign / scraper

AlphaReign's DHT scraper, including a peer updater and categorizer
MIT License

Peer updater #33

ghost closed this issue 5 years ago

ghost commented 6 years ago

Hey mate, just wondering what your thoughts are on the peer updater. I'm using the old scraper, as you know, and have around 17 million torrents. I have a top 100 page sorted by seeders; however, it can take weeks or longer to get updated seeder/leecher information for a torrent.

Kind regards

ghost commented 6 years ago

@Prefinem, would it also make sense for scraper.js to filter out inactive torrents, so it's not updating peer info for inactive torrents? I'm not sure whether it already does this.

Update: I added this to make sure it's not updating inactive torrents:

    must_not: [ { term: { inactive: true } } ],
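
For context, a `must_not` clause like this only has an effect inside an Elasticsearch `bool` query. A minimal sketch of how a stale-torrent query could look with it, assuming `peers_updated` is stored as a Unix timestamp in seconds and `inactive` is a boolean field. The function `buildStaleTorrentQuery` and its parameters are hypothetical, not taken from scraper.js:

```javascript
// Hypothetical helper, not from scraper.js. Assumes peers_updated is a
// Unix timestamp in seconds and inactive is a boolean field.
function buildStaleTorrentQuery(ageMinutes, limit, nowSeconds = Math.floor(Date.now() / 1000)) {
  // Anything last updated before this moment counts as stale.
  const cutoff = nowSeconds - ageMinutes * 60;
  return {
    size: limit,
    // Oldest peer data first, so the most out-of-date torrents go next.
    sort: [{ peers_updated: { order: 'asc' } }],
    query: {
      bool: {
        filter: [{ range: { peers_updated: { lt: cutoff } } }],
        // The clause from the comment above: skip inactive torrents entirely.
        must_not: [{ term: { inactive: true } }],
      },
    },
  };
}

// Example: 6-hour age, 75 torrents per batch, fixed clock for reproducibility.
const staleQuery = buildStaleTorrentQuery(360, 75, 1540000000);
console.log(JSON.stringify(staleQuery.query.bool, null, 2));
```

With ascending order on `peers_updated`, any torrent whose value is `0` will always be selected before torrents with a real timestamp.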

ghost commented 6 years ago

@Prefinem So I think I have too many torrents that haven't been updated, and the scraper just can't keep up, possibly since I changed the peer age to "0" months ago. Do you have any thoughts on how to rectify this without creating a new index?

ghost commented 6 years ago

I just checked the response from the scraper, and it shows 14,850,017 torrents not updated :D. Any ideas? Hahaha

Raxvis commented 6 years ago

Is it still at 14 million non-updated torrents?

Raxvis commented 6 years ago

There are also some settings in config/index.js that you can tweak to speed things up.

ghost commented 6 years ago

I'll check how many are still not updated. Could you be a bit more specific about which settings could speed things up?

Kind regards

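Raxvis commented 6 years ago

    tracker: {
        // Minutes before we should try and update a torrent again
        age: 360,
        // Seconds between every scrape
        frequency: 1,
        // UDP tracker that is scraped for seeder/leecher counts
        host: 'udp://tracker.coppersurfer.tk:6969/announce',
        // Torrents per scrape (assumed: info-hashes sent to the tracker in one request)
        limit: 75,
    },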

Either set the frequency to 0, or increase the limit from 75.
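
To see why these two settings matter: with `frequency: 1` and `limit: 75`, the updater refreshes at most 75 torrents per second, so the roughly 17 million torrents mentioned at the top of the thread would take about 63 hours per full pass, far longer than the 360-minute `age`. A small sketch of that arithmetic; `refreshEstimate` is a hypothetical helper, and the model ignores request latency and any rate limiting by the tracker:

```javascript
// Rough upper bound on updater throughput for the tracker settings above.
// Hypothetical helper; assumes each scrape completes instantly, so only
// `frequency` (seconds between scrapes) and `limit` (torrents per scrape) matter.
function refreshEstimate({ frequency, limit }, torrentCount) {
  if (frequency <= 0) {
    // With no pause, network round-trip time becomes the real bound,
    // which this simple model does not capture.
    throw new RangeError('frequency must be > 0 for this estimate');
  }
  const perSecond = limit / frequency;       // torrents refreshed per second
  const seconds = torrentCount / perSecond;  // time for one full pass
  return { perSecond, hours: seconds / 3600 };
}

const est = refreshEstimate({ frequency: 1, limit: 75 }, 17000000);
console.log(`${est.perSecond} torrents/s, ~${est.hours.toFixed(1)} h per full pass`);
```

In this model, doubling `limit` or halving `frequency` halves the pass time; in practice, tracker response time and whatever per-request limits the tracker enforces cap how far either setting can go.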

ghost commented 6 years ago

Btw, I'm talking about the old scraper :D

Raxvis commented 6 years ago

Oh, the old scraper isn't keeping up? That makes more sense. It wasn't designed very well, TBH.

ghost commented 6 years ago

Nope, it's not. I guess I'll just have to wait till you fix the new scraper and create a backfill script :)

Raxvis commented 6 years ago

I'll be working on it soon. Right now I'm pulling down my current DB and comparing it to your Elasticsearch export to match them up.

ghost commented 6 years ago

Thanks mate, looking forward to it :) Btw, what's the limit you can scrape from coppersurfer?

Raxvis commented 6 years ago

I honestly don’t know. I thought 75 was the limit, but it might be more.

On Mon, Oct 22, 2018 at 13:09 ash121121 notifications@github.com wrote:

Thanks mate looking forward to it :) btw whats the limit you can scrape from coppersurf ?
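
On the limit question: for UDP trackers, the scrape format is defined by BEP 15 (UDP Tracker Protocol), which states that up to about 74 torrents can be scraped at once. Assuming `limit` is the number of info-hashes sent in one scrape request, 75 is already at the protocol's practical ceiling, so raising it may not help against a UDP tracker such as coppersurfer. A minimal sketch of building such a request in batches; it is not from the AlphaReign code, the helper names are hypothetical, and it assumes the 64-bit connection ID was already obtained through the BEP 15 connect step:

```javascript
// Sketch of a BEP 15 (UDP tracker) scrape request with batching.
// Not from the AlphaReign code; helper names are hypothetical. Assumes the
// 64-bit connection ID was already obtained via the BEP 15 connect step.
const MAX_HASHES_PER_SCRAPE = 74; // BEP 15: "up to about 74 torrents"
const ACTION_SCRAPE = 2;

// Split a list into consecutive batches of at most `size` items.
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Layout: connection_id (u64) | action (u32) | transaction_id (u32) | info_hash (20 bytes) * n
function buildScrapeRequest(connectionId, transactionId, infoHashesHex) {
  if (infoHashesHex.length > MAX_HASHES_PER_SCRAPE) {
    throw new RangeError('too many info-hashes for one scrape request');
  }
  const buf = Buffer.alloc(16 + 20 * infoHashesHex.length);
  buf.writeBigUInt64BE(connectionId, 0);
  buf.writeUInt32BE(ACTION_SCRAPE, 8);
  buf.writeUInt32BE(transactionId, 12);
  infoHashesHex.forEach((hex, i) => {
    if (!/^[0-9a-f]{40}$/i.test(hex)) throw new Error(`bad info-hash: ${hex}`);
    Buffer.from(hex, 'hex').copy(buf, 16 + 20 * i);
  });
  return buf;
}

// Example: 150 dummy info-hashes become batches of 74, 74 and 2.
const hashes = Array.from({ length: 150 }, (_, i) => i.toString(16).padStart(40, '0'));
const batches = chunk(hashes, MAX_HASHES_PER_SCRAPE);
const packet = buildScrapeRequest(0x1234n, 42, batches[0]);
console.log(batches.map((b) => b.length), packet.length);
```

`writeBigUInt64BE` needs Node.js 12 or later; on older versions the connection ID would have to be written as two 32-bit halves.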


ghost commented 6 years ago

No worries, thanks anyway. @Prefinem, could you help me out with one thing in the old scraper while I'm waiting for the new one, if you have time?

I've noticed that when a torrent enters the DB with 0 leechers and 0 seeders, it gets a peers_updated value of "0", but if the torrent enters with, say, 1 seeder, the peers_updated value is set correctly.

Maybe this is why the old scraper isn't keeping up? If torrents with no seeders/leechers enter with peers_updated = 0, then the scraper will try to update all those torrents with peers_updated = 0 first. What do you think?
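
That hypothesis holds if the updater selects torrents ordered by `peers_updated` ascending: a torrent stamped `0` sorts ahead of every torrent with a real timestamp, and if a scrape that still finds no peers leaves it at `0`, the same dead torrents get picked again on every pass. A hypothetical sketch of that behavior and a fix (always record the time of a successful scrape, even with zero peers). It is not the old scraper's code; the function names, the timestamp unit (Unix seconds), and the oldest-first selection are assumptions:

```javascript
// Hypothetical illustration, not the old scraper's code. Assumes peers_updated
// is a Unix timestamp in seconds and the updater picks the oldest values first.

// Pattern described above: the timestamp is only set when peers were found.
function applyScrapeBuggy(torrent, result, now) {
  const hasPeers = result.seeders + result.leechers > 0;
  return { ...torrent, ...result, peers_updated: hasPeers ? now : 0 };
}

// Fix: a successful scrape counts as an update even if it reports zero peers.
function applyScrape(torrent, result, now) {
  return { ...torrent, ...result, peers_updated: now };
}

// Oldest-first selection, as an updater sorting by peers_updated ascending would do.
function nextBatch(torrents, limit) {
  return [...torrents].sort((a, b) => a.peers_updated - b.peers_updated).slice(0, limit);
}

const now = 1540000000;
const dead = { id: 'dead', peers_updated: 0 };
const live = { id: 'live', peers_updated: now - 86400 };
const noPeers = { seeders: 0, leechers: 0 };

// Buggy: the dead torrent stays at 0 and is picked again on the next pass.
const pickedBuggy = nextBatch([applyScrapeBuggy(dead, noPeers, now), live], 1)[0].id;
// Fixed: the dead torrent moves to the back of the queue.
const pickedFixed = nextBatch([applyScrape(dead, noPeers, now), live], 1)[0].id;
console.log(pickedBuggy, pickedFixed);
```

For existing documents, the same idea would mean giving zero-peer torrents a real timestamp once, so they stop crowding the front of the queue.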