AlphaReign / scraper

AlphaReign's DHT Scraper, includes peer updater and categorizer
MIT License

Categories #31

Closed ghost closed 6 years ago

ghost commented 6 years ago

Hello. I'm still using your old scraper and PHP front end. Could you help me figure out why I can't sort by albums? I checked the scraper, and the code seems to be there to push a torrent to albums if it has more than 3 tracks.

ghost commented 6 years ago

I just can't find any torrents in Elasticsearch that have been pushed to albums, and there are 14 million torrents.

ghost commented 6 years ago

Any help is much appreciated.

Raxvis commented 6 years ago

Try adding some console.log calls in the album category check and see if that is ever getting hit.

ghost commented 6 years ago

Nope not getting hit :(

Raxvis commented 6 years ago

There is probably something wrong with the if check then.

ghost commented 6 years ago

The if check is where I tried console.log. Is this where I should have been trying it?

ghost commented 6 years ago

    getAudioCategories (file, torrent) {
        if (torrent.count > 3) {
            torrent.categories.push('album');
            console.log('yes123');
        }

Raxvis commented 6 years ago

That's the correct spot.

Also, add this after that first if block:

    if (torrent.length && torrent.length > 3) {
        torrent.categories.push('album');
        console.log('yes456');
    }

And see if that picks up anything.

ghost commented 6 years ago

Thanks, will give it a go. Shall I add that to src and dist? Also, just a note: you have .ra as a format in the audio categories, so you're basically adding all .rar files to the audio category.

Raxvis commented 6 years ago

Oh, pull the .ra out for sure.

And yes, add it to dist and src.
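
The thread does not show how the old scraper compared extensions, so the following is only a sketch of how a short entry like .ra could end up matching .rar files. It assumes a substring-style check; `audioFormats`, `looseMatch` and `exactMatch` are hypothetical names used for illustration.

```javascript
// Hypothetical extension list that still contains the problematic '.ra'.
const audioFormats = ['.mp3', '.ra'];

// Assumed loose check: true if any known extension appears anywhere in the path.
function looseMatch(path) {
    return audioFormats.some((ext) => path.indexOf(ext) > -1);
}

// Stricter check: compare only the final extension, case-insensitively.
function exactMatch(path) {
    const dot = path.lastIndexOf('.');

    return dot !== -1 && audioFormats.includes(path.slice(dot).toLowerCase());
}

console.log(looseMatch('backup.rar')); // true, because '.ra' is a substring of '.rar'
console.log(exactMatch('backup.rar')); // false
console.log(exactMatch('stream.ra'));  // true
```

Removing .ra from the list avoids the problem either way; the exact comparison just makes the list less sensitive to short entries.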

ghost commented 6 years ago

Yeah, I pulled it now, after 14 million torrents, haha. If only there were a way to remove the audio category and start again, lol.

Anyway, I added the code after the first block and still nothing :( I'm seeing torrents with more than 3 mp3 files, but they're just not getting pushed to albums.

Kind regards

Raxvis commented 6 years ago

Can you upload the whole categories.js file so I can give it a once-over again?

ghost commented 6 years ago

src.zip

ghost commented 6 years ago

Thanks for your time, I appreciate it.

Raxvis commented 6 years ago

Try updating getAudioCategories to this:

    getAudioCategories (file, torrent) {
        console.log('getAudioCategories');

        if (torrent.count > 3) {
            torrent.categories.push('album');
            console.log('yes123');
        }

        if (torrent.files) {
            const audioFiles = Object.keys(torrent.files).filter((key) => {
                const ext = `.${torrent.files[key].path.split('.').pop()}`;

                return this.audioFormats.indexOf(ext) > -1;
            });

            if (audioFiles.length > 3) {
                torrent.categories.push('album');
                console.log('yes456');
            }
        }

        return torrent;
    }

in both dist and src
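
For trying the counting logic outside the scraper, here is a minimal standalone sketch of the same idea. It is not the scraper's actual code: `audioFormats` is a hypothetical list, `countAudioFiles` and `addAlbumCategory` are illustrative names, and the sample torrent is made up. It also guards against pushing 'album' twice, which the snippet above could do when both if blocks match.

```javascript
// Hypothetical extension list; the scraper's real audioFormats may differ.
const audioFormats = ['.mp3', '.flac', '.ogg', '.wav', '.m4a'];

// Count the files whose final extension is a known audio format.
function countAudioFiles(files) {
    return Object.keys(files || {}).filter((key) => {
        const path = files[key].path || '';
        const dot = path.lastIndexOf('.');

        return dot !== -1 && audioFormats.includes(path.slice(dot).toLowerCase());
    }).length;
}

// Tag the torrent as an album when it has more than 3 audio files,
// without adding the category a second time.
function addAlbumCategory(torrent) {
    if (countAudioFiles(torrent.files) > 3 && !torrent.categories.includes('album')) {
        torrent.categories.push('album');
    }

    return torrent;
}

// Made-up sample torrent: four audio files plus a cover image.
const sample = {
    categories: ['audio'],
    files: [
        { path: 'cd1/01.mp3' },
        { path: 'cd1/02.mp3' },
        { path: 'cd1/03.mp3' },
        { path: 'cd1/04.MP3' },
        { path: 'cd1/cover.jpg' },
    ],
};

console.log(addAlbumCategory(sample).categories);
```

Running the function repeatedly on the same torrent leaves a single 'album' entry, which matters if torrents are re-categorized each time they reappear on the network.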

ghost commented 6 years ago

Thanks mate, will give this a go shortly. Do you know if it's possible to purge the audio category from Elasticsearch without starting a fresh index?

ghost commented 6 years ago

Finally working :) Really appreciate that.

Raxvis commented 6 years ago

Awesome. One way to purge would be to query all the torrents in the audio category and run them through the categorizer again.

ghost commented 6 years ago

Thanks mate. Any pointers on how to do that? Send me your email and I'll send another donation too.

ghost commented 6 years ago

One thing I noticed after updating with your fix is that it's showing torrents in albums that are months old too, so I guess they were pushed to Elasticsearch at some point.

Raxvis commented 6 years ago

So, if a torrent comes up on the network again, it will re-categorize it.

As for the current audio category, give me a bit to work on a script for you.

My prefinem@gmail.com email should be fine

Raxvis commented 6 years ago

This should update all the documents with the audio type to remove that type:

POST <elasticsearchURL>torrents/_update_by_query

    {
        "script": {
            "source": "ctx._source.type = ''",
            "lang": "painless"
        },
        "query": {
            "term": {
                "type": "audio"
            }
        }
    }

ghost commented 6 years ago

Thanks, will give it a go. Is that to be run over SSH or put in the JS file?

Raxvis commented 6 years ago

Use something like Postman to make an HTTP request to the Elasticsearch server.

ghost commented 6 years ago

Thanks, I tried Postman and got a 409 Conflict, so I tried curl too and it produced this:

    curl -XPOST "http://x.x.x.x.x/torrents/_update_by_query"

    {"took":13957,"timed_out":false,"total":14591107,"updated":22999,"deleted":0,"batches":23,"version_conflicts":1,"noops":0,"retries":{"bulk":0,"search":0},"throttled_millis":0,"requests_per_second":-1.0,"throttled_until_millis":0,"failures":[{"index":"torrents","type":"hash","id":"7e244d1f42e0dee246995739f9dfa2a25e0f64bc","cause":{"type":"version_conflict_engine_exception","reason":"[hash][7e244d1f42e0dee246995739f9dfa2a25e0f64bc]: version conflict, current version [12] is different than the one provided [10]","index_uuid":"rVdieoLkTWOsaRCYH-PCQg","shard":"2","index":"torrents"},"status":409}]}
    [root@CentOS-74-64-minimal ~]# '{
    >     "script": {
    >         "source": "ctx._source.type = ''",
    >         "lang": "painless"
    >     },
    >     "query": {
    >         "term": {
    >             "type": "audio"
    >         }
    >     }
    > }'

ghost commented 6 years ago

Think I solved it using http://x.x.x.x.x:7346/torrents/_update_by_query?conflicts=proceed

But it's taking forever and eating CPU :D
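
For reference, a hedged sketch of the same request built in JavaScript, with the JSON body and `conflicts=proceed` sent together. In the curl paste earlier, the body appears to have been typed as a separate shell command rather than sent with the request, and the reported total of 14591107 looks like the whole index rather than only the audio documents. `buildPurgeRequest` and `purgeAudioType` are hypothetical helper names, `http://localhost:9200/` is an assumed address (the thread used a different host and port), and the global `fetch` assumes Node 18 or newer.

```javascript
// Hypothetical helper: builds the update-by-query request without sending it.
function buildPurgeRequest(baseUrl) {
    const url = new URL('torrents/_update_by_query', baseUrl);

    // Skip documents that changed mid-run instead of aborting on a 409.
    url.searchParams.set('conflicts', 'proceed');

    return {
        url: url.toString(),
        options: {
            method: 'POST',
            headers: { 'Content-Type': 'application/json' },
            // Same script and query as posted earlier in the thread.
            body: JSON.stringify({
                script: { source: "ctx._source.type = ''", lang: 'painless' },
                query: { term: { type: 'audio' } },
            }),
        },
    };
}

// Hypothetical sender: not called here, since it needs a reachable server.
async function purgeAudioType(baseUrl) {
    const { url, options } = buildPurgeRequest(baseUrl);
    const response = await fetch(url, options);

    return response.json();
}

// Assumed address for illustration only.
const request = buildPurgeRequest('http://localhost:9200/');

console.log(request.url);
console.log(request.options.body);
```

Building the request separately from sending it makes it easy to check the URL and body before pointing it at a 14 million document index.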

ghost commented 6 years ago

OK, it has finally removed the audio type :) Thanks so much for your help.

Raxvis commented 6 years ago

Awesome! Glad to hear.