ghost closed this issue 5 years ago
That should be correct, with the exception of categories. I think I merged categories into tags, so now you only need to filter by tags.
Thanks. I tried to follow your install instructions but have failed. "yarn global install pm2" doesn't seem to work, but "yarn global add pm2" does.
Then I got to "yarn migrate" and got this error:
yarn migrate
yarn run v1.10.1
$ ./node_modules/.bin/knex migrate:latest
/bin/sh: ./node_modules/.bin/knex: No such file or directory
error Command failed with exit code 127.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
I updated the docs to use yarn global add pm2.
Did you run just yarn
inside the folder first?
I ran yarn migrate inside the scraper folder.
You will need to run yarn
before yarn migrate
I'll give it a go. I've never used yarn before and I didn't see in the docs that you had to run yarn first.
Ah! Good catch. Sorry about that. It's something I do a lot, so I didn't even think about it. I added it to the docs.
No worries :) So what do you recommend for my site using your scraper? Currently I have 17 million torrents and I use elasticdump to make dumps every now and then. Should I use SQLite or MySQL?
Kind regards
You definitely want to use MySQL. SQLite will be slower than MySQL, plus there are lots of tools for backing up MySQL.
Thanks, going to try it now. Would there be any performance decrease using this new version over the old one? Using MySQL and elasticsearch, wouldn't there be more overhead?
There will be more performance overhead because MySQL will be running, but we don't update elasticsearch as much, which means elasticsearch will run better.
Just a heads up, I just made a change to ensure that torrents get updated when they get scraped from the tracker. You will want to pull down the latest version
So the scraper feeds the torrents to mysql and then mysql to elastic search ?
So it looks like I'm stuck on yarn migrate. Here is the error:
yarn migrate
yarn run v1.10.1
$ ./node_modules/.bin/knex migrate:latest
/root/scraper/migrations/20180816161002_init.js:1
(function (exports, require, module, __filename, __dirname) { exports.up = async (knex) => {
                                                                                        ^
SyntaxError: Unexpected token (
    at createScript (vm.js:56:10)
    at Object.runInThisContext (vm.js:97:10)
    at Module._compile (module.js:549:28)
    at Object.Module._extensions..js (module.js:586:10)
    at Module.load (module.js:494:32)
    at tryModuleLoad (module.js:453:12)
    at Function.Module._load (module.js:445:3)
    at Module.require (module.js:504:17)
    at require (internal/module.js:20:19)
    at /root/scraper/node_modules/knex/lib/migrate/index.js:92:25
    at arrayFilter (/root/scraper/node_modules/lodash/lodash.js:582:11)
    at filter (/root/scraper/node_modules/lodash/lodash.js:9173:14)
    at /root/scraper/node_modules/knex/lib/migrate/index.js:91:108
    at tryCatcher (/root/scraper/node_modules/bluebird/js/release/util.js:16:23)
    at Promise._settlePromiseFromHandler (/root/scraper/node_modules/bluebird/js/release/promise.js:509:35)
    at Promise._settlePromise (/root/scraper/node_modules/bluebird/js/release/promise.js:569:18)
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
Also, just to make you aware, your docs state to update config.json but this file doesn't exist. I think you meant config/index.js?
> So the scraper feeds the torrents to mysql and then mysql to elastic search ?
Yes
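For anyone following along, that flow can be sketched in miniature like this; the function names and record shape are hypothetical stand-ins, not the scraper's actual code:

```javascript
// Toy model of the confirmed flow: scraper -> MySQL -> elasticsearch.
// Plain arrays stand in for the real MySQL table and elasticsearch index.
const mysqlRows = [];
const searchIndex = [];

// 1. The scraper writes each torrent it finds into MySQL.
function saveToMysql(torrent) {
  mysqlRows.push(torrent);
}

// 2. A separate pass reads MySQL and indexes rows into elasticsearch.
//    Because this pass runs less often, elasticsearch takes fewer writes.
function syncToElasticsearch() {
  for (const row of mysqlRows) {
    if (!searchIndex.some((t) => t.infoHash === row.infoHash)) {
      searchIndex.push(row);
    }
  }
}

saveToMysql({ infoHash: 'abc123', name: 'example torrent' });
syncToElasticsearch();
console.log(`${mysqlRows.length} in db, ${searchIndex.length} in search`);
// prints "1 in db, 1 in search"
```

This split is also why the stats later in the thread can show torrents "not in Search": the database write and the search indexing are separate steps.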
> so it looks like im stuck on yarn migrate. here is the error.
I will look into it
> also just to make you aware your docs state to update config.json but this file doesnt exist. i think you meant config/index.js ?
Fixing now
Thanks mate, looking forward to a fix.
Hey mate, any update?
@ash121121 can you give me the output of running node -v
in your terminal?
I destroyed the server, so I will set it up again today and send it to you.
Okay, thanks.
Just for your information, I used yum install nodejs on CentOS 7.4.
Okay, good to know.
So I did a fresh install and got the same error. I ran that command and the version is below.
[root@scw-966035 scraper]# node -v
v6.14.3
Ah... I have been testing on node 8. I think that might be an issue. Can you upgrade node and give it a test?
Upgraded and got past yarn migrate. I'll let it run for a little while and let you know. Could you update the readme to specify Node 8, please?
Thanks
So does this mean it's all good?
scraper > Total Torrents: 32392
scraper > Torrents without Tracker: 11
scraper > Torrents not in Search: 496
scraper > Total Torrents: 32485
I will update the README.md.
And yes, that is good. You have 32,392 torrents total. Only 11 are without seeder/leecher information and 496 are not in elasticsearch.
Great, thanks for your help :) So for the torrents without seeder info, will these get updated? And the torrents not in elasticsearch, will these be pushed to elasticsearch?
Thanks
Yes. Those are just counts, since they are run on different processes. This way you can keep track of stuff that hasn't been scraped or pushed to search.
It's more of a notice than anything. If you see the numbers for either torrents without tracker or torrents not in search keep increasing over a long period of time, just let me know. They should be able to keep up, but some configuration might have to be tweaked.
Thanks for the info :) Just a quick question. I'm planning to replace the old scraper with the new release, using your php front end. I know you mentioned that categories are now tags? Do you think the new scraper can still update peer info for the torrents in my db created by the old scraper?
If there's anything else I should know before I do the switch, please let me know.
Kind regards
I have attached an image showing the side-by-side difference in the mappings for a hashID. I think it needs a little work to be compatible with the php front end. I notice the time stamps are no longer in unix format, Peers_Updated has changed to TrackerUpdated, and also on the front end it shows the file sizes but not the files themselves.
kind regards
The time stamps I can convert to unix in php, but could you assist on how to manage my current database with the new scraper, since peers_updated has changed to TrackerUpdated? I'm guessing we could add an OR statement within the if statement in the js?
Kind regards
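For what it's worth, the OR-style fallback described above could look something like this. This is only a sketch: it assumes the old peers_updated holds unix seconds and the new TrackerUpdated holds a date string, and the record shape is hypothetical:

```javascript
// Backwards-compatible read: prefer the new scraper's TrackerUpdated
// field, fall back to the old scraper's peers_updated. Returns unix
// seconds, or null if the torrent has never been scraped.
function lastTrackerUpdate(doc) {
  if (doc.TrackerUpdated !== undefined) {
    // assumption: the new scraper stores a non-unix date string
    return Math.floor(new Date(doc.TrackerUpdated).getTime() / 1000);
  }
  if (doc.peers_updated !== undefined) {
    // the old scraper stored unix seconds directly
    return doc.peers_updated;
  }
  return null;
}

console.log(lastTrackerUpdate({ peers_updated: 1534377600 })); // 1534377600
```

With a helper like this, documents written by either scraper version can be compared on the same unix-seconds timeline while the index is migrated.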
Thanks for this diff. I didn't have one before and thought I had everything matched up.
I will fix the path issues (strings instead of arrays)
What is the peers_updated being used for?
Thanks mate. peers_updated was used by the old scraper to check when the seeder/leecher info was last updated. It was in unix format, but it looks like in the new scraper it's now called TrackerUpdated.
That is correct. Is it being used in the php code anywhere?
No, it's not, but as I'm trying to replace the old scraper with the new one, I wasn't sure how I would update seeder/peer info for all 17 million torrents that have the peers_updated field :D
The new scraper seems great if you're starting a new index. I was just thinking of people who have already built an index with your old scraper, and how, when using your new scraper, they could still get their torrents updated.
Gotcha. I will put in a ticket to write an elasticsearch to db importer so people can update their database with all their current torrents
Thanks mate. Could you also include votes and flags in the importer, as votes are stored in elasticsearch too? Is there anything you need me to test? Kind regards
Will do!
And I think that is all right now.
Actually, is there a way I could get a backup of your elasticsearch cluster?
Yes, I'll make a dump now and gzip it up for you.
Are you familiar with exporting and importing with elasticsearch? If not, the commands are below.

npm install elasticdump -g

elasticdump \
  --input=http://127.0.0.1:9200/torrents \
  --output=/home/torrents.json \
  --type=data

elasticdump \
  --input=/home/torrents.json \
  --output=http://127.0.0.1:9200/torrents \
  --type=data

The first is for export, the second for import.
Do you have an email where I could post the dump?
You can send it to prefinem@gmail.com
All sent mate. It's 4 GB packed and 20 GB unpacked. Let me know when you've downloaded it so I can remove the link.
Downloading now. I will let you know when it's done. Thanks!
If you want to download it via terminal on a server, let me know and I'll remove the Cloudflare protection for that.
I am just downloading it to my development laptop so no worries.
Hey @Prefinem, how are we looking on the progress? :)
Hey, just noticed you made the release live. So am I right in thinking I can just migrate from the old scraper to the new one with your php front end? And use my existing elastic db? Kind regards