pmwoodward3 closed this issue 7 years ago.
For Apache, you will need to redirect all urls (not including anything under /assets) to index.php
Could you help me out with what that would look like in the vhost config? Might be a good addition to the readme as well :)
Try this:
```apache
<IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteBase /
    RewriteRule ^index\.php$ - [L]
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule . /index.php [L]
</IfModule>
```
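Since the question was specifically about the vhost config, the same rules can live inside a `<Directory>` block in the vhost instead of an `.htaccess` file. A sketch, where `ServerName` and the `/var/www/html` paths are placeholders to adjust for your install:

```apache
<VirtualHost *:80>
    ServerName example.com
    DocumentRoot /var/www/html

    <Directory /var/www/html>
        AllowOverride None
        Require all granted

        # Send everything that is not a real file or directory
        # (so /assets etc. are served as-is) to index.php
        RewriteEngine On
        RewriteBase /
        RewriteRule ^index\.php$ - [L]
        RewriteCond %{REQUEST_FILENAME} !-f
        RewriteCond %{REQUEST_FILENAME} !-d
        RewriteRule . /index.php [L]
    </Directory>
</VirtualHost>
```

Putting the rules in the vhost also lets you keep `AllowOverride None`, which avoids per-request `.htaccess` lookups.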
That's actually what I tried; I'm a sysadmin for a company that does WordPress sites, so that was the first thing I reached for.
Did you use nginx for your setup? If it works out of the box, I can switch to that easily.
Yes, I use nginx.
I can add a basic nginx config to the repo as well, to make things easier.
Here are a basic and a secure config:
https://github.com/AlphaReign/www-php/blob/master/nginx.conf
https://github.com/AlphaReign/www-php/blob/master/nginx.secure.conf
So I got that working (thank you), but now when I log in I get:

"Slim Application Error
A website error has occurred. Sorry for the temporary inconvenience."
You will want to set slim to echo out the errors. Should be in the root index file.
Sorry, I don't really have much experience with this. Could you let me know where/what to change? Sorry for being a pain.
No worries. Comment out all the error document lines first.
Also, in /settings.php, change display errors to true.
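Assuming this project is on Slim 3, the relevant setting is called `displayErrorDetails`; the exact file layout varies by project, but the settings file would contain something along these lines:

```php
<?php
// settings.php (location varies by project) — a sketch, not the exact file
return [
    'settings' => [
        'displayErrorDetails' => true, // show full error details while debugging; set back to false in production
    ],
];
```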
Here's the whole error log for Slim. Funny thing is that ES is up and I can connect to it; all of this is on the same VM.
Slim Application Error
The application could not run because of the following error:

Type: Elasticsearch\Common\Exceptions\BadRequest400Exception
Code: 400
Message: {"error":{"root_cause":[{"type":"search_parse_exception","reason":"No mapping found for [seeders] in order to sort on"}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"torrents","node":"QSJ6j2H2R8mMOhpGsLczqg","reason":{"type":"search_parse_exception","reason":"No mapping found for [seeders] in order to sort on"}}]},"status":400}
File: /var/www/html/vendor/elasticsearch/elasticsearch/src/Elasticsearch/Connections/Connection.php
Line: 610
So the torrents index is missing the seeders field. It should be easy enough to get working if you run scrape.js.
It's already running, and has been for a while; here's the overview from ElasticHQ.
Can you get a mapping of the torrents index?
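For anyone following along, the mapping can be pulled with a single curl call, assuming Elasticsearch on its default localhost:9200 and the index named `torrents` as in the error above:

```shell
# Fetch the mapping of the torrents index; adjust host/port if ES lives elsewhere
curl -s 'http://localhost:9200/torrents/_mapping?pretty'
```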
I actually just deleted both of those to try to start from scratch, let me see what happens when I start it all back up.
Is there anything I need to do to have the indexes created properly? I just started scrape.js and it complains that the index doesn't exist (because it doesn't). What's the "right way" to create them?
Okay... So, I probably need a mapping for the torrent index.
In the meantime, start add.js first and let it run to get a few torrents into the system.
Next start up scrape.js and run through all the torrents.
That should create a full index. I will try and spin up something and pull the mapping out soonish.
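Using pm2 (which is what's used later in the thread), that sequence might look roughly like this; the `/opt/scraper` path is taken from a trace later in the thread and may differ on your machine:

```shell
cd /opt/scraper     # wherever the scraper checkout lives
pm2 start add.js    # collect torrents from the DHT first
pm2 start scrape.js # then scrape seeders/leechers for them
pm2 logs scrape     # watch the output for errors
```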
OK, it created a torrents index after I tried starting it a few times with pm2. Here's the mapping:
Great, now run scrape.js
It should add seeders to the mapping
OK, scrape is up. Is there a certain number of torrents it needs before that "triggers"? I don't think seeders has been added to the mapping yet.
Can you run "pm2 log scrape"?
And boom goes the dynamite! Looks like coppersurfer was timing out for me because of a weird firewall rule. Ditched that and everything is running smoothly as far as scrape.js is concerned; seeders popped up in the mapping.
Is there any way to use multiple trackers with scrape.js?
Not simply. Plus your peer count wouldn't be correct.
Cool thanks. What would be the best way to jumpstart the amount of torrents I have? Any way to import the Alphareign DB?
Not yet. I need to export it sometime, but haven't yet
Also, mapping added here: https://github.com/AlphaReign/scraper/blob/master/mapping.json for anyone else that runs into this issue
How would I import that mapping? Also now categorize is throwing this error:
```
Trace: [search_parse_exception] No mapping found for [categories_updated] in order to sort on
    at /opt/scraper/categorize.js:37:17
    at tryCallOne (/opt/scraper/node_modules/elasticsearch/node_modules/promise/lib/core.js:37:12)
    at /opt/scraper/node_modules/elasticsearch/node_modules/promise/lib/core.js:123:15
    at flush (/opt/scraper/node_modules/elasticsearch/node_modules/promise/node_modules/asap/raw.js:50:29)
    at nextTickCallbackWith0Args (node.js:489:9)
    at process._tickDomainCallback (node.js:459:13)
```
You would have to delete the torrent index and then add the mapping, then start your scripts
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-put-mapping.html
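Concretely, with curl against a default local Elasticsearch, that would look roughly like the following; whether mapping.json can be passed as-is depends on whether the repo's file wraps its fields in a top-level "mappings" key, so check the file first:

```shell
# WARNING: this deletes all indexed torrents
curl -XDELETE 'http://localhost:9200/torrents'

# Recreate the index with the mapping from the scraper repo
curl -XPUT 'http://localhost:9200/torrents' \
     -H 'Content-Type: application/json' \
     -d @mapping.json
```

Then start the scripts again so new documents are indexed under the new mapping.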
And that will also get categorize.js working?
Yes, I believe it should get categorize.js to work
I reran everything and imported the mapping, and I'm still getting the same issues with categorize.js.
I know this might be too much of an ask, but would it be possible to get a docker image of www and scrape?
And also thank you for being so responsive and helpful, really means a ton.
I honestly probably don't have time for a docker image creation. I did add a change to add.js on the scraper repo to add the categories_updated when inserting records.
Give that a go if you are still willing to work with me on it.
@pmwoodward3 Just an update... I know you (and others) have been having issues, so I burnt a few hours and made a massive update to the scraper library.
I spun up a new server, and after a fresh install of Elasticsearch, running it creates and updates the index correctly and manages everything from one index file.
Release here: https://github.com/AlphaReign/scraper/releases/tag/0.0.2
There is also a config.json file now that allows you to edit settings if you need to versus editing files directly
Awesome thank you! Looks like everything is working now as far as I can tell with the new update.
I take that back; it's now spitting out 400 errors when it tries to validate torrents.
```
0|index | { [Error: [action_request_validation_exception] Validation Failed: 1: id is missing;2: id is missing;3: id is missing;4: id is missing;5: id is missing;]
0|index |   status: 400,
0|index |   displayName: 'BadRequest',
0|index |   message: '[action_request_validation_exception] Validation Failed: 1: id is missing;2: id is missing;3: id is missing;4: id is missing;5: id is missing;',
0|index |   path: '/_bulk',
0|index |   query: {},
0|index |   body: '{"update":{"_index":"torrents","_retry_on_conflict":3,"_type":"hash"}}\n{"doc":{"peers_updated":1491228192},"doc_as_upsert":true}\n' (same action/doc pair repeated five times),
0|index |   statusCode: 400,
0|index |   response: '{"error":{"root_cause":[{"type":"action_request_validation_exception","reason":"Validation Failed: 1: id is missing;2: id is missing;3: id is missing;4: id is missing;5: id is missing;"}],"type":"action_request_validation_exception","reason":"Validation Failed: 1: id is missing;2: id is missing;3: id is missing;4: id is missing;5: id is missing;"},"status":400}',
0|index |   toString: [Function],
0|index |   toJSON: [Function] }
0|index | Added: 85922fbee6dce5e2f5491e16bcdd9e6e427ba5aa | slackware64-14.2-iso
0|index | Scrape Successful
0|index | (the same error then repeats with peers_updated: 1491228197)
```
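The bulk body in that log hints at the cause: each `update` action line carries `_index` and `_type` but no `_id`, and Elasticsearch cannot upsert a document without one. A corrected action/doc pair would look like the following, using the slackware infohash from the same log as the document id (an assumption about how the scraper keys its documents):

```
{"update":{"_index":"torrents","_type":"hash","_id":"85922fbee6dce5e2f5491e16bcdd9e6e427ba5aa","_retry_on_conflict":3}}
{"doc":{"peers_updated":1491228192},"doc_as_upsert":true}
```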
Changing the default tracker seems to have helped, maybe there is some issue or rate limiting happening?
Also, it looks like when I start with a fresh index, it only creates 2 documents, even though I'm seeing it scrape a ton of stuff in the logs without any errors (using a different tracker).
So... yeah... rate limiting is probably an issue. I will add a timeout for scraping to the config file.
Also, yes, it is constantly scraping, which means, if you only have two documents, it constantly scrapes those documents. Do you think that it should only update if older than 'x' minutes?
EDIT: Timeout meaning scrape frequency (right now it's every second)
Yeah maybe only update if older than an hour or so? Maybe 30 minutes?
Also, it looks like I hit either the 400 error above or the two-document cap on every tracker. Most give the 400 error and end up with only 1 document in the ES index; the one that doesn't give the 400 error maxes out at 2.
Right, 30 minutes will be the default (configurable, though).
As for that last error, I believe I know what that is
Hold for some hotfixes
Okay...
Back again
@pmwoodward3 take a look at this version: https://github.com/AlphaReign/scraper/releases/tag/0.0.3
New things added / fixed
I've had it running all day while I was working, and it looks like it's running perfectly.
Random question: how easy would it be to put the DHT portion behind a SOCKS5 proxy? I saw that the bind address for adding was 0.0.0.0:6881, so does that mean I could just set up a SOCKS5 proxy, bind to it, and be in business?
Thanks
I think so, but am not exactly sure.
When I first started testing, I opened a tunnel to that port, but didn't have any luck with it exposed. A SOCKS proxy might work, though, since it would filter any communication at the port. The problem I see is that exposing the port on the SOCKS proxy means your ip:port would be socksIPAddress:socksPort, which might not be able to receive DHT requests.
Once you are connected to the DHT network, you might be able to query for hashes, but you might not receive queries for hashes which means you won't pick up any torrents along the way
Might be easier to just use TOR or use a VPN and bind it to that IP address. Thoughts?
I think the difficulty lies in receiving incoming communications from other DHT nodes. There's no easy way around that, that I know of.
Hmm, something I'll look into. Thanks for being so responsive regardless! :)
No problem at all!
OK, two more random questions, and only because I'm seriously in love with this project.
How did you get so many torrents into your original site? Did you have a script to import it from other sources? Would it be possible to somehow feed it a list of hashes from the DHT, something like this https://github.com/scriptzteam/BitTorrent_DHT_Logger_v1.1 and import all those torrents?
Would it be possible to run multiple instances of the scraper, using different trackers, to speed up collection of torrents? Would there be any issues with multiple instances pointing to the same ES index?
Thanks :)
So, initially, I ran this on a digital ocean server. In two weeks or so, I had 3-4 million. You can speed it up by setting up multiple nodes all pointing to one elasticsearch server but you will need a beast of a machine for elasticsearch.
So: 1) there is no import process yet (maybe sometime soon).
2) Yes, you can. Use the same tracker, though, to get consistent results, and a different IP address for each instance.
There might be an import process. But I don't know how soon.
Is there an .htaccess or anything else needed for Apache 2? When I click Register I get a 404 Not Found.