bear / indie-stats

Indieweb site crawler and MF2 data collection tool
MIT License

add Twitter tracking #6

Open bear opened 9 years ago

bear commented 9 years ago

Integrate the irc-people page along with voxpelli's relspider to add that data to ours.

voxpelli commented 9 years ago

Steps to success:

  1. Host the relspider somewhere. I don't have a stable instance up myself yet, as it hits the limits of Heroku's free instances fairly easily and currently recrawls pretty aggressively (all pages are recrawled daily). I will get one up eventually, though, so just shout if you want an instance and I'll get you one.
  2. Call the lookup API (https://github.com/voxpelli/relspider#apilookup), with a callback webhook if possible, as that ensures the data is returned even if the crawling takes some time due to a large identity graph
  3. Find any Twitter profiles or other interesting profiles among the returned data
  4. File issues against the relspider project about ways it can improve and make things easier
  5. Profit!
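
Step 3 above could be sketched roughly like this in Python. This is only an illustration under assumptions: it presumes the lookup result has already been reduced to a plain list of URL strings (the actual response shape of the relspider API is not shown in this thread), and the names twitter_handles, TWITTER_HOSTS and NON_PROFILE_PATHS are hypothetical.

```python
from urllib.parse import urlparse

# Hosts treated as Twitter. Assumption for illustration, not taken from relspider.
TWITTER_HOSTS = {"twitter.com", "www.twitter.com", "mobile.twitter.com"}

# Single-segment paths that are not user profiles. Illustrative, not exhaustive.
NON_PROFILE_PATHS = {"home", "intent", "search", "share"}


def twitter_handles(urls):
    """Return unique Twitter handles found among the URLs, in first-seen order."""
    handles = []
    for url in urls:
        parsed = urlparse(url)
        if (parsed.hostname or "").lower() not in TWITTER_HOSTS:
            continue
        # A profile URL has exactly one path segment, e.g. /example_user
        segments = [part for part in parsed.path.split("/") if part]
        if len(segments) != 1:
            continue
        handle = segments[0].lstrip("@").lower()
        if not handle or handle in NON_PROFILE_PATHS:
            continue
        if handle not in handles:
            handles.append(handle)
    return handles
```

Given a list such as ["https://twitter.com/example_user", "https://example.com/"], only the first entry would yield a handle. Other "interesting profiles" could be handled the same way with a different host set.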

bear commented 9 years ago

@voxpelli is the spider's code someplace where I could try to host it on my server?

voxpelli commented 9 years ago

@bear Yes, it's at https://github.com/voxpelli/relspider and it should be fairly plug & play to get it up on Heroku: one just needs to add the addons for the Postgres database and for Neo4j (GrapheneDB, I believe) and then set up the database through heroku run npm run install-schema

If one hosts it oneself, one has to spin up the Postgres and Neo4j servers as well, add the environment variables DATABASE_URL and NEO4J_URL, and then run npm run install-schema to set up the database.
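
For the self-hosted case, a small pre-flight check might save a failed schema install. The sketch below is in Python and only verifies that the two environment variables named above are set; the helper name missing_vars is hypothetical, and it makes no attempt to validate the connection URLs themselves.

```python
import os

# The two variables named in the comment above.
REQUIRED_VARS = ("DATABASE_URL", "NEO4J_URL")


def missing_vars(env=None):
    """Return the required variable names that are unset or empty in env."""
    if env is None:
        env = os.environ  # default to the real process environment
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling missing_vars() before npm run install-schema returns an empty list when both variables are present, and otherwise the names that still need to be set.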

voxpelli commented 9 years ago

But preferably the crawler should be hosted standalone so it can be shared among many services. That has always been my intent, but so far I haven't had any real use case for it (apart from the original one I created it for, which I never got around to using it for).

So I'm happy to pull up an instance of it for this IndieWeb purpose.