snarfed opened this issue 8 years ago
right now I am collecting everything as discrete json files:

https://indie-stats.com/domains/{{ basedomain }}/processed.json contains a list of all poll results, and each individual poll result lives at https://indie-stats.com/domains/{{ basedomain }}/{{ timestamp }}_{{ basedomain }}.json
for example:
processed.json:
["20150311T082741_tantek.com.json", "20150224T083010_tantek.com.json"]
20150311T082741_tantek.com.json:
{
"status": 200,
"headers": {
"content-length": "26734",
"x-powered-by": "PHP/5.3.28",
"content-encoding": "gzip",
"vary": "Accept-Encoding, User-Agent",
"server": "LiteSpeed",
"connection": "close",
"date": "Wed, 11 Mar 2015 08:27:41 GMT",
"content-type": "text/html; charset=UTF-8"
},
"domain": "tantek.com",
"html": "<!DOCTYPE html>\n<html lang=\"en\">\n<head>\n</head></html>",
"excluded": false,
"url": "http://Tantek.com/",
"polled": "2015-03-11T08:27:41Z",
"claimed": false,
"mf2": {},
"history": [ 200, 200 ]
}
where obviously the mf2 and html entries have a lot more data
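given the layout above, pulling a site's full poll history is just a matter of joining the filenames in processed.json onto the per-domain URL. a minimal sketch (the helper name `poll_urls` is mine, not part of indie-stats):

```python
import json

BASE = "https://indie-stats.com/domains"

def poll_urls(basedomain: str, processed_json: str) -> list[str]:
    """Turn a domain's processed.json body into the URLs of its poll results.

    processed.json is assumed to be a JSON array of per-poll filenames
    of the form {timestamp}_{basedomain}.json, as in the example above.
    """
    return [f"{BASE}/{basedomain}/{name}" for name in json.loads(processed_json)]

# using the example processed.json body from above:
processed = '["20150311T082741_tantek.com.json", "20150224T083010_tantek.com.json"]'
urls = poll_urls("tantek.com", processed)
# urls[0] → "https://indie-stats.com/domains/tantek.com/20150311T082741_tantek.com.json"
```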
you can now get the above using the beginnings of an api:

the first endpoint available is /api/v1/domains/

will add a tarball to the processing code
cool! thank you!
to expand on the initial description, here are concrete questions i'd love to be able to ask:

- how many sites use p-content instead of e-content?
- how many sites have a .h-card on their home page? show me the first 10.

cool - i'll start chewing on this and ping you when I have the start of it

i often have arbitrary questions that i'd love to use this data to answer, e.g. how many sites have an h-card?, or how many people use PSCs/PSLs? any chance you might serve the data publicly? maybe a zip file per day and/or per site?
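assuming the "mf2" field holds canonical microformats2 parser output ({"items": [{"type": [...], "properties": {...}}, ...]}), a question like "how many sites have an h-card on their home page?" reduces to a filter over one poll result per domain. a sketch over made-up sample data (the helpers are hypothetical, and only top-level h-cards are counted, not nested ones):

```python
def has_h_card(poll: dict) -> bool:
    # mf2 parser output shape: {"items": [{"type": ["h-card"], "properties": {...}}]}
    return any("h-card" in item.get("type", [])
               for item in poll.get("mf2", {}).get("items", []))

def sites_with_h_card(polls: list[dict]) -> list[str]:
    # keep the domains whose latest poll exposes a top-level h-card
    return [p["domain"] for p in polls if has_h_card(p)]

# made-up sample polls, shaped like the per-poll JSON files above:
sample = [
    {"domain": "tantek.com",
     "mf2": {"items": [{"type": ["h-card"], "properties": {"name": ["Tantek"]}}]}},
    {"domain": "example.com", "mf2": {"items": []}},
]
# sites_with_h_card(sample) → ["tantek.com"]
```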