KunstDerFuge / Q-notebook


Missing drop thread IDs #2

ctrlcctrlv closed this issue 3 years ago

ctrlcctrlv commented 3 years ago

Thread IDs are missing. I collected the data for you; please add it to Pandas. It should be easy to run a regex over the links to turn each one into a 4-tuple of (post ID, thread ID, site name, board). All four fields should be kept so the 4chan/8chan URLs can be reconstituted.

I got them with this code:

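// Load jQuery into the Firefox console yourself,
// and also the :regex selector from this SO answer: https://stackoverflow.com/a/190255/1901658
// Then, on qanon pub index2, run:
function* zip(arr1, arr2, i = 0) {
  // Pair elements by index; a missing element is dropped from its pair.
  while (arr1[i] || arr2[i]) yield [arr1[i], arr2[i++]].filter(x => !!x);
}

// Anchors inside each article whose id is purely numeric.
var ids = $("article a:regex(id, ^\\d+$)");
// Archive links in the post headers, excluding anchors inside blockquotes.
var nos = $("header a[id=archive_today]").not("blockquote a");

var Z = [...zip(ids, nos)];
// Pair each anchor's href (minus its first character) with the archive link's href.
var Z2 = [];
for (let i = 0; i < Z.length; i++) {
  Z2.push([$(Z[i][0]).attr("href").slice(1), $(Z[i][1]).attr("href")]);
}
// Build the tab-separated output, one row per pair.
var S = "";
for (let i = 0; i < Z2.length; i++) {
  S += Z2[i][0] + "\t" + Z2[i][1] + "\n";
}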

This is for a project I want to do: scraping all of the URLs containing Q posts, so that people like QOrigins can answer questions like "when was the first time someone in a Q thread said X?"

I am doing this to answer a thoughtful Twitter question (although perhaps not asked in good faith): https://twitter.com/sallutephilipe/status/1384234237394493443

The first question I want to answer is: when did a /pol/ Anon first accuse QAnon of being Mossad, KGB, FSB, JIDF, JDF, "Jews", or other similar words?

Tab-separated values: https://gist.github.com/ctrlcctrlv/b334458c42911bdc6d4d93dc21159f2c
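
The regex step suggested at the top of this issue could look roughly like the Python sketch below. It is only an illustration: the URL shapes in the comments, the `PostRef` and `parse_link` names, and the sample URLs are assumptions, and the actual links in the gist may need a different pattern. A link without a `#` fragment leaves the post ID as `None` rather than guessing it.

```python
import re
from typing import NamedTuple, Optional


class PostRef(NamedTuple):
    """Hypothetical container for the 4-tuple described in this issue."""
    post_id: Optional[str]
    thread_id: str
    site: str
    board: str


# Assumed URL shapes (not confirmed against the gist):
#   https://<site>/<board>/thread/<thread_id>#p<post_id>
#   https://<site>/<board>/res/<thread_id>.html#<post_id>
LINK_RE = re.compile(
    r"https?://(?P<site>[^/]+)/(?P<board>[^/]+)/(?:thread|res)/"
    r"(?P<thread_id>\d+)(?:\.html)?(?:#[a-z]?(?P<post_id>\d+))?"
)


def parse_link(url: str) -> Optional[PostRef]:
    """Return a PostRef for the first board link found in url, or None."""
    m = LINK_RE.search(url)
    if m is None:
        return None
    return PostRef(m.group("post_id"), m.group("thread_id"),
                   m.group("site"), m.group("board"))


# Made-up row in the "<first column><TAB><link>" layout produced by the console code.
row = "42\thttps://8ch.net/cbts/res/2000.html#2007\n"
key, _, link = row.rstrip("\n").partition("\t")
ref = parse_link(link)
```

Each `PostRef` keeps all four fields, so a list of them could be handed to `pandas.DataFrame(rows, columns=PostRef._fields)` and later used to rebuild the original URLs.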

KunstDerFuge commented 3 years ago

Thank you, Fred! Fixed this by adding the requested thread and post IDs.