Open simonw opened 3 years ago
Prototype:
curl 'https://hn.algolia.com/api/v1/items/27941108' \
| jq '[recurse(.children[]) | del(.children)]' \
| sqlite-utils insert hn.db items - --pk id
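For comparison, here is a rough Python sketch of what the jq step does: walk the nested item depth-first and emit each item without its children key. The function name flatten_items and the tiny sample thread are made up for illustration; only the "id" and "children" keys come from the API response.

```python
def flatten_items(item):
    """Yield the item and every descendant, each without its "children" key.

    Pre-order, depth-first, which matches the order produced by
    jq's recurse(.children[]).
    """
    stack = [item]
    while stack:
        node = stack.pop()
        # Assumption: tolerate a missing or null "children" key.
        children = node.get("children") or []
        yield {key: value for key, value in node.items() if key != "children"}
        # Reverse so that the first child is popped (and emitted) first.
        stack.extend(reversed(children))


# Hypothetical miniature thread, shaped like the API response.
sample_thread = {
    "id": 1,
    "children": [
        {"id": 2, "children": [{"id": 3, "children": []}]},
        {"id": 4, "children": []},
    ],
}

flat = list(flatten_items(sample_thread))
print([row["id"] for row in flat])  # [1, 2, 3, 4]
```

The resulting flat list of dictionaries has the same shape as the JSON array that the pipeline above feeds into sqlite-utils insert.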
If you hit the endpoint for a comment that's part of a thread you get that comment and its recursive children: https://hn.algolia.com/api/v1/items/27941552
You can tell that it's not the top-level item because parent_id isn't null. You can use story_id to figure out what the top-level item is.
{
"id": 27941552,
"created_at": "2021-07-24T15:08:39.000Z",
"created_at_i": 1627139319,
"type": "comment",
"author": "nine_k",
"title": null,
"url": null,
"text": "<p>I wish ...",
"points": null,
"parent_id": 27941108,
"story_id": 27941108
}
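A small sketch of that check in Python, using a trimmed copy of the response above. The helper names is_top_level and top_level_id are hypothetical. It assumes a top-level item has a null parent_id, as described, and falls back to the item's own id in that case rather than relying on what story_id holds for a story.

```python
def is_top_level(item):
    """True when the item has no parent, i.e. parent_id is null."""
    return item.get("parent_id") is None


def top_level_id(item):
    """Return the ID of the top-level item that this item belongs to."""
    if is_top_level(item):
        # Assumption: a top-level item is its own root.
        return item["id"]
    return item["story_id"]


# Trimmed copy of the comment shown above.
comment = {
    "id": 27941552,
    "type": "comment",
    "parent_id": 27941108,
    "story_id": 27941108,
}

print(is_top_level(comment))  # False
print(top_level_id(comment))  # 27941108
```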
Got a TIL out of this: https://til.simonwillison.net/jq/extracting-objects-recursively
The trees command currently has to make a request for every single comment. Algolia have an endpoint that bundles the entire thread together into a single request: https://hn.algolia.com/api/v1/items/ID
Here's an example that loads quickly, with about 50 comments: https://hn.algolia.com/api/v1/items/27941108
It doesn't appear to use pagination at all - if a thread is big then the response is big.
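A minimal sketch of fetching a whole thread with one request, using only the Python standard library. This is not how the trees command is implemented; the function names and the 120 second timeout are assumptions, picked to leave room for the very large responses that an unpaginated endpoint can return.

```python
import json
from urllib.request import urlopen

ITEMS_URL = "https://hn.algolia.com/api/v1/items/{}"


def thread_url(item_id):
    """Build the URL of the Algolia endpoint for one item and its thread."""
    return ITEMS_URL.format(item_id)


def fetch_thread(item_id, timeout=120):
    """Fetch an item with all nested children in a single HTTP request.

    There is no pagination to follow: the whole thread arrives in one
    JSON document, so the timeout is generous (an assumed value).
    """
    with urlopen(thread_url(item_id), timeout=timeout) as response:
        return json.load(response)


def count_items(item):
    """Count the item itself plus all of its descendants."""
    return 1 + sum(count_items(child) for child in item.get("children") or [])


# Offline check against a hypothetical miniature thread.
tiny_thread = {"id": 1, "children": [{"id": 2, "children": [{"id": 3, "children": []}]}]}
print(thread_url(27941108))
print(count_items(tiny_thread))  # 3
```

Calling count_items(fetch_thread(27941108)) would then report the size of the example thread without issuing one request per comment.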
I ran this search to find some stories with more than 1000 comments: https://hn.algolia.com/api/v1/search?tags=story&numericFilters=num_comments%3E=1000
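The same search URL can be built in Python so the comment threshold is easy to change. This sketch only reuses the tags and numericFilters parameters visible in the URL above; the function name and the min_comments argument are made up. urlencode also percent-encodes the "=" inside the filter, which decodes to the same query.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

SEARCH_URL = "https://hn.algolia.com/api/v1/search"


def big_story_search_url(min_comments=1000):
    """Build a search URL for stories with at least min_comments comments."""
    params = {
        "tags": "story",
        "numericFilters": f"num_comments>={min_comments}",
    }
    return SEARCH_URL + "?" + urlencode(params)


url = big_story_search_url()
print(url)
# Decoding the query string recovers the original filter expression.
print(parse_qs(urlsplit(url).query)["numericFilters"])  # ['num_comments>=1000']
```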
Here's one: https://news.ycombinator.com/item?id=25015967 with 4759 comments. Hitting the API takes 41s and returns 3.7 MB of JSON!