algolia / hn-search

Hacker News Search
http://hn.algolia.com
Other
549 stars 74 forks

HN Search API limits number of hits to 1000, regardless of `page` parameter #230

Open harabat opened 2 years ago

harabat commented 2 years ago

I am trying to fetch all stories posted in a given period. I expected to get all ~5.5k results (`nbHits` is 5562), but I can only retrieve 1k.

This limit is not mentioned in the HN Search API reference.

The issue was already raised in #125, where using the `page` parameter was suggested as a workaround; this no longer works.

The issue has also been mentioned in a Stack Overflow question, with no answer specific to Algolia's HN Search API.

This might be expected behaviour, but it is not documented anywhere as far as I know.


My query:

http://hn.algolia.com/api/v1/search_by_date?tags=story&numericFilters=created_at_i%3E1661122800.0,created_at_i%3C1661727600.0&hitsPerPage=100
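For scripting, the same query can be built with Python's standard library instead of hand-encoding it. This is a minimal sketch; `build_search_url` is a hypothetical helper, not part of any Algolia client. `%3E` and `%3C` are just the URL-encoded `>` and `<`, and `safe=','` keeps the comma between the two numeric filters literal, exactly as in the URL above.

```python
from urllib.parse import urlencode

BASE_URL = "http://hn.algolia.com/api/v1/search_by_date"


def build_search_url(start, end, hits_per_page=100, page=None):
    """Hypothetical helper: build a search_by_date URL for stories in (start, end)."""
    params = {
        "tags": "story",
        # Strict bounds, matching the original query.
        "numericFilters": f"created_at_i>{start},created_at_i<{end}",
        "hitsPerPage": hits_per_page,
    }
    if page is not None:
        params["page"] = page  # zero-based page index
    # quote_plus (urlencode's default) turns '>' into %3E and '<' into %3C.
    return f"{BASE_URL}?{urlencode(params, safe=',')}"


print(build_search_url(1661122800.0, 1661727600.0))
```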

The output for page 9 of results:

{
  "hits": [...],
  "nbHits": 5562,
  "page": 9,
  "nbPages": 10,
  "hitsPerPage": 100,
  "exhaustiveNbHits": true,
  "exhaustiveTypo": true,
  "query": "",
  "params": "advancedSyntax=true&analytics=true&analyticsTags=backend&hitsPerPage=100&numericFilters=created_at_i%3E1661122800.0%2Ccreated_at_i%3C1661727600.0&page=9&tags=story",
  "processingTimeMS": 5,
  "processingTimingsMS": {...}
}

The output for page 10 of results:

{
  "hits": [],
  "page": 10,
  "nbHits": 0,
  "nbPages": 0,
  "hitsPerPage": 100,
  "exhaustiveNbHits": true,
  "exhaustiveTypo": true,
  "exhaustive": {
    "nbHits": true,
    "typo": true
  },
  "processingTimeMS": 1,
  "message": "you can only fetch the 1000 hits for this query. You can extend the number of hits returned via the paginationLimitedTo index parameter or use the browse method. You can read our FAQ for more details about browsing: https://www.algolia.com/doc/guides/sending-and-managing-data/manage-your-indices/how-to/export-an-algolia-index/#exporting-the-index-using-an-api-client",
  "query": "",
  "params": "advancedSyntax=true&analytics=true&analyticsTags=backend&hitsPerPage=100&numericFilters=created_at_i%3E1661122800.0%2Ccreated_at_i%3C1661727600.0&page=10&tags=story"
}
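The two responses are consistent with a hard cap of 1000 reachable hits: with `hitsPerPage=100`, only 10 pages are served (indices 0 to 9, since page 9 is the last one with hits while `nbPages` is 10), even though `nbHits` reports 5562. A small sketch of that arithmetic, assuming the cap is the 1000 quoted in the error message; `reachable_pages` is a hypothetical name.

```python
import math

# Assumption: cap taken from the API's error message ("you can only fetch the 1000 hits").
PAGINATION_LIMIT = 1000


def reachable_pages(nb_hits, hits_per_page, limit=PAGINATION_LIMIT):
    """Number of pages the search endpoint will actually serve for a query."""
    return math.ceil(min(nb_hits, limit) / hits_per_page)


pages = reachable_pages(5562, 100)
print(pages)                   # 10
print(pages - 1)               # 9, the last page index that returns hits
print(math.ceil(5562 / 100))   # 56 pages would be needed to see every hit
```

This also shows why a larger `hitsPerPage` does not help: `reachable_pages(5562, 1000)` is 1, so the total is still 1000 hits.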

AleksandarJeftic commented 2 years ago

This makes the API useless for my project, where I need to fetch all hits.

cmgchess commented 2 years ago

I guess this is because `paginationLimitedTo` is set to 1000 by default. To get more than 1000 hits you would need to use browse instead of search, which also requires an API key with the browse capability, afaik.

harabat commented 2 years ago


My workaround was to write a script that splits whatever period I'm querying into single days (i.e. a loop that queries each day from Monday to Sunday separately instead of the full week at once).
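A minimal sketch of this day-by-day workaround in Python, using only the standard library. The helper names (`daily_windows`, `fetch_window`, `fetch_period`) are hypothetical, and the sketch assumes the 1000-hit cap reported in the error message above and that the API accepts Algolia's `>=` numeric operator. It uses half-open windows (`>=` start, `<` end) so a story posted exactly at a day boundary is fetched once, unlike the strict `>`/`<` pair in the original query, and it raises if a single day still exceeds the cap.

```python
import json
import urllib.request
from urllib.parse import urlencode

API_URL = "http://hn.algolia.com/api/v1/search_by_date"
DAY = 24 * 60 * 60  # seconds
MAX_REACHABLE_HITS = 1000  # assumption: cap from the API's error message


def daily_windows(start, end):
    """Split [start, end) into consecutive windows of at most one day."""
    windows = []
    lo = start
    while lo < end:
        hi = min(lo + DAY, end)
        windows.append((lo, hi))
        lo = hi
    return windows


def fetch_window(lo, hi, hits_per_page=100):
    """Fetch every story with lo <= created_at_i < hi, page by page."""
    hits = []
    page = 0
    while True:
        params = {
            "tags": "story",
            # Half-open interval: adjacent windows neither overlap nor leave gaps.
            "numericFilters": f"created_at_i>={lo},created_at_i<{hi}",
            "hitsPerPage": hits_per_page,
            "page": page,
        }
        url = f"{API_URL}?{urlencode(params, safe=',')}"
        with urllib.request.urlopen(url) as resp:
            data = json.load(resp)
        if page == 0 and data["nbHits"] > MAX_REACHABLE_HITS:
            raise RuntimeError(
                f"window {lo}-{hi} has {data['nbHits']} hits; split it further"
            )
        hits.extend(data["hits"])
        page += 1
        if page >= data["nbPages"]:
            return hits


def fetch_period(start, end):
    """Fetch all stories in [start, end) one day at a time."""
    stories = []
    for lo, hi in daily_windows(start, end):
        stories.extend(fetch_window(lo, hi))
    return stories
```

For the week in the original query, `daily_windows(1661122800, 1661727600)` yields seven one-day windows, and `fetch_period` issues one paginated query per window.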

harabat commented 2 years ago


Thanks @cmgchess for looking into this. I had found that resource before posting, but that endpoint really seems intended for Algolia's customers: it's unlikely that everyone querying the HN Search API could request such a key, especially if the key needs to be renewed every X weeks.

My workaround (https://github.com/algolia/hn-search/issues/230#issuecomment-1304595217) is fine for me for now, but I thought I'd keep the issue open, since this is still unexpected and undocumented behaviour (as the sources I linked show).