algolia / docsearch-scraper

DocSearch - Scraper
https://docsearch.algolia.com/

Record quota exceeded at 5k instead of 10k #545

Closed PierreR closed 3 years ago

PierreR commented 3 years ago

I get the error algoliasearch.helpers.AlgoliaException: Record quota exceeded. Change plan or delete records. after 5k records instead of 10k.

I can't see anything suspicious in my config:

{
  "index_name": "cicd-docs",
  "start_urls": [
    "https://docs.cicd.cirb.lan/"
  ],
  "scrape_start_urls": false,
  "stop_urls": [],
  "selectors": {
    "lvl0": {
      "selector": "//nav[@class='breadcrumbs']//li[1]//a",
      "type": "xpath",
      "global": true
    },
    "lvl1": ".doc > h1.page",
    "lvl2": ".doc .sect1 > h2:first-child, .doc > h1.sect0",
    "lvl3": ".doc .sect2 > h3:first-child",
    "text": ".doc p, .doc dt, .doc td.context, .doc th.tableblock"
  },
  "selectors_exclude": [],
  "min_indexed_level": 1
}
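For context, DocSearch creates roughly one record per matched heading or text element per page, so even a modest site can exceed a quota. A back-of-the-envelope sketch (the page and element counts below are purely hypothetical, not measured from this site):

```python
# Rough, hypothetical estimate of how many records a DocSearch crawl produces.
# DocSearch emits roughly one record per matched heading/text element per page.
pages = 250            # assumed number of crawled pages
avg_text_nodes = 18    # assumed average matches of the "text" selectors per page
avg_headings = 4       # assumed average lvl0-lvl3 headings per page

estimated_records = pages * (avg_text_nodes + avg_headings)
print(estimated_records)  # 5500 -- already past a 5k quota
```

Under these assumptions the record count is dominated by the "text" selectors, which is why trimming site content only delays the problem.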

I have had this error for a long time, with versions from v1.4.5 up to the current master (I have tried all of them). I use the expected command to launch the crawler:

docker run -it --env-file=.env-algolia -e "CONFIG=$(cat ./config.json | jq -r tostring)" algolia/docsearch-scraper
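For completeness, the .env-algolia file passed via --env-file holds the Algolia credentials the scraper reads from its environment; a minimal sketch (placeholder values, not my real keys):

```
# .env-algolia -- credentials read by the DocSearch scraper
APPLICATION_ID=YOUR_APP_ID
API_KEY=YOUR_ADMIN_API_KEY
```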

I have been trying to reduce the site's content to mitigate the issue, but I am reaching a point where that is harder and harder to do.

Thanks for your help

shortcuts commented 3 years ago

Hi @PierreR,

I'm not able to try your config, as this is probably a local website.

Could you please contact us at docsearch@algolia.com with the app ID and search API key concerned, so I can try to reproduce?

Thanks