I built the documentation site for the company I work for using Docusaurus. I integrated Algolia's DocSearch with it, and I'm seeing some strange Python errors when our CI runs the scraper.
These errors do not happen consistently; sometimes the CI runs the scraper successfully. The website we scrape is https://docs.surfly.com and the config file I'm using for the crawler is the following:
DocSearch config
```json
{
  "index_name": "surfly-docs",
  "start_urls": [
    "https://docs.surfly.com/"
  ],
  "sitemap_urls": [
    "https://docs.surfly.com/sitemap.xml"
  ],
  "sitemap_alternate_links": true,
  "stop_urls": [
    "/tests"
  ],
  "selectors": {
    "lvl0": {
      "selector": "(//ul[contains(@class,'menu__list')]//a[contains(@class, 'menu__link menu__link--sublist menu__link--active')]/text() | //nav[contains(@class, 'navbar')]//a[contains(@class, 'navbar__link--active')]/text())[last()]",
      "type": "xpath",
      "global": true,
      "default_value": "Documentation"
    },
    "lvl1": "header h1",
    "lvl2": "article h2",
    "lvl3": "article h3",
    "lvl4": "article h4",
    "lvl5": "article h5, article td:first-child",
    "lvl6": "article h6",
    "text": "article p, article li, article td:last-child"
  },
  "strip_chars": " .,;:#",
  "custom_settings": {
    "separatorsToIndex": "_",
    "attributesForFaceting": [
      "language",
      "version",
      "type",
      "docusaurus_tag"
    ],
    "attributesToRetrieve": [
      "hierarchy",
      "content",
      "anchor",
      "url",
      "url_without_anchor",
      "type"
    ]
  },
  "conversation_id": [
    "833762294"
  ],
  "nb_hits": 46250
}
```
If anyone has any tips about what could be going wrong, it would really help me. So far, googling these errors hasn't helped much: the stack traces don't reference any of the scraper's source files, and they look more like Python internals errors, so I just end up reading random search results.
Here are the stack traces from when the CI job fails.
The command is:
```shell
podman run --rm --env-file=.env -e "CONFIG=$(cat ./docsearch-config.json | jq -r tostring)" algolia/docsearch-scraper
```
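For what it's worth, the config file itself parses as valid JSON; a sanity check along these lines succeeds (a sketch: the /tmp path and sample content below are placeholders, the real file is ./docsearch-config.json):

```shell
# Sketch of a JSON sanity check: jq exits non-zero on a parse error,
# so "config OK" only prints when the file is well-formed JSON.
# (Placeholder path and content; the real file is ./docsearch-config.json.)
printf '{"index_name": "surfly-docs"}' > /tmp/docsearch-config.json
jq empty /tmp/docsearch-config.json && echo "config OK"
```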
Error stack traces:
unsupported operand type
```
[ 2ms] > Running command: podman run --rm --env-file=.env -e "CONFIG=$(cat ./docsearch-config.json | jq -r tostring)" algolia/docsearch-scraper
[ 396ms] Traceback (most recent call last):
[ 396ms]   File "/usr/local/bin/pipenv", line 7, in
SystemError: unknown opcode
```
```
[ 3ms] > Running command: podman run --rm --env-file=.env -e "CONFIG=$(cat ./docsearch-config.json | jq -r tostring)" algolia/docsearch-scraper
[ 389ms] XXX lineno: 774, opcode: 163
[ 391ms] Traceback (most recent call last):
[ 391ms]   File "/usr/local/bin/pipenv", line 7, in
AttributeError: 'Environment' object has no attribute 'scan'
```
```
[ 2ms] > Running command: podman run --rm --env-file=.env -e "CONFIG=$(cat ./docsearch-config.json | jq -r tostring)" algolia/docsearch-scraper
[ 677ms] Traceback (most recent call last):
[ 677ms]   File "/usr/local/bin/pipenv", line 11, in
```