algolia / docsearch-scraper

DocSearch - Scraper
https://docsearch.algolia.com/

Error when running the scraper in Docker #569

Closed pptfz closed 2 years ago

pptfz commented 2 years ago

System environment: CentOS 7.6, Docker version 20.10.17

docker run -it --env-file=/tmp/.env -e "CONFIG=$(cat /tmp/config.json | jq -r tostring)" algolia/docsearch-scraper

Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/src/index.py", line 119, in <module>
    run_config(environ['CONFIG'])
  File "/root/src/index.py", line 33, in run_config
    config = ConfigLoader(config)
  File "/root/src/config/config_loader.py", line 84, in __init__
    self._parse()
  File "/root/src/config/config_loader.py", line 120, in _parse
    self.selectors = SelectorsParser().parse(self.selectors)
  File "/root/src/config/selectors_parser.py", line 64, in parse
    if 'lvl0' in config_selectors:
TypeError: argument of type 'NoneType' is not iterable

cat /tmp/.env

APPLICATION_ID=xxx
API_KEY=xxx

cat /tmp/config.json

{
    "index_name": "xxx",
    "start_urls": [
        "https://xxx.com"
    ],
    "sitemap_urls": [
        "https://xxx.com/docs"
    ]
}

How do I fix this error?

pptfz commented 2 years ago

Is this project still maintained? If not, delete it

shortcuts commented 2 years ago

> Is this project still maintained? If not, delete it

Please read https://github.com/algolia/docsearch-scraper#deprecated. Also, make sure to provide context and as many debugging steps as possible when opening issues.

> cat /tmp/config.json

Is that all you have in your config? Make it match what we recommend here: https://docsearch.algolia.com/docs/legacy/run-your-own#create-a-new-configuration. You need selectors; the traceback shows the scraper failing because the config has no "selectors" entry.
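
To illustrate the point about selectors, here is a minimal sketch in Python that builds the reported config with a "selectors" entry added and prints it as one compact JSON string, comparable to what `jq -r tostring` produces. The CSS selector values are placeholders and an assumption on my part, not taken from the reporter's site; they need to be adapted to the actual page markup, and the linked documentation remains the reference for which keys are supported.

```python
import json

# Same keys as the reported config, plus a "selectors" entry.
# The CSS selectors below are placeholders (assumption): replace them
# with selectors that match the markup of your own documentation site.
config = {
    "index_name": "xxx",
    "start_urls": ["https://xxx.com"],
    "sitemap_urls": ["https://xxx.com/docs"],
    "selectors": {
        "lvl0": "#content header h1",
        "lvl1": "#content article h1",
        "lvl2": "#content section h3",
        "lvl3": "#content section h4",
        "text": "#content header p, #content section p",
    },
}

# Serialize to a compact single-line string suitable for the CONFIG variable.
config_string = json.dumps(config, separators=(",", ":"))
print(config_string)
```

Saving the equivalent JSON to /tmp/config.json and rerunning the original `docker run` command should get past the `selectors_parser.py` error, assuming nothing else is missing from the config.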

I'm closing this issue as it seems that the documentation provides answers to your question.
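
For anyone hitting the same traceback, a small pre-flight check can catch the problem before starting the container. The sketch below is only an illustration: `check_selectors` is a hypothetical helper, not part of the scraper, and it tests only for the condition visible in the traceback (a missing or empty "selectors" entry). The scraper itself may require more than this.

```python
import json


def check_selectors(config_text):
    """Return a list of problems found in the config's "selectors" entry.

    Hypothetical helper for illustration; not part of docsearch-scraper.
    """
    config = json.loads(config_text)
    selectors = config.get("selectors")
    if selectors is None:
        # This is the situation behind `'lvl0' in config_selectors`
        # raising TypeError: the value being searched is None.
        return ['missing "selectors" key']
    if not isinstance(selectors, dict) or not selectors:
        return ['"selectors" must be a non-empty object']
    return []


# The config from the report has no selectors, so the check flags it.
reported = '{"index_name": "xxx", "start_urls": ["https://xxx.com"]}'
print(check_selectors(reported))
```

To check a real file, the same function can be called on the contents of /tmp/config.json read with `open(...).read()`.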