algolia / docsearch-scraper

DocSearch - Scraper
https://docsearch.algolia.com/

Getting "ValueError: CONFIG is not a valid JSON" when running config file #333

Closed stevenbennitt closed 7 years ago

stevenbennitt commented 7 years ago

I'm trying to run my config file and I'm getting the error ValueError: CONFIG is not a valid JSON.

$ ./docsearch run /bigcommerce.json
Traceback (most recent call last):
  File "./docsearch", line 5, in <module>
    run ()
  File "C:\cygwin64\home\steven.bennett\cli\src\index.py", line 185, in run
    exit(command.run(sys.argv[2:]))
  File "C:\cygwin64\home\steven.bennett\cli\src\commands\run_config.py", line 20, in run
    return run_config(args[0])
  File "C:\cygwin64\home\steven.bennett\cli\..\scraper\src\index.py", line 27, in run_config
    config = ConfigLoader(config)
  File "C:\cygwin64\home\steven.bennett\cli\..\scraper\src\config\config_loader.py", line 62, in __init__
    data = self._load_config(config)
  File "C:\cygwin64\home\steven.bennett\cli\..\scraper\src\config\config_loader.py", line 94, in _load_config
    raise ValueError('CONFIG is not a valid JSON')
ValueError: CONFIG is not a valid JSON

Not sure what I'm missing.

Here's my config:

{
  "index_name": "TESTING_INDEX",
  "start_urls": [
    "https://support.bigcommerce.com/",
    "https://support.bigcommerce.com/guides",
    "https://support.bigcommerce.com/university",
    "https://support.bigcommerce.com/documentation"
  ],
  "stop_urls": [
    "https://support.bigcommerce.com/guides$",
    "https://support.bigcommerce.com/university$",
    "https://support.bigcommerce.com/documentation$",
    "\\?",
    "%20",
    "/$"
  ],
   "selectors_exclude": [
    "#feedback-visible",
    "#menu-title",
    "#related-articles",
    ".tip"
  ], 
  "selectors": {
    "lvl0": {
      "selector": "//ol[contains(@class, 'breadcrumb')]/li[2]/a",
      "type": "xpath"
    },
    "lvl1": ".categories > li.active > a",
    "lvl2": ".content h1",
    "lvl3": ".content h3",
    "lvl4": ".content h4",
    "text": ".content p, .content li"
  },
  "min_indexed_level": 2,
  "scrap_start_urls": false,
  "nb_hits": 24820
}
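
A quick way to double-check that the file itself is valid JSON (a minimal check, assuming Python is installed and it is run from the folder that holds the config):

$ python -m json.tool bigcommerce.json

If the file were malformed, this would print the exact parse error; if it prints the config back pretty-printed, the JSON itself is fine and the problem is elsewhere.
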
ElPicador commented 7 years ago

The error you have, ValueError: CONFIG is not a valid JSON, says that your file is not valid JSON.

You probably mean ./bigcommerce.json and not /bigcommerce.json. There is a dot missing.

/filename will try to find a file at the root of your drive, not in the current directory.
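
To illustrate the difference, a minimal sketch assuming bigcommerce.json sits in the directory you run the scraper from:

# /bigcommerce.json is resolved against the root of the drive
ls /bigcommerce.json      # fails unless the file really is at the root

# ./bigcommerce.json is resolved against the current directory
ls ./bigcommerce.json

# so the run command should be
./docsearch run ./bigcommerce.json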

stevenbennitt commented 7 years ago

Yep, that was it. I just had the file path wrong.

KaranS-hexaware commented 2 years ago

Facing the same error:

Traceback (most recent call last):
  File "/root/src/config/config_loader.py", line 101, in _load_config
    data = json.loads(config, object_pairs_hook=OrderedDict)
  File "/usr/lib/python3.6/json/__init__.py", line 367, in loads
    return cls(**kw).decode(s)
  File "/usr/lib/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.6/json/decoder.py", line 355, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/src/index.py", line 119, in <module>
    run_config(environ['CONFIG'])
  File "/root/src/index.py", line 33, in run_config
    config = ConfigLoader(config)
  File "/root/src/config/config_loader.py", line 69, in __init__
    data = self._load_config(config)
  File "/root/src/config/config_loader.py", line 106, in _load_config
    raise ValueError('CONFIG is not a valid JSON')
ValueError: CONFIG is not a valid JSON

This is my command:

docker run -it --env-file=.env -e "CONFIG=$(cat ./config.json | jq -r tostring)" algolia/docsearch-scraper
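
To narrow it down, both the file and the string that actually lands in CONFIG can be checked (a rough sketch, assuming jq and python3 are available in the shell; jq is already used above):

# does the file itself parse?
jq empty ./config.json && echo "config.json is valid JSON"

# does the string produced by the jq round-trip still parse?
CONFIG="$(cat ./config.json | jq -r tostring)"
printf '%s' "$CONFIG" | python3 -m json.tool > /dev/null && echo "CONFIG still parses"

The "Expecting property name enclosed in double quotes" at column 2 suggests the parser is seeing something other than a double-quoted key right after the opening brace, e.g. single-quoted keys or a shell-mangled value.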

Any suggestions?

trevorpfiz commented 1 year ago

I just got this as well. Did you figure it out?

trevorpfiz commented 1 year ago

I got it working after finding this issue.