alex9smith / gdelt-doc-api

A Python client for the GDELT 2.0 Doc API
MIT License

JSONDecodeError #26

Open pdb159 opened 1 year ago

pdb159 commented 1 year ago

For certain dates I receive a JSONDecodeError, followed by an AttributeError: 'ValueError' object has no attribute 'pos'. Does this mean there are no news articles available for the selected day? If so, is there a way to access GDELT directly to get the data for that date?

Thanks for the help!

networks1 commented 1 year ago

I just had the same thing happen:

Traceback (most recent call last):
  File "C:\Python311\Lib\site-packages\gdeltdoc\helpers.py", line 15, in load_json
    result = json.loads(json_message)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\json\__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\json\decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
               ^^^^^^^^^^^^^^^^^^^^^^
json.decoder.JSONDecodeError: Invalid \escape: line 1 column 99103 (char 99102)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Python311\Lib\site-packages\gdeltdoc\helpers.py", line 15, in load_json
    result = json.loads(json_message)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\json\__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\json\decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
               ^^^^^^^^^^^^^^^^^^^^^^
ValueError: Exceeds the limit (4300) for integer string conversion: value has 248854 digits; use sys.set_int_max_str_digits() to increase the limit

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\boss\Dropbox (ASU)\merck grant\gdelt-search.py", line 74, in <module>
    new_articles = gd.article_search(f)
                   ^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\gdeltdoc\api_client.py", line 79, in article_search
    articles = self._query("artlist", filters.query_string)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\gdeltdoc\api_client.py", line 168, in _query
    return load_json(response.content, self.max_depth_json_parsing)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\gdeltdoc\helpers.py", line 27, in load_json
    return load_json(json_message=new_message, max_recursion_depth=max_recursion_depth,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\gdeltdoc\helpers.py", line 20, in load_json
    idx_to_replace = int(e.pos)
                         ^^^^^
AttributeError: 'ValueError' object has no attribute 'pos'
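
The last error in this chain is consistent with a handler that catches ValueError broadly and then reads `e.pos`. `json.JSONDecodeError` is a ValueError subclass that carries a `pos` attribute, but the integer-conversion ValueError shown above is a plain ValueError without one. A minimal standard-library sketch of that distinction follows; `describe_failure` is a hypothetical name for illustration and is not part of gdeltdoc:

```python
import json


def describe_failure(text):
    """Parse text as JSON; on failure return (error type name, position or None)."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as e:
        # JSONDecodeError always exposes the character offset of the problem.
        return ("JSONDecodeError", e.pos)
    except ValueError as e:
        # A plain ValueError (e.g. the integer string conversion limit)
        # has no .pos, so fall back to None instead of assuming it exists.
        return ("ValueError", getattr(e, "pos", None))


if __name__ == "__main__":
    # A string value containing an invalid backslash escape.
    print(describe_failure('{"a": "b\\q"}'))
    # A very long run of digits; on interpreters that enforce the
    # integer string conversion limit this raises a plain ValueError.
    print(describe_failure("1" * 5000))
```

Catching the two cases separately, or using `getattr` as here, would avoid the secondary AttributeError, though it would not by itself repair the malformed response.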

alex9smith commented 1 year ago

Thanks for the reports! I'll do some digging and figure this out.

alex9smith commented 1 year ago

@networks1 @pdb159 Could you give me an example query that produces this error?

networks1 commented 1 year ago

Running this should reproduce it. I can't remember the exact days: the first was in late September, I think, and there were a couple in December too.

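import time
from datetime import datetime, timedelta

import pandas as pd
from gdeltdoc import GdeltDoc, Filters, near

gd = GdeltDoc()  # assumed setup; not shown in the original snippet

# One query per day from 2020-09-01 through 2020-12-31
date_generated = pd.date_range("2020-09-01", "2020-12-31", freq="D").strftime("%Y-%m-%d").tolist()
api_timeout = 5  # seconds to wait between requests
for dt in date_generated:
    start_date = dt
    end_date = (datetime.strptime(dt, "%Y-%m-%d") + timedelta(days=1)).strftime("%Y-%m-%d")
    f = Filters(
        # keyword = kw,
        near = near(20, "COVID", "vaccine"),
        start_date = start_date,
        end_date = end_date,
        num_records = 250,
        country = "US",
    )
    new_articles = gd.article_search(f)
    time.sleep(api_timeout)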
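
To pin down the exact failing days rather than relying on memory, a loop like the one above could record the exception for each date and carry on. This is a minimal sketch under stated assumptions: `find_failing_days` and its `search` argument are hypothetical, with `search` standing in for a function that builds the Filters for a one-day window and calls `gd.article_search`. Nothing here is part of gdeltdoc.

```python
from datetime import date, timedelta


def find_failing_days(start, end, search):
    """Call search(start_date, end_date) for each one-day window from
    start to end inclusive; return a dict mapping failing day to error text."""
    failures = {}
    day = start
    while day <= end:
        next_day = day + timedelta(days=1)
        try:
            search(day.isoformat(), next_day.isoformat())
        except Exception as exc:
            # Broad on purpose: the traceback in this thread ends in an
            # AttributeError, not only in JSON parsing errors.
            failures[day.isoformat()] = f"{type(exc).__name__}: {exc}"
        day = next_day
    return failures


if __name__ == "__main__":
    # Stand-in search that fails on one arbitrary, made-up day.
    def fake_search(start_date, end_date):
        if start_date == "2020-01-02":
            raise ValueError("simulated parse failure")

    print(find_failing_days(date(2020, 1, 1), date(2020, 1, 3), fake_search))
```

With a real `search` wrapper, a pause between calls (as in the original loop) would still be needed to avoid hammering the API.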