alex9smith / gdelt-doc-api

A Python client for the GDELT 2.0 Doc API
MIT License

requests with timespan based on minutes fail #17

Closed priolap closed 2 years ago

priolap commented 2 years ago

Hi @alex9smith,

I'm writing again about another issue I recently found while working with your library. According to the GDELT documentation, the timespan option lets you choose the time range as a number of months, weeks, days, hours or minutes (minimum of 15 minutes).

It works for every unit except minutes. This is what I tried:

from gdeltdoc import GdeltDoc, Filters, repeat

f = Filters(
    timespan = "15min",
    theme = 'CORRUPTION',
    repeat = repeat(3, 'italy')
)
gd = GdeltDoc()

news = gd.article_search(f)

It fails with: TypeError: argument of type 'int' is not iterable

Am I doing something wrong?

alex9smith commented 2 years ago

Thanks @priolap, I get the same error.

I think this is similar to the last couple of problems that you've reported: the query string that's getting built for those filters isn't correct. I will investigate and sort out a fix.

What this also tells me is:

  1. I need to check through all the other filter options and write some more tests based on the documentation, as the test coverage is pretty poor at the moment.
  2. I need a better error message if the query string is invalid.

I'll get your problem fixed first, then look at some longer term fixes to reduce the number of times people get similar problems.
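To illustrate point 2, here is a minimal sketch of turning an invalid-query response into a readable error. It is not the library's code: `parse_api_response` is a hypothetical helper, and it assumes the API answers a bad query with a plain-text message instead of JSON.

```python
import json
from typing import Any


def parse_api_response(status_code: int, body: str) -> Any:
    """Return the decoded JSON body, or raise ValueError with a readable message.

    Hypothetical helper for illustration only.
    """
    if status_code != 200:
        raise ValueError(f"The API returned a non-successful status code: {status_code}")

    try:
        return json.loads(body)
    except json.JSONDecodeError:
        # Assumption: an invalid query produces a plain-text explanation
        # rather than JSON, so pass that text on to the caller.
        raise ValueError(
            f"The query was not valid. The API response was: {body.strip()}"
        ) from None
```

Taking the status code and body as plain arguments keeps the check independent of any particular HTTP client, so it can be tested without calling the API.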

alex9smith commented 2 years ago

This is a really unfortunate error 😞

Even though the documentation says you can query the last 15 minutes, my testing shows the smallest value allowed in the timespan parameter is actually 60 minutes.

The error message you were getting was really unhelpful though (sorry!). I've just released version 1.4.0, which validates the timespan parameter and has better error handling, so if another case produces an invalid API query you'll at least see a proper error message.

On 1.4.0, running your example results in:

In [1]: from gdeltdoc import GdeltDoc, Filters, repeat
   ...:
   ...: f = Filters(
   ...: timespan = "15min",
   ...: theme = 'CORRUPTION',
   ...: repeat = repeat(3, 'italy')
   ...: )
   ...: gd = GdeltDoc()
   ...:
   ...: news = gd.article_search(f)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [1], in <cell line: 3>()
      1 from gdeltdoc import GdeltDoc, Filters, repeat
----> 3 f = Filters(
      4 timespan = "15min",
      5 theme = 'CORRUPTION',
      6 repeat = repeat(3, 'italy')
      7 )
      8 gd = GdeltDoc()
     10 news = gd.article_search(f)

File ~/code/gdelt-doc-api/gdeltdoc/filters.py:169, in Filters.__init__(self, start_date, end_date, timespan, num_records, keyword, domain, domain_exact, near, repeat, country, theme)
    166     self.query_params.append(f'&enddatetime={end_date.replace("-", "")}000000')
    167 else:
    168     # Use timespan
--> 169     self._validate_timespan(timespan)
    170     self.query_params.append(f"&timespan={timespan}")
    172 if num_records > 250:

File ~/code/gdelt-doc-api/gdeltdoc/filters.py:268, in Filters._validate_timespan(timespan)
    265     raise ValueError(f"Timespan {timespan} is invalid. {value} could not be converted into an integer")
    267 if unit == "min" and int(value) < 60:
--> 268     raise ValueError(f"Timespan {timespan} is invalid. Period must be at least 60 minutes")

ValueError: Timespan 15min is invalid. Period must be at least 60 minutes

This doesn't fix what you were trying to do, but at least we know why now.
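For anyone who wants the same check in their own code, a standalone sketch of this kind of validation could look like the following. It is not the actual `_validate_timespan` from `filters.py`; the function name, the regular expression and the list of unit spellings are assumptions (only "min" appears in this thread), and the 60-minute threshold comes from the testing described above.

```python
import re

# Assumed unit spellings for minutes, hours, days, weeks and months.
VALID_UNITS = ("min", "h", "hours", "d", "days", "w", "weeks", "m", "months")


def validate_timespan(timespan: str, min_minutes: int = 60) -> None:
    """Raise ValueError unless `timespan` is a number followed by a known unit."""
    match = re.fullmatch(r"(\d+)([a-z]+)", timespan)
    if match is None:
        raise ValueError(
            f"Timespan {timespan} is invalid. Expected a number followed by a unit"
        )

    value, unit = int(match.group(1)), match.group(2)

    if unit not in VALID_UNITS:
        raise ValueError(
            f"Timespan {timespan} is invalid. {unit} is not a supported unit, "
            f"must be one of {', '.join(VALID_UNITS)}"
        )

    # Threshold taken from the testing described above, kept as a parameter
    # in case the API's real minimum turns out to be different.
    if unit == "min" and value < min_minutes:
        raise ValueError(
            f"Timespan {timespan} is invalid. Period must be at least {min_minutes} minutes"
        )
```

Calling `validate_timespan("15min")` raises a `ValueError`, while `validate_timespan("60min")` and `validate_timespan("2h")` return without error.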

priolap commented 2 years ago

Hi @alex9smith,

Thanks for testing, but I can currently query the last 31 minutes!

alex9smith commented 2 years ago

Oh, interesting. Can you share the query you did that worked for 31 minutes?

priolap commented 2 years ago

@alex9smith

from gdeltdoc import GdeltDoc, Filters, repeat

f = Filters(
    timespan = '31min',
    theme = 'CORRUPTION',  # GKG codes
    repeat = repeat(5, "italy")
)

gd = GdeltDoc()
gd.article_search(f)

Of course, this works with the previous version (1.3.2).