alex9smith / gdelt-doc-api

A Python client for the GDELT 2.0 Doc API
MIT License
100 stars, 23 forks

How to get all matched Articles? #5

Closed: ddonng closed this issue 3 years ago

ddonng commented 3 years ago

Thanks for your work, it's really good.

I found that the articles the GDELT Doc API returns for a query in ArtList mode don't seem to be all of the matches, especially when compared with the volume value reported by TimelineVolRaw.

How can I get all of the matched articles returned? Any suggestions would be appreciated. Thanks.

alex9smith commented 3 years ago

Hey, could you post some example queries you're using in ArtList and TimelineVolRaw modes?

alex9smith commented 3 years ago

I've done some more investigating and unfortunately the limit of 250 matched articles in an ArtList search is set on GDELT's side and isn't a limitation of this library.

You can see this for yourself in their documentation:

"MAXRECORDS. This option only applies to the ArticleList and various ImageCollage modes, it is ignored in all other modes. To conserve system resources, in Article List and the ImageCollage modes, the API only returns up 75 results by default, but this can be increased up to 250 results if desired by using this URL parameter."

or by constructing an API call yourself and requesting more than 250 articles. The query below requests 1000 and gets an error message back:

https://api.gdeltproject.org/api/v2/doc/doc?query=%22paypal%20%22&timespan=7d&maxrecords=1000&mode=artlist&format=json
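For anyone who wants to reproduce this from Python, here is a minimal sketch that assembles the same kind of request using only the standard library. The names `build_doc_url` and `MAX_ARTLIST_RECORDS` are hypothetical and are not part of this library; the endpoint and parameter names are copied from the URL above, and the 250 cap is taken from the documentation quote. The query string here is `"paypal"` without the trailing space used above. The sketch only builds URLs and does not send any request.

```python
from urllib.parse import quote, urlencode

# Endpoint copied from the example query in this thread.
BASE_URL = "https://api.gdeltproject.org/api/v2/doc/doc"

# Upper bound quoted from GDELT's MAXRECORDS documentation (hypothetical constant name).
MAX_ARTLIST_RECORDS = 250


def build_doc_url(query, mode="artlist", timespan="7d",
                  maxrecords=MAX_ARTLIST_RECORDS):
    """Build a Doc API URL, capping maxrecords at the documented limit."""
    params = {
        "query": query,
        "timespan": timespan,
        "mode": mode,
        "format": "json",
    }
    # The docs say MAXRECORDS is ignored outside ArtList and ImageCollage
    # modes; this sketch only adds it for ArtList requests.
    if mode.lower() == "artlist":
        params["maxrecords"] = min(int(maxrecords), MAX_ARTLIST_RECORDS)
    # quote_via=quote percent-encodes spaces as %20 rather than "+".
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


# Asking for 1000 records is clamped to 250 before the URL is built.
artlist_url = build_doc_url('"paypal"', maxrecords=1000)
# A matching TimelineVolRaw URL, useful for comparing volume against ArtList.
volume_url = build_doc_url('"paypal"', mode="timelinevolraw")

print(artlist_url)
print(volume_url)
```

Clamping on the client side only avoids the error response; it does not change how many articles GDELT is willing to return.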

I keep an eye on GDELT's blog posts, so if they ever increase the limit I'll update this library to match. At the moment, though, it's not possible to get more than 250 articles back in an ArtList search.