alex9smith / gdelt-doc-api

A Python client for the GDELT 2.0 Doc API
MIT License

Return full article text in Python client? #21

Closed by eelegiap 2 years ago

eelegiap commented 2 years ago

Hi, I was wondering whether there is a way to query article text using this tool, or whether it could easily be added to this Python client. Or would you recommend scraping the returned URLs instead? Thank you!

alex9smith commented 2 years ago

Hi, unfortunately there's no easy way to get the article text with this client: it's not an available field in the Doc API that the client calls.
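To illustrate what the Doc API does return, here is a minimal standard-library sketch of an article-list request. It assumes the public endpoint `https://api.gdeltproject.org/api/v2/doc/doc` with `mode=ArtList` and `format=json`, whose response holds an `articles` array of metadata (`url`, `title`, `seendate`, `domain`, and so on) with no body text. The helper names are hypothetical and are not part of gdelt-doc-api.

```python
import json
import urllib.parse
import urllib.request

# Assumed public Doc API endpoint (the service gdelt-doc-api wraps).
DOC_API = "https://api.gdeltproject.org/api/v2/doc/doc"


def build_artlist_url(query: str, timespan: str = "1d", max_records: int = 75) -> str:
    """Build a Doc API request URL for an article list (hypothetical helper)."""
    params = {
        "query": query,
        "mode": "ArtList",
        "format": "json",
        "timespan": timespan,
        "maxrecords": max_records,
    }
    return f"{DOC_API}?{urllib.parse.urlencode(params)}"


def article_urls(payload: dict) -> list[str]:
    """Pull the article URLs out of a parsed ArtList JSON response."""
    # Each entry carries metadata (url, title, seendate, domain, ...) but no article text,
    # so the URL is the only route back to the full content.
    return [article["url"] for article in payload.get("articles", [])]


def fetch_article_urls(query: str) -> list[str]:
    """Query the Doc API and return the URLs it lists (performs a network call)."""
    with urllib.request.urlopen(build_artlist_url(query), timeout=30) as response:
        return article_urls(json.load(response))
```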

I think there are two sensible options: you can either scrape the returned URLs, as you said, or use the public BigQuery tables to search for the URLs and return the text. I'm not very familiar with what's in the BigQuery datasets, but I expect one of the tables has the article data.
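For the scraping route, a rough sketch using only the standard library might look like the following. It downloads a page with `urllib.request` and keeps the visible text via `html.parser.HTMLParser`, skipping `script`, `style` and `noscript` blocks. The function names are hypothetical, and the extraction is deliberately naive: real news pages also contain navigation, ads and footers, which a dedicated article-extraction library would handle better.

```python
import urllib.request
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text from an HTML document, skipping script/style blocks."""

    SKIP_TAGS = {"script", "style", "noscript"}

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped tag.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html: str) -> str:
    """Return the visible text of an HTML page, chunks joined by single spaces."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def fetch_article_text(url: str, timeout: float = 30.0) -> str:
    """Download a page and return its visible text (performs a network call)."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return extract_text(html)
```

Assuming the client's `GdeltDoc().article_search(filters)` returns a pandas DataFrame with a `url` column, the two can be combined as `texts = {u: fetch_article_text(u) for u in articles["url"]}`, ideally with error handling, since some URLs will have moved or will block automated requests.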