bryankolano / gdelt_pipeline_google_cloud

To get some practice with the ETL tool "Prefect", this repo grabs events from the Global Database of Events, Language, and Tone (GDELT) and moves them through an ETL pipeline that ends in Google BigQuery.

Data before 2017 #1

Closed: completelyboofyblitzed closed this issue 1 year ago

completelyboofyblitzed commented 1 year ago

Hey! I was just curious whether it can grab data from before 2017, i.e. the GDELT v1 database, and if so, how far back it goes.

bryankolano commented 1 year ago

Hi! Version 2 of the data goes back to February 2015. Here is the master list of every data file:

http://data.gdeltproject.org/gdeltv2/masterfilelist.txt
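If you want to check v2 coverage programmatically, here is a minimal Python sketch that parses the master list and sorts files of one type from oldest to newest. It assumes each line of masterfilelist.txt has three space-separated fields (file size, MD5 hash, URL) and that each filename starts with a YYYYMMDDHHMMSS timestamp followed by the file type (e.g. `.export.CSV.zip`). The function name `parse_masterfilelist` is hypothetical and not part of this repo.

```python
from datetime import datetime


def parse_masterfilelist(text, kind="export"):
    """Return (timestamp, url) pairs for files of the given kind, oldest first.

    Assumption: each line looks like "<size> <md5> <url>", where the URL's
    filename is "<YYYYMMDDHHMMSS>.<kind>...", e.g. "20150218230000.export.CSV.zip".
    """
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        url = parts[2]
        filename = url.rsplit("/", 1)[-1]
        stamp, _, rest = filename.partition(".")
        if not rest.lower().startswith(kind.lower()):
            continue  # a different file type (mentions, gkg, ...)
        try:
            ts = datetime.strptime(stamp, "%Y%m%d%H%M%S")
        except ValueError:
            continue  # filename does not start with the assumed timestamp
        entries.append((ts, url))
    entries.sort()  # tuples sort by timestamp first
    return entries
```

To run it against the live list, the text could come from `urllib.request.urlopen(url).read().decode("utf-8")`; under the assumptions above, `parse_masterfilelist(text)[0]` should then be the earliest v2 export file.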

Version 1 of GDELT goes back longer:

http://data.gdeltproject.org/events/index.html

In April 2013, they started publishing daily GDELT files. Before that, it looks like there isn't a file for each day; instead, each file summarizes a whole month or year.
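A pipeline pulling v1 data would need to map each date to the file that covers it. Below is a minimal Python sketch of that mapping. The exact cutoffs are assumptions based on the naming visible on the v1 index page: yearly `YYYY.zip` files for 1979 through 2005, monthly `YYYYMM.zip` files from 2006 through March 2013, and daily `YYYYMMDD.export.CSV.zip` files from April 1, 2013 on. Both function names are hypothetical.

```python
from datetime import date, timedelta

BASE = "http://data.gdeltproject.org/events/"

# Assumed boundaries of the GDELT 1.0 file granularity (verify on the index page).
FIRST_YEAR = 1979
FIRST_MONTHLY_YEAR = 2006
FIRST_DAILY = date(2013, 4, 1)


def gdelt_v1_file_url(day):
    """Return the URL of the (assumed) GDELT 1.0 events file covering `day`."""
    if day.year < FIRST_YEAR:
        raise ValueError(f"no GDELT 1.0 data assumed before {FIRST_YEAR}")
    if day >= FIRST_DAILY:
        return f"{BASE}{day:%Y%m%d}.export.CSV.zip"  # one file per day
    if day.year >= FIRST_MONTHLY_YEAR:
        return f"{BASE}{day:%Y%m}.zip"  # one file per month
    return f"{BASE}{day:%Y}.zip"  # one file per year


def gdelt_v1_files_for_range(start, end):
    """List each distinct file needed to cover start..end (inclusive), in order."""
    urls = []
    day = start
    while day <= end:
        url = gdelt_v1_file_url(day)
        # Files cover contiguous periods, so comparing with the last URL dedupes.
        if not urls or urls[-1] != url:
            urls.append(url)
        day += timedelta(days=1)
    return urls
```

For example, a range spanning the end of March 2013 should yield the March 2013 monthly file followed by one daily file per day from April 1 onward.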

completelyboofyblitzed commented 1 year ago

Thank you!