andybega / icews

Get the ICEWS event data
https://www.andybeger.com/icews/

DB state does not include source files with all duplicate events #47

Closed andybega closed 5 years ago

andybega commented 5 years ago

When events are added to the database, events whose event ID already exists are not added again. If every event in a ".tsv" source file is a duplicate, so that none are added to the "events" table in the DB, the name of the source file is stored in the "null_source_files" table instead. This is needed because the "source_files" table is built from the "source_file" column in the "events" table, so such files would otherwise not show up there. The DB state getter does not include the null source files.
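To make the bookkeeping above concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not the package's code (icews is written in R), and the schema is an assumption: only the table names "events" and "null_source_files" and the "source_file" column come from this issue. The column names `event_id` and `name` and the helpers `make_db`, `ingest`, `files_from_events`, and `ingested_files` are hypothetical.

```python
import sqlite3


def make_db():
    # Hypothetical, simplified schema; the real tables have many more columns.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE events (
            event_id    INTEGER PRIMARY KEY,
            source_file TEXT NOT NULL
        );
        CREATE TABLE null_source_files (name TEXT PRIMARY KEY);
    """)
    return conn


def count_events(conn):
    return conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]


def ingest(conn, source_file, event_ids):
    """Add events, skipping IDs that already exist; return the number added."""
    before = count_events(conn)
    conn.executemany(
        "INSERT OR IGNORE INTO events (event_id, source_file) VALUES (?, ?)",
        [(eid, source_file) for eid in event_ids],
    )
    added = count_events(conn) - before
    if added == 0:
        # Every event was a duplicate: remember the file name separately,
        # because no row in "events" will ever point to it.
        conn.execute(
            "INSERT OR IGNORE INTO null_source_files (name) VALUES (?)",
            (source_file,),
        )
    return added


def files_from_events(conn):
    """Files visible when the state is derived from "events" alone."""
    rows = conn.execute(
        "SELECT DISTINCT source_file FROM events ORDER BY source_file"
    )
    return [r[0] for r in rows]


def ingested_files(conn):
    """Files visible when the null source files are included as well."""
    rows = conn.execute("""
        SELECT source_file AS name FROM events
        UNION
        SELECT name FROM null_source_files
        ORDER BY name
    """)
    return [r[0] for r in rows]
```

In this sketch, a file whose events are all duplicates is missing from `files_from_events()` but present in `ingested_files()`, which is the gap this issue describes for the DB state getter.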

andybega commented 5 years ago

This fix was actually wrong. There are SQL triggers that update "source_files" by pulling from "null_source_files" any time either "source_files" or "null_source_files" is changed. Running update_stats() from R triggers one of them.
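A rough sketch of how such triggers could behave, again in Python's sqlite3 rather than the package's R code. The trigger names, the `name` and `event_id` columns, and the `rebuild_source_files` helper are hypothetical; the actual trigger definitions in the package may differ, and this sketch only models inserts as the "change" that fires a trigger.

```python
import sqlite3

# Hypothetical schema: only the three table names and the "source_file"
# column are taken from the issue text.
SCHEMA = """
CREATE TABLE events (
    event_id    INTEGER PRIMARY KEY,
    source_file TEXT NOT NULL
);
CREATE TABLE source_files (name TEXT PRIMARY KEY);
CREATE TABLE null_source_files (name TEXT PRIMARY KEY);

-- A new null source file is copied into source_files right away.
CREATE TRIGGER pull_nulls_on_null_insert
AFTER INSERT ON null_source_files
BEGIN
    INSERT OR IGNORE INTO source_files (name)
    SELECT name FROM null_source_files;
END;

-- Any insert into source_files also re-adds the null source files.
CREATE TRIGGER pull_nulls_on_source_insert
AFTER INSERT ON source_files
BEGIN
    INSERT OR IGNORE INTO source_files (name)
    SELECT name FROM null_source_files;
END;
"""


def rebuild_source_files(conn):
    """Rebuild source_files from events; the trigger restores null files."""
    conn.execute("DELETE FROM source_files")
    conn.execute(
        "INSERT OR IGNORE INTO source_files (name) "
        "SELECT DISTINCT source_file FROM events"
    )


def source_files(conn):
    rows = conn.execute("SELECT name FROM source_files ORDER BY name")
    return [r[0] for r in rows]
```

With triggers like these, the state getter can read "source_files" alone, provided each all-duplicate file was actually written to "null_source_files" in the first place.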

The problem was again that write_data_to_db() did not treat the input as a daily file, and thus did not trigger a write to "null_source_files". Undo this.