I ran into two problems when using the `gdeltr2::get_data_gdelt_periods_event()` function.
The first, and more important, is that the function guesses the wrong data type for some columns. For example, if you download the zip file for 201303, the `codeActor1` column is guessed as logical (Boolean) because its first 1000 rows are empty, when it should be character. This could be solved by specifying the type of each column in `readr::read_tsv()`. Setting that up would take some effort, but it would work.
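
A minimal sketch of that `read_tsv()` fix, assuming readr is installed. The tiny tab-separated file is hypothetical data that only mimics a `codeActor1` column whose first rows are empty. Option 1 declares just that one column, and `cols()` keeps guessing the others. Option 2 raises `guess_max` so readr inspects more rows before guessing. That avoids listing types but scans more of the file.

```r
library(readr)

# Hypothetical sample: codeActor1 is empty in the first rows, like the 201303 file.
tmp <- tempfile(fileext = ".tsv")
writeLines(c("id\tcodeActor1", "1\t", "2\t", "3\tUSA"), tmp)

# Option 1: declare the problematic column explicitly;
# every column not listed in cols() still falls back to col_guess().
explicit <- read_tsv(tmp, col_types = cols(codeActor1 = col_character()))

# Option 2: let readr look at more rows before guessing types.
# (This sample is tiny, so every row is used for guessing.)
wider_guess <- read_tsv(tmp, guess_max = 100000)

# codeActor1 is character; the empty cells are read as NA.
str(explicit$codeActor1)
```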
The second problem is efficiency. The same 201303 file is around 620 MB, and reading it takes several minutes on my computer. It also crashes because of the missing parentheses after `lubridate::ymd` on line 1119. I tried replacing the reader with `data.table::fread()` and got a considerable speed-up. The only adjustment I needed was to pass `data.table = FALSE`, so that the result stays a plain data frame instead of being converted to a data.table.
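
A sketch of the `fread()` replacement, assuming data.table is installed. It reuses a hypothetical tab-separated sample in place of the unzipped 201303 data. `data.table = FALSE` returns a plain `data.frame`, as described above. `colClasses` is optional. It is included only to show that the same call could also pin `codeActor1` to character.

```r
library(data.table)

# Hypothetical sample file standing in for the unzipped 201303 data.
tmp <- tempfile(fileext = ".tsv")
writeLines(c("id\tcodeActor1", "1\t", "2\t", "3\tUSA"), tmp)

events <- fread(
  tmp,
  sep = "\t",
  header = TRUE,
  data.table = FALSE,                          # return a data.frame, not a data.table
  colClasses = list(character = "codeActor1")  # fix this column's type up front
)

# A plain data.frame. Note that fread keeps empty character
# fields as "" rather than NA by default.
str(events)
```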
I am not an expert in package development, so this may not be the best way to solve these problems. The second problem is minor, but I think the first one is worth addressing.