abresler / gdeltr2

modern gdelt wrapper for r
https://asbcllc.com/gdeltr2

Is it possible to use data.table::fread to replace readr::read_tsv in gdeltr2::get_gdelt_url_data? #9

Open kevin820606 opened 3 years ago

kevin820606 commented 3 years ago

I ran into two problems when using the gdeltr2::get_data_gdelt_periods_event() function.

The first one is the more important: the data type of some columns is guessed incorrectly. For example, if you download the zip file for 201303, the codeActor1 column is guessed as logical (Boolean) because its first 1000 rows are empty, although it should be character. I know this can be solved by specifying the type of each column in readr::read_tsv. That would take a lot of time to set up, but it could be a solution.
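To illustrate the column-type idea, here is a minimal sketch rather than the package's actual code. The file is a tiny made-up stand-in for the 201303 download (the idEvent column and the header row are assumptions for the demo), and only codeActor1 is pinned to character while the remaining columns are still guessed.

```r
library(readr)

# Hypothetical stand-in for the real file: codeActor1 is empty for more rows
# than readr inspects by default when guessing column types (guess_max = 1000).
path <- tempfile(fileext = ".tsv")
writeLines(
  c("idEvent\tcodeActor1", paste0(seq_len(1500), "\t"), "1501\tUSA"),
  path
)

# Pin only the problematic column; every other column is still guessed.
events <- read_tsv(
  path,
  col_types = cols(codeActor1 = col_character())
)

str(events$codeActor1)
```

Raising `guess_max` would be another option, at the cost of scanning more rows before parsing.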

The second one is about efficiency. The same 201303 file is around 620 MB and takes several minutes to read on my computer (and then crashes because of missing parentheses after lubridate::ymd in line 1119). I tried replacing readr::read_tsv with data.table::fread() and got a considerable speed boost. The only problem I ran into was that I had to add the parameter data.table = FALSE so that the result stays a plain data frame instead of becoming a data.table.
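A rough sketch of the fread replacement I have in mind, again on a small made-up file rather than the real download (the idEvent column and the header row are assumptions). It is not a patch against gdeltr2 itself, and I have not benchmarked this toy example.

```r
library(data.table)

# Hypothetical stand-in for the real file, with codeActor1 empty in most rows.
path <- tempfile(fileext = ".tsv")
writeLines(
  c("idEvent\tcodeActor1", paste0(seq_len(1500), "\t"), "1501\tUSA"),
  path
)

events <- fread(
  path,
  sep = "\t",
  colClasses = c(codeActor1 = "character"),  # force the type for this column
  data.table = FALSE                         # return a plain data.frame
)

class(events)
```

With data.table = FALSE the result is an ordinary data frame, so downstream dplyr-style code should not need to change.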

I am not an expert on package development, so perhaps this is not a good way to solve these problems. The second problem is not that important, but I think the first one is worth mentioning.