covid19datahub / COVID19

A worldwide epidemiological database for COVID-19 at fine-grained spatial resolution
https://covid19datahub.io
GNU General Public License v3.0

Large database size #172

Closed: dherrera1911 closed this issue 2 years ago

dherrera1911 commented 2 years ago

I'm having trouble using the package now because the data doesn't fit in my memory, even though I only want a subset. The download works fine when I use level = 1, but to get the data for NYC, for example, I have to call covid19(country = "USA", level = 3), which makes me run out of my 8 GB of memory, even if I provide a short time interval via the start/end dates.

eguidotti commented 2 years ago

Can you try setting the argument cache = FALSE? That should save some space in memory.

It would be nice to ship the data in an SQLite database. That would be more memory-efficient, but it will take some time before it is up and running.
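
Just to illustrate the idea, something like the sketch below could then pull only the rows you need instead of loading the whole dataset; the file name covid19.sqlite, the table name data, and the column/value used in the filter are made up for the example, since nothing like this exists in the package yet.

# hypothetical SQLite-backed workflow: query only the NYC rows
library(DBI)
library(RSQLite)

con <- dbConnect(RSQLite::SQLite(), "covid19.sqlite")
nyc <- dbGetQuery(con, "
  SELECT *
  FROM   data
  WHERE  administrative_area_level_3 = 'New York City'
")
dbDisconnect(con)

Because the filtering happens inside the database engine, only the matching rows would ever be held in R's memory.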

dherrera1911 commented 2 years ago

Thanks for the quick response! I tried cache = FALSE and it still exhausted my memory. I'll just try to increase my swap or look for that specific data elsewhere (I'm still using the package for non-level-3 locations).

Best

eguidotti commented 2 years ago

It's strange. The data file is below 1 GB. Are you using the package on CRAN?

install.packages("COVID19")
library(COVID19)
x <- covid19("USA", level = 3, cache = FALSE)

Another option is to read the raw data file directly.

x <- read.csv("https://storage.covid19datahub.io/rawdata-3.csv")
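
If memory is still the bottleneck, one possible workaround is to stream the raw file in chunks and keep only the rows you care about, for example with readr. This is only a sketch: the column name administrative_area_level_3 and the value "New York City" are guesses about the raw file layout, so double-check them against the actual columns.

library(readr)

# keep only the NYC rows from each chunk as the file is streamed
keep_nyc <- DataFrameCallback$new(function(chunk, pos) {
  subset(chunk, administrative_area_level_3 == "New York City")
})

nyc <- read_csv_chunked(
  "https://storage.covid19datahub.io/rawdata-3.csv",
  callback   = keep_nyc,
  chunk_size = 100000
)

This way the full table is never materialized in memory at once, only one chunk plus the accumulated NYC rows.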

Otherwise, you can manually download the data here.

dherrera1911 commented 2 years ago

Yes, just running that fills my memory (I notice it's already partly in use, so maybe only about 4 GB is actually free, but still). Thanks for the suggestions though!