ecohealthalliance / open-rvfcast

Wellcome Open RVFCast project repository

Additional data layers and feature engineering #94

Open emmamendelsohn opened 3 months ago

emmamendelsohn commented 3 months ago
n8layman commented 2 months ago

@emmamendelsohn is this similar to the outbreak history layer @noam started, in that immunity is inferred from previous outbreak case data? Or is the immunity layer a separate seroprevalence dataset, as suggested in #79? If it is based on case counts, it will be included in PR #76, which contains daily outbreak histories over both the short and long term. In these histories, the impact of an outbreak declines exponentially after it occurs, both with distance and with time.
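For anyone following along, here is a minimal base R sketch of what an exponential decline in both space and time could look like. The function name outbreak_weight, the two rate parameters, and the choice to multiply the spatial and temporal terms are all assumptions for illustration; the actual form and rates used in PR #76 are not stated in this thread.

```r
# Hypothetical weight of a past outbreak at a given distance and time lag.
# spatial_rate and temporal_rate are made-up values, not the ones in PR #76.
outbreak_weight <- function(distance_km, days_since,
                            spatial_rate = 0.01, temporal_rate = 0.005) {
  # Assumed form: product of two exponential decay terms
  exp(-spatial_rate * distance_km) * exp(-temporal_rate * days_since)
}

# Weight is 1 at the outbreak location on the day of the outbreak
w_origin <- outbreak_weight(0, 0)

# Weight shrinks as distance or elapsed time grows
w_near_recent <- outbreak_weight(50, 30)
w_far_recent  <- outbreak_weight(100, 30)
w_near_old    <- outbreak_weight(50, 365)
```

With different rate values the same function could serve for both the short-term and the long-term histories, but that split is also an assumption.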

As an aside, SpatRasters written with terra::writeRaster(..., gdal=c("COMPRESS=LZW")) were nearly half the size of the same data saved in long form as parquet with compression = "gzip", compression_level = 5. Both the .parquet and .tif versions are in the data/outbreak_history_dataset/ folder.
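A rough way to reproduce this kind of size comparison is sketched below. It uses a toy raster with made-up values and temporary files rather than the project's data, so the ratio it gives will not necessarily match the one observed above.

```r
library(terra)
library(arrow)

# Toy raster standing in for one outbreak history layer (values are made up)
set.seed(1)
r <- rast(nrows = 200, ncols = 200, xmin = 0, xmax = 20, ymin = 0, ymax = 20)
values(r) <- round(runif(ncell(r)), 2)

# GeoTIFF with LZW compression, as in the comment above
tif_path <- tempfile(fileext = ".tif")
writeRaster(r, tif_path, gdal = c("COMPRESS=LZW"), overwrite = TRUE)

# Long form: one row per cell with x, y, and the cell value
long <- as.data.frame(r, xy = TRUE)
parquet_path <- tempfile(fileext = ".parquet")
write_parquet(long, parquet_path, compression = "gzip", compression_level = 5)

# File sizes in bytes
sizes <- c(tif = file.size(tif_path), parquet = file.size(parquet_path))
print(sizes)
```

Part of any difference probably comes from the long form storing x and y explicitly for every cell, while the raster keeps only the grid definition plus the values.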

[Image: recent outbreaks]

[Image: old outbreaks]

emmamendelsohn commented 2 months ago

Yes, the immunity layer is Noam's outbreak layer.

We primarily chose parquet files for the ability to interact with the data outside of memory using arrow. Smaller rasters are good, but before making any switch, make sure you can do all of the data processing on them.
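To illustrate the out-of-memory workflow, here is a minimal sketch using arrow::open_dataset() with dplyr verbs. The table, its column names (x, y, weight), and the temporary directory are invented for the example and are not the project's schema.

```r
library(arrow)
library(dplyr)

# Toy long-form table; column names are illustrative only
dat <- data.frame(
  x = rep(1:10, times = 10),
  y = rep(1:10, each = 10),
  weight = seq(0.01, 1, length.out = 100)
)

# Write it to a temporary directory as a parquet file
ds_dir <- tempfile()
dir.create(ds_dir)
write_parquet(dat, file.path(ds_dir, "part-0.parquet"))

# open_dataset() reads only metadata; the filter and summary are evaluated
# by arrow, and only the result is pulled into R by collect()
ds <- open_dataset(ds_dir)
result <- ds %>%
  filter(weight > 0.5) %>%
  group_by(y) %>%
  summarise(mean_weight = mean(weight)) %>%
  collect()
```

Any raster-based alternative would need an equivalent way to filter and summarise without loading every layer at once.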

n8layman commented 2 months ago

I don't plan on switching, particularly for the full dataset tibble, but it was interesting to observe and something to think about for future projects. Would tiffs avoid the parquet issue with AWS and targets?

n8layman commented 2 months ago

@kevinolival @rostal