Closed bryce-carson closed 3 months ago
I'm working on validating the refactoring of this file.
Part of the refactoring has stalled at resolving a semantic error I introduced when I optimized the wtd_nbrs_sum
function. To ease discussion, I won't refer to this function by its old name, given I renamed its identifier: transmissionLikelihoodWeightings
. This function returns a matrix.
@ashokkrish, please provide your input on the following two functions and their documentation.
library(tidyverse)
##' @title Average Euclidean Distance
##' @description Measure the exposure influence upon susceptible individuals at
##' spatial position (x, y) of the infectious individuals at a spatial
##' position (u, v).
##' @details The mathematical modelling of human or non-human mobility patterns
##' is non-trivial. This function is a limited implementation of the effective
##' area for only human mobility, with distances travelled per day (Ξ»)
##' measured in kilometers.
##'
##' NOTE: Our weight function does not take the same arguments as shown in the
##' slideshow: (π€ x y u v); The term βeffective areaβ comes from the fact that
##' if the kernel is constant on a finite disk (and zero outside it), then the
##' formula due to Bolker (1999) gives the area of the disk.
##'
##' See the article titled *A clarification of transmission terms in
##' host-microparasite models by Begon et al.* (2002).
##'
##' The raster data of infection counts or disease incidence provided to the
##' function which calls this one, transmissionLikelihoodWeightings, may be
##' aggregated by a given factor. That factor must be passed to this function
##' for parity, so the data is treated the same.
##' @param radius TODO: a fixed radius r > lambda; see details.
##' @param lambda movemenet distance (in kilometers) per day; see details.
##' @param aggregationFactor the degree of aggregation applied to the raster
##' data mentioned in the function details.
##' @returns a matrix of the average Euclidean distances
##' @author Bryce Carson
##' @author Thomas White
avgEuclideanDistance <- function(radius, lambda, aggregationFactor) {
## NOTE: βI have a philosophical and geometric question about these
## identities: what do they imply about the dimension and magnitude of the
## raster which the weighted number sum will be used with? Can we rid the
## function of the radius argument and only accept the lambda argument, given
## these identities and the claim made about the calculation of radius in the
## slides?β β Bryce
if (radius > lambda)
stopifnot(r == round((lambda - aggregationFactor) / aggregationFactor) + 1)
else
stopifnot(r == lambda + aggregationFactor)
len <- seq_len(1 + radius * 2)
df <- tidyr::expand(tibble(i = len, j = len), i, j)
avg.euc.dist <- function(i, j) {
exp(-sqrt(sum((c(i, j) - c(radius + 1, radius + 1))^2)) / lambda)
}
mutate(df, averageEuclideanDistance = map2_dbl(i, j, avg.euc.dist)) %>%
as.matrix(byrow = TRUE, ncol = nrow(df))
}
##' @title Weighted Sums
##' @description Calculate a matrix of weights respecting human mobility
##' patterns.
##' @details The pattern of human mobility used is described in a slideshow
##' here:
##' https://docs.google.com/presentation/d/1_gqcEh4d8yRy22tCZkU0MbGYSsGu3Djh/edit?usp=sharing&ouid=102231457806738400087&rtpof=true&sd=true.
##' @param infections a matrix of the count of infections per aggregate area in
##' a raster of terrestrial data.
##' @param radius a constant; see details.
##' @param lambda movemenet distance (in kilometers) per day; see details.
##' @returns a matrix of weightings for the calculation of the proportion of
##' exposed individuals who will become infectious.
##' @author Bryce Carson
##' @author Thomas White
##' @examples
##' terra::as.matrix(Infected, wide = TRUE) %>%
##' transmissionLikelihoodWeightings(30, 15, 10)
transmissionLikelihoodWeightings <-
function(infections, radius, lambda, aggregationFactor) {
inputMatrixRows <- nrow(infections)
inputMatrixCols <- ncol(infections)
diameter <- 2 * radius
temp.1 <- matrix(data = 0, nrow = inputMatrixRows, ncol = radius)
temp.2 <- matrix(data = 0, nrow = radius, ncol = diameter + inputMatrixCols)
input.modified <- rbind(temp.2, cbind(temp.1, infections, temp.1), temp.2)
print(dim(input.modified))
weights <- avgEuclideanDistance(radius, lambda)
list(i = seq_len(length.out = inputMatrixRows),
j = seq_len(length.out = inputMatrixCols)) %>%
expand.grid() %>%
as_tibble() %>%
mutate(weightedSum =
map2_dbl(i, j, function(i, j) {
neighbours <- input.modified[i:(i + diameter), j:(j + diameter)]
## MAYBE FIXME: are we certain we don't want matrix
## multiplication? %*%
## FIXME: non-conformable arrays
## Caused by error in `map2_dbl()`:
## βΉ In index: 1.
## Caused by error in `subset * weights`:
## ! non-conformable arrays
## matrix(1, 3, 3) * matrix(2, 9, 3)
## Error in matrix(1, 3, 3) * matrix(2, 9, 3) : non-conformable arrays
products <- neighbours * weights
sum(products)
})) %>%
as.matrix(byrow = TRUE, nrow = inputMatrixRows, ncol = inputMatrixCols)
}
This is an important function, as it represents a large portion of the clock time when a simulation runs.
[bryce@fedora]~/src/r/spatialEpisim/R% scc --no-cocomo ~/src/r/spatialEpisim.stable/R/rasterSimulation_DA.R
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Language Files Lines Blanks Comments Code Complexity
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
R 1 935 179 452 304 17
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Total 1 935 179 452 304 17
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Processed 37494 bytes, 0.037 megabytes (SI)
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
[bryce@fedora]~/src/r/spatialEpisim/R% scc --no-cocomo sveird.R
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Language Files Lines Blanks Comments Code Complexity
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
R 1 353 54 97 202 10
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Total 1 353 54 97 202 10
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Processed 14295 bytes, 0.014 megabytes (SI)
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Part of the refactoring has stalled at resolving a semantic error I introduced when I optimized the
wtd_nbrs_sum
function. To ease discussion, I won't refer to this function by its old name, given I renamed its identifier:transmissionLikelihoodWeightings
. This function returns a matrix.@ashokkrish, please provide your input on the following two functions and their documentation.
library(tidyverse) ##' @title Average Euclidean Distance ##' @description Measure the exposure influence upon susceptible individuals at ##' spatial position (x, y) of the infectious individuals at a spatial ##' position (u, v). ##' @details The mathematical modelling of human or non-human mobility patterns ##' is non-trivial. This function is a limited implementation of the effective ##' area for only human mobility, with distances travelled per day (Ξ») ##' measured in kilometers. ##' ##' NOTE: Our weight function does not take the same arguments as shown in the ##' slideshow: (π€ x y u v); The term βeffective areaβ comes from the fact that ##' if the kernel is constant on a finite disk (and zero outside it), then the ##' formula due to Bolker (1999) gives the area of the disk. ##' ##' See the article titled *A clarification of transmission terms in ##' host-microparasite models by Begon et al.* (2002). ##' ##' The raster data of infection counts or disease incidence provided to the ##' function which calls this one, transmissionLikelihoodWeightings, may be ##' aggregated by a given factor. That factor must be passed to this function ##' for parity, so the data is treated the same. ##' @param radius TODO: a fixed radius r > lambda; see details. ##' @param lambda movemenet distance (in kilometers) per day; see details. ##' @param aggregationFactor the degree of aggregation applied to the raster ##' data mentioned in the function details. ##' @returns a matrix of the average Euclidean distances ##' @author Bryce Carson ##' @author Thomas White avgEuclideanDistance <- function(radius, lambda, aggregationFactor) { ## NOTE: βI have a philosophical and geometric question about these ## identities: what do they imply about the dimension and magnitude of the ## raster which the weighted number sum will be used with? Can we rid the ## function of the radius argument and only accept the lambda argument, given ## these identities and the claim made about the calculation of radius in the ## slides?β β Bryce if (radius > lambda) stopifnot(r == round((lambda - aggregationFactor) / aggregationFactor) + 1) else stopifnot(r == lambda + aggregationFactor) len <- seq_len(1 + radius * 2) df <- tidyr::expand(tibble(i = len, j = len), i, j) avg.euc.dist <- function(i, j) { exp(-sqrt(sum((c(i, j) - c(radius + 1, radius + 1))^2)) / lambda) } mutate(df, averageEuclideanDistance = map2_dbl(i, j, avg.euc.dist)) %>% as.matrix(byrow = TRUE, ncol = nrow(df)) } ##' @title Weighted Sums ##' @description Calculate a matrix of weights respecting human mobility ##' patterns. ##' @details The pattern of human mobility used is described in a slideshow ##' here: ##' https://docs.google.com/presentation/d/1_gqcEh4d8yRy22tCZkU0MbGYSsGu3Djh/edit?usp=sharing&ouid=102231457806738400087&rtpof=true&sd=true. ##' @param infections a matrix of the count of infections per aggregate area in ##' a raster of terrestrial data. ##' @param radius a constant; see details. ##' @param lambda movemenet distance (in kilometers) per day; see details. ##' @returns a matrix of weightings for the calculation of the proportion of ##' exposed individuals who will become infectious. ##' @author Bryce Carson ##' @author Thomas White ##' @examples ##' terra::as.matrix(Infected, wide = TRUE) %>% ##' transmissionLikelihoodWeightings(30, 15, 10) transmissionLikelihoodWeightings <- function(infections, radius, lambda, aggregationFactor) { inputMatrixRows <- nrow(infections) inputMatrixCols <- ncol(infections) diameter <- 2 * radius temp.1 <- matrix(data = 0, nrow = inputMatrixRows, ncol = radius) temp.2 <- matrix(data = 0, nrow = radius, ncol = diameter + inputMatrixCols) input.modified <- rbind(temp.2, cbind(temp.1, infections, temp.1), temp.2) print(dim(input.modified)) weights <- avgEuclideanDistance(radius, lambda) list(i = seq_len(length.out = inputMatrixRows), j = seq_len(length.out = inputMatrixCols)) %>% expand.grid() %>% as_tibble() %>% mutate(weightedSum = map2_dbl(i, j, function(i, j) { neighbours <- input.modified[i:(i + diameter), j:(j + diameter)] ## MAYBE FIXME: are we certain we don't want matrix ## multiplication? %*% ## FIXME: non-conformable arrays ## Caused by error in `map2_dbl()`: ## βΉ In index: 1. ## Caused by error in `subset * weights`: ## ! non-conformable arrays ## matrix(1, 3, 3) * matrix(2, 9, 3) ## Error in matrix(1, 3, 3) * matrix(2, 9, 3) : non-conformable arrays products <- neighbours * weights sum(products) })) %>% as.matrix(byrow = TRUE, nrow = inputMatrixRows, ncol = inputMatrixCols) }
This is an important function, as it represents a large portion of the clock time when a simulation runs.
@ashokkrish, I have resolved the issue with this entirely. The code is much simpler now, because I'm using terra::focal
and not handling the iteration myself now.
At the moment the following error occurs during the call of the linearInterpolationOperator
function, but only when called by [the meat and potatoes simulation function](https://www.merriam-webster.com/dictionary/meat-and-potatoes) (SVEIRD.BayesianDataAssimilation
).
Error in eval(substitute(expr), data, enclos = parent.frame()) :
numeric 'envir' arg not of length one
In addition: Warning messages:
1: In linearInterpolationOperator(layers, healthZoneCoordinates, compartmentsReported) :
Duplicate cell indices in cells vector derived from health zone coordinates.
2: In linearInterpolationOperator(layers, healthZoneCoordinates, compartmentsReported) :
Ignoring NAs generated from healthZoneCoordinates during linearInterpolationOperator generation.
Things I've done to resolve it:
linearInterpolationOperator
function arguments in its call from SVEIRDdebug()
with the functioncompartmentsReported = 1
and compartmentsReported = 2
Error in eval(substitute(expr), data, enclos = parent.frame()) : numeric 'envir' arg not of length one In addition: Warning messages: 1: In linearInterpolationOperator(layers, healthZoneCoordinates, compartmentsReported) : Duplicate cell indices in cells vector derived from health zone coordinates. 2: In linearInterpolationOperator(layers, healthZoneCoordinates, compartmentsReported) : Ignoring NAs generated from healthZoneCoordinates during linearInterpolationOperator generation.
Resolved
I've completed a refactoring of the function. The entire function could run without error a couple commits ago; I am now resolving bugs which cause the function to produce wildly incorrect results.
Given the SVEIRD simulation, with its Bayesian data analysis, is the meat and potatoes of this application, its code needs to be readable, maintainable, and performant.
SVEIRD.BayesianDataAssimilation
)During the above process, the function must be continually validated to produce the same scientific results. A testing framework and unit tests would be ideal, given the wide range of possible inputs, but I'm unsure I have time to integrate that into the development workflow.
Delayed or moved to other issues