benjamin-allevius / scanstatistics

An R package for space-time anomaly detection using scan statistics.
GNU General Public License v3.0

Index out of bounds in Scan statistics #3

Closed. LusiXie closed this issue 6 years ago.

LusiXie commented 6 years ago

Dear Ben,

I'm pretty new to scanstatistics. When I tried to run functions from the package, I got this error:

error: Mat::elem(): index out of bounds
Error in (function (counts, baselines, zones, zone_lengths, store_everything, : Mat::elem(): index out of bounds

Could you please help me out?

Thanks.

Best, Lusi

ghost commented 6 years ago

Sure! Which function did you call from R? Is it possible for you to share the code which produced the error?

LusiXie commented 6 years ago

Sure. So I was trying to use the code you posted in the example but with my own data. I have data on population, counts, latitude and longitude of the center of each area (39 in total) across 12 years. Sorry I am very new to spatial scan statistics and the package, so I might make very stupid mistakes. Here is the code:

counts <- disease %>%
  df_to_matrix(time_col = "year", location_col = "wmu", value_col = "count")
population <- disease %>%
  df_to_matrix(time_col = "year", location_col = "wmu", value_col = "submission")

zones <- wmu_geo %>%
  select(longitude, latitude) %>%
  as.matrix %>%
  spDists(x = ., y = ., longlat = TRUE) %>%
  dist_to_knn(k = 15) %>% # I've tried different k's
  knn_zones

mod <- glm(count ~ offset(population/100) + 1 + I(year),
           family = poisson(link = "log"),
           data = disease)

ebp_baselines <- disease %>%
  mutate(mu = predict(mod, newdata = ., type = "response")) %>%
  df_to_matrix(value_col = "mu")

set.seed(1)
poisson_result <- scan_eb_poisson(counts = counts,
                                  zones = zones,
                                  population = population,
                                  baselines = ebp_baselines, # I've tried with and without the baselines setting
                                  n_mcsim = 10)

And the error message is:

error: Mat::elem(): index out of bounds
Error in (function (counts, baselines, zones, zone_lengths, store_everything, : Mat::elem(): index out of bounds

Thanks!

ghost commented 6 years ago

Hmm, it is hard to replicate this without the data. Would it be possible for you to send me your data (email to benjak@math.su.se)? I will not share it with anyone else.

ghost commented 6 years ago

I think the issue was that the ebp_baselines matrix had time on the column dimension and locations ("wmu") on rows; it should be the opposite. Also, when the glm was fitted, I'm not sure that the disease data frame actually had a column called population, so it could be that it used the matrix variable population instead. Regardless, the following code seems to run fine for me:

library(tidyverse)
library(magrittr)
library(scanstatistics)
library(sp)

disease <- read_csv("disease.csv")

# Check that the number of locations ("wmu") is the same in every year
disease %>% 
  group_by(year) %>% 
  summarize(n_locs = length(unique(wmu))) %>% 
  pull(n_locs) %>% 
  unique %>% 
  length == 1

# Locations should be numbered sequentially from 1 and up:
disease %<>%
  arrange(year, wmu) %>%
  group_by(year) %>%
  mutate(location = order(wmu)) %>%
  ungroup

# Add a column called population, equal to submission
disease %<>%
  mutate(population = submission)

counts <- disease %>%
  df_to_matrix(time_col = "year", location_col = "wmu", value_col = "count")
population <- disease %>%
  df_to_matrix(time_col = "year", location_col = "wmu", value_col = "submission")

wmu_geo <- disease %>% 
  filter(year == 2005) %>%
  select(wmu, location, longitude, latitude)

zones <- wmu_geo %>%
  select(longitude, latitude) %>%
  as.matrix %>%
  spDists(x = ., y = ., longlat = TRUE) %>%
  dist_to_knn(k = 15) %>% #I've tried different k's
  knn_zones

mod <- glm(count ~ offset(population/100) + 1 + I(year),
           family = poisson(link = "log"),
           data = disease)

# NOTE: here the original code gave a matrix that had times on columns and
#       locations on rows. It should be the other way around, so I transpose
ebp_baselines <- disease %>%
  mutate(mu = predict(mod, newdata = ., type = "response")) %>%
  df_to_matrix(value_col = "mu") %>%
  t

set.seed(1)
poisson_result <- scan_eb_poisson(counts = counts,
                                  zones = zones,
                                  population = population,
                                  baselines = ebp_baselines,
                                  n_mcsim = 10)
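
To illustrate the orientation issue described above, here is a minimal sketch on made-up toy data. The data frame `toy` and its values are hypothetical and not taken from this issue, and the matrices are built with base R's `tapply` instead of `df_to_matrix` so that the row and column roles are explicit. It only shows that a baseline matrix built the wrong way around has mismatched dimensions until it is transposed.

```r
# Toy data: 3 years x 4 locations (hypothetical values, for illustration only)
toy <- expand.grid(wmu = 1:4, year = 2001:2003)
toy$count <- seq_len(nrow(toy))
toy$mu <- toy$count + 0.5

# Times on rows, locations on columns
counts_toy <- tapply(toy$count, list(toy$year, toy$wmu), sum)

# Deliberately the wrong way around: locations on rows, times on columns
baselines_wrong <- tapply(toy$mu, list(toy$wmu, toy$year), sum)

dim(counts_toy)       # 3 4
dim(baselines_wrong)  # 4 3

# Transposing restores the matching layout
baselines_ok <- t(baselines_wrong)
identical(dim(counts_toy), dim(baselines_ok))  # TRUE
```
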

LusiXie commented 6 years ago

Thanks Benjamin! It worked for me!

avallecam commented 3 years ago

For the record, I also got the same error with scan_eb_poisson after using an NM_popcas and an NM_geo with different numbers of spatial units/counties/districts :')

Now I use these lines to verify that all the objects have the same dimensions before this step:

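# must have the same dimensions
counts %>% dim()
ebp_baselines %>% dim()
# number of spatial units; should match the number of location columns above
NM_geo %>% count()
# inspect the zones; they should only refer to existing locations
zones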
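
The manual checks above could also be wrapped in a small helper that fails early with a readable message. This is only a sketch: `check_scan_inputs` is a hypothetical name and not part of scanstatistics, and it assumes that `zones` is a list of integer vectors of location indices (as returned by `knn_zones`) and that the matrices have times on rows and locations on columns.

```r
# Hypothetical helper (not part of scanstatistics): stops with a readable
# message if the inputs to a scan function are inconsistent.
check_scan_inputs <- function(counts, baselines, zones, n_geo) {
  if (!identical(dim(counts), dim(baselines))) {
    stop("counts and baselines must have identical dimensions")
  }
  if (ncol(counts) != n_geo) {
    stop("counts must have one column per spatial unit")
  }
  # Assumption: zones is a list of integer vectors of location indices
  zone_idx <- unlist(zones)
  if (any(zone_idx < 1) || any(zone_idx > ncol(counts))) {
    stop("zones refer to a location index outside 1..ncol(counts)")
  }
  invisible(TRUE)
}

# Toy inputs (made up): 3 time points x 4 locations
counts_demo <- matrix(1:12, nrow = 3)
baselines_demo <- matrix(1, nrow = 3, ncol = 4)
zones_demo <- list(1L, c(1L, 2L), c(3L, 4L))

check_scan_inputs(counts_demo, baselines_demo, zones_demo, n_geo = 4)
```

With the objects from this thread, the call would look like `check_scan_inputs(counts, ebp_baselines, zones, nrow(NM_geo))`, run just before `scan_eb_poisson`.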