benjamin-allevius / scanstatistics

An R package for space-time anomaly detection using scan statistics.
GNU General Public License v3.0
49 stars 10 forks source link

Why did the function scan_pb_poisson give different results than SatScan? #11

Open Weiren-Wang opened 3 years ago

Weiren-Wang commented 3 years ago

Hi Benjamin, I used the scan_pb_poisson to conduct space-time analysis with my dataset and I found that the results given by scan_pb_poisson and the result given by the software SatScan were quite different. My dataset is a day-frequency disease counts data, range from 2020/12/31 to 2021/4/14.It contains 10 locations with latitude and longitude. For SatScan,here is the settings: [Input] Time precision : Day Coordinates : Lat/Long [Analysis] Type of Analysis : Space-Time Probability Model : Poissson Scan For Area With : High rates And for scan_pb_possion,here is my code : `counts = SZ_counts %>% df_to_matrix(time_col = "time", location_col = "region", value_col = "count") population = SZ_counts %>% df_to_matrix(time_col = "time", location_col = "region", value_col = "population") zones = SZ_geo %>% select(long, lat) %>% as.matrix %>% spDists(x = ., y = ., longlat = TRUE) %>% dist_to_knn(k = 4) %>% knn_zones regions = as.character(SZ_geo$region) result = data.frame() newcounts = counts newpopulation = population poisson_result = scan_pb_poisson(counts = newcounts, zones = zones, population = newpopulation, n_mcsim = 999) topclusters = top_clusters(poisson_result, zones, k = 10, overlapping = FALSE)

                      top_regions = topclusters$zone %>%
                        purrr::map(get_zone, zones = zones) %>%
                        purrr::map(function(x) regions[x])

                      new_top_regions = c()
                      for (j in 1:length(top_regions)) {
                        new_top_regions[j] = paste(top_regions[[j]], collapse = ',')
                      }

                      topclusters$zonename = new_top_regions
                      topclusters$endtime = rownames(population)[53]
                      result = rbind(result, topclusters)`

    For the same dataset, the SaTScan gave following results:
              1.Location IDs included.: 5
                Coordinates / radius..: (22.726017 N, 114.254455 E) / 0 km
                Time frame............: 2021/2/21 to 2021/4/13
                Population............: 2508600
                Number of cases.......: 198
                Expected cases........: 43.90
                Annual cases / 100000.: 55.4
                Observed / expected...: 4.51
                Relative risk.........: 6.85
                Log likelihood ratio..: 174.120739
                P-value...............: < 0.00000000000000001

              2.Location IDs included.: 4, 6, 1
                Coordinates / radius..: (22.754466 N, 113.942560 E) / 22.26 km
                Time frame............: 2021/2/10 to 2021/4/2
                Population............: 6369300
                Number of cases.......: 22
                Expected cases........: 111.46
                Annual cases / 100000.: 2.4
                Observed / expected...: 0.20
                Relative risk.........: 0.16
                Log likelihood ratio..: 63.471627
                P-value...............: < 0.00000000000000001

              3.Location IDs included.: 3, 7, 8
                Coordinates / radius..: (22.528466 N, 114.061547 E) / 12.88 km
                Time frame............: 2021/1/1 to 2021/2/20
                Population............: 4265300
                Number of cases.......: 14
                Expected cases........: 73.21
                Annual cases / 100000.: 2.4
                Observed / expected...: 0.19
                Relative risk.........: 0.17
                Log likelihood ratio..: 40.022434
                P-value...............: 0.000000000000092

       While scan_pb_possion gave following results:
     zone duration           score        relrisk_in  relrisk_out          Gumbel_pvalue      zonename          endtime
       15     104         392.4982441   4.248194    0.2996086         0.0000000             5              2021/2/28
        13     104        329.5112571     3.428604   0.2993484         0.0000000             4,5       2021/2/28

      The 2 zones given by scan_pb_possion were totally different from the 3 clusters given by SaTScan.Why is that?

      In addition, the SatScan only gave one relative risk but scan_pb_possion give two risk:relrisk_in,relrisk_out.How could I match these results?
mairamorenoc commented 6 months ago

Hi,

In SatScan, are you using the prospective space-time model? Because as far as I understand, the scan_pb_possion function implemented in this package is only for prospective analysis, as he referenced Kulldorff 2001.