finddx / FINDCov19Tracker

https://dsbbfinddx.github.io/FINDCov19Tracker/

Countries with diffs in cum_tests_orig and all_cum_tests #30

Open findanna opened 2 years ago

findanna commented 2 years ago

Can we show only the rows with diffs, sorted in decreasing order of diff size, so that we see the really problematic ones first?

The number of rows with a diff seems to vary from day to day.

On 2022-04-22, we have:

library(dplyr)
library(shinyfind)

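# bind the result to a regular name rather than `.`
res <- shinyfind::get_data_all()
data_all <- res$data_all
country_last_update_info <- res$country_last_update_info

data_cum_tests <-
  data_all |>
  # country-level rows for the day of interest
  filter(set == "country", time == as.Date("2022-04-22")) |>
  select(name, cum_tests_orig, all_cum_tests) |>
  # keep only rows where the two cumulative counts disagree
  filter(cum_tests_orig != all_cum_tests) |>
  # absolute log ratio, i.e. the relative size of the disagreement
  mutate(diff = abs(log(cum_tests_orig) - log(all_cum_tests))) |>
  arrange(desc(diff))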

data_cum_tests
#> # A tibble: 66 x 4
#>    name              cum_tests_orig all_cum_tests   diff
#>    <chr>                      <dbl>         <dbl>  <dbl>
#>  1 Trinidad & Tobago         233175        693033 1.09  
#>  2 El Salvador              1843224       2432752 0.278 
#>  3 Belgium                 33456470      36618631 0.0903
#>  4 South Sudan               374797        406944 0.0823
#>  5 Thailand                22978475      23754384 0.0332
#>  6 Kazakhstan              11575012      11276018 0.0262
#>  7 Belize                    532846        522306 0.0200
#>  8 Spain                   62986857      63817211 0.0131
#>  9 Japan                   44500652      45075155 0.0128
#> 10 Netherlands             28622957      28947062 0.0113
#> # ... with 56 more rows

Created on 2022-04-27 by the reprex package (v2.0.1)
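
The `diff` column is the absolute difference of the logs, which equals the absolute log of the ratio between the two counts, so it ranks countries by relative rather than absolute disagreement. As a minimal base-R sketch, the following recomputes it for three rows copied from the 2022-04-22 output; the `rows` data frame is only an illustrative stand-in for the real `data_all`, and base R is used here so the snippet runs without `shinyfind` or `dplyr`.

```r
# Three rows copied from the 2022-04-22 output; `rows` is an illustrative
# stand-in, not the real shinyfind data.
rows <- data.frame(
  name           = c("Belgium", "Trinidad & Tobago", "El Salvador"),
  cum_tests_orig = c(33456470, 233175, 1843224),
  all_cum_tests  = c(36618631, 693033, 2432752)
)

# |log(a) - log(b)| is the same as |log(a / b)|: a relative discrepancy
rows$diff <- abs(log(rows$cum_tests_orig) - log(rows$all_cum_tests))

# Largest relative discrepancy first
rows <- rows[order(rows$diff, decreasing = TRUE), ]
print(rows)
```

Rounded to three significant digits, the values should agree with the `diff` column shown for these countries. Belgium has by far the largest absolute gap of the three (about 3.2 million tests), yet it ranks last here because the ratio between its two counts is closest to 1.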

On 2022-04-25, we have:

library(dplyr)
library(shinyfind)

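# bind the result to a regular name rather than `.`
res <- shinyfind::get_data_all()
data_all <- res$data_all
country_last_update_info <- res$country_last_update_info

data_cum_tests <-
  data_all |>
  # country-level rows for the day of interest
  filter(set == "country", time == as.Date("2022-04-25")) |>
  select(name, cum_tests_orig, all_cum_tests) |>
  # keep only rows where the two cumulative counts disagree
  filter(cum_tests_orig != all_cum_tests) |>
  # absolute log ratio, i.e. the relative size of the disagreement
  mutate(diff = abs(log(cum_tests_orig) - log(all_cum_tests))) |>
  arrange(desc(diff))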

data_cum_tests
#> # A tibble: 34 x 4
#>    name          cum_tests_orig all_cum_tests    diff
#>    <chr>                  <dbl>         <dbl>   <dbl>
#>  1 El Salvador          1843224       2432752 0.278  
#>  2 Belgium             33456470      36675197 0.0919 
#>  3 South Sudan           378620        410357 0.0805 
#>  4 Kazakhstan          11575012      11276018 0.0262 
#>  5 Belize                532846        522306 0.0200 
#>  6 Spain               62986857      63817211 0.0131 
#>  7 Japan               44832046      45406549 0.0127 
#>  8 Netherlands         28622957      28947062 0.0113 
#>  9 New Caledonia          42756         42391 0.00857
#> 10 Iran                50708575      50969794 0.00514
#> # ... with 24 more rows

Created on 2022-04-27 by the reprex package (v2.0.1)
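
To see which countries account for the change in row count between two days, one option is to compare the country names in the two results. The sketch below assumes only the ten visible rows of each output (the vectors `names_0422` and `names_0425` are hypothetical and typed in by hand); with the real data, the `name` columns of the two full `data_cum_tests` tibbles would be used instead.

```r
# Country names copied from the ten visible rows of each output
# (hypothetical vectors; the full tables have 66 and 34 rows).
names_0422 <- c("Trinidad & Tobago", "El Salvador", "Belgium", "South Sudan",
                "Thailand", "Kazakhstan", "Belize", "Spain", "Japan",
                "Netherlands")
names_0425 <- c("El Salvador", "Belgium", "South Sudan", "Kazakhstan",
                "Belize", "Spain", "Japan", "Netherlands", "New Caledonia",
                "Iran")

# Listed on 2022-04-22 but not among the visible rows on 2022-04-25
dropped <- setdiff(names_0422, names_0425)
print(dropped)
```

On these truncated vectors, `dropped` contains Trinidad & Tobago and Thailand. Because only the top ten rows of each table are visible here, this does not show whether those countries left the diff list entirely or only moved further down it; running the same comparison on the full tables would settle that.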

Originally posted by @benubah in https://github.com/finddx/FINDCov19Tracker/issues/28#issuecomment-1110356228

findanna commented 2 years ago

We need to correct all countries with differences before we change the workflow for the cumsum calculation.