findanna opened this issue 2 years ago.
Can we show only the rows with diffs, sorted in decreasing order of diff size, so that we see the most problematic ones first?
The number of rows with a diff seems to vary from day to day.
On 2022-04-22, we have:
```r
library(dplyr)
library(shinyfind)

. <- shinyfind::get_data_all()
data_all <- .$data_all
country_last_update_info <- .$country_last_update_info

data_cum_tests <- data_all |>
  filter(set == "country") |>
  filter(time == as.Date("2022-04-22")) |>
  select(name, cum_tests_orig, all_cum_tests) |>
  filter(cum_tests_orig != all_cum_tests) |>
  mutate(diff = abs(log(cum_tests_orig) - log(all_cum_tests))) |>
  arrange(desc(diff))

data_cum_tests
#> # A tibble: 66 x 4
#>    name              cum_tests_orig all_cum_tests   diff
#>    <chr>                      <dbl>         <dbl>  <dbl>
#>  1 Trinidad & Tobago         233175        693033 1.09
#>  2 El Salvador              1843224       2432752 0.278
#>  3 Belgium                 33456470      36618631 0.0903
#>  4 South Sudan               374797        406944 0.0823
#>  5 Thailand                22978475      23754384 0.0332
#>  6 Kazakhstan              11575012      11276018 0.0262
#>  7 Belize                    532846        522306 0.0200
#>  8 Spain                   62986857      63817211 0.0131
#>  9 Japan                   44500652      45075155 0.0128
#> 10 Netherlands             28622957      28947062 0.0113
#> # ... with 56 more rows
```
Created on 2022-04-27 by the reprex package (v2.0.1)
On 2022-04-25, we have:
```r
library(dplyr)
library(shinyfind)

. <- shinyfind::get_data_all()
data_all <- .$data_all
country_last_update_info <- .$country_last_update_info

data_cum_tests <- data_all |>
  filter(set == "country") |>
  filter(time == as.Date("2022-04-25")) |>
  select(name, cum_tests_orig, all_cum_tests) |>
  filter(cum_tests_orig != all_cum_tests) |>
  mutate(diff = abs(log(cum_tests_orig) - log(all_cum_tests))) |>
  arrange(desc(diff))

data_cum_tests
#> # A tibble: 34 x 4
#>    name          cum_tests_orig all_cum_tests    diff
#>    <chr>                  <dbl>         <dbl>   <dbl>
#>  1 El Salvador          1843224       2432752 0.278
#>  2 Belgium             33456470      36675197 0.0919
#>  3 South Sudan           378620        410357 0.0805
#>  4 Kazakhstan          11575012      11276018 0.0262
#>  5 Belize                532846        522306 0.0200
#>  6 Spain               62986857      63817211 0.0131
#>  7 Japan               44832046      45406549 0.0127
#>  8 Netherlands         28622957      28947062 0.0113
#>  9 New Caledonia          42756         42391 0.00857
#> 10 Iran                50708575      50969794 0.00514
#> # ... with 24 more rows
```
Originally posted by @benubah in https://github.com/finddx/FINDCov19Tracker/issues/28#issuecomment-1110356228
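The reprex above repeats the same pipeline once per date. Below is a minimal sketch of a date-parameterized version; `diff_rows()` and the toy tibble `toy` are hypothetical names introduced only for illustration and are not part of shinyfind. It also covers one assumption worth checking: `filter(cum_tests_orig != all_cum_tests)` evaluates to `NA` when either column is missing, and `filter()` drops such rows, so countries reported in only one of the two columns never appear in the counts above. The sketch keeps those rows explicitly (their `diff` is `NA`, and `arrange()` sorts them last).

```r
library(dplyr)

# Hypothetical helper (not part of shinyfind): for one date, return the
# country rows where cum_tests_orig and all_cum_tests disagree, sorted by
# the absolute log difference, largest first.
diff_rows <- function(data_all, date) {
  data_all |>
    filter(set == "country", time == as.Date(date)) |>
    select(name, cum_tests_orig, all_cum_tests) |>
    # `!=` gives NA if either value is missing and filter() drops NA rows,
    # so also keep rows where exactly one of the two columns is NA.
    filter(
      cum_tests_orig != all_cum_tests |
        xor(is.na(cum_tests_orig), is.na(all_cum_tests))
    ) |>
    mutate(diff = abs(log(cum_tests_orig) - log(all_cum_tests))) |>
    arrange(desc(diff)) # arrange() always puts NA diffs last
}

# Toy input with made-up values, shaped like data_all.
toy <- tibble(
  set = "country",
  time = as.Date("2022-04-22"),
  name = c("A", "B", "C", "D"),
  cum_tests_orig = c(100, 200, 300, NA),
  all_cum_tests = c(100, 400, 330, 50)
)

# A is dropped (no difference); B, C and D remain, with D's diff = NA.
res <- diff_rows(toy, "2022-04-22")
res
```

With `data_all` loaded as in the reprex, `diff_rows(data_all, "2022-04-25")` should return the same rows as the second reprex, plus any rows where only one of the two columns is missing.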
We need to correct all countries with differences before we change the workflow for the cumsum calculation.
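To collect every country that needs correcting, and to see which countries account for the change in row count between days (66 rows on 2022-04-22 vs. 34 on 2022-04-25), the diff tables for two dates can be compared with `dplyr::anti_join()`. This is a minimal sketch, assuming both tables carry the `name` column produced by the reprex; `diffs_day1`, `diffs_day2` and their values are made up for illustration.

```r
library(dplyr)

# Made-up diff tables for two dates, shaped like the reprex's data_cum_tests.
diffs_day1 <- tibble(name = c("A", "B", "C"), diff = c(0.5, 0.2, 0.1))
diffs_day2 <- tibble(name = c("B", "D"), diff = c(0.2, 0.05))

# Countries with a diff on day 1 but none on day 2.
resolved <- anti_join(diffs_day1, diffs_day2, by = "name")

# Countries with a diff on day 2 that had none on day 1.
new_diffs <- anti_join(diffs_day2, diffs_day1, by = "name")

# Every country with a diff on either day, i.e. the set to correct.
to_correct <- sort(unique(c(diffs_day1$name, diffs_day2$name)))

resolved
new_diffs
to_correct
```

In practice, `diffs_day1` and `diffs_day2` would be the `data_cum_tests` results for the two dates.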