epiverse-trace / epichains

[Under active development] Methods for simulating and analysing the sizes and lengths of infectious disease transmission chains from branching process models
https://epiverse-trace.github.io/epichains/
Other
5 stars 2 forks source link

Add ability to convert a data.frame back into an epichains object #277

Open bquilty25 opened 2 weeks ago

bquilty25 commented 2 weeks ago

I'm doing an analysis where I use simulate_chains() to generate an epidemic, which is then "pruned" afterwards to simulate the effect of testing and isolation.

This involves using as.data.frame() on the resulting epichains object, applying a set of functions, and then summarising the results using the tidyverse. However it would be nice if there were the ability to convert the data.frame back into an epichains object, so that it can be summarised using functions within the package as well as be interoperable with epicontacts for plotting, if possible.

jamesmbaazam commented 2 weeks ago

Thanks for raising this @bquilty25. This is indeed a nice workflow to have. I will address this issue in the coming week and point you to how to access the new feature.

sbfnk commented 1 week ago

Are you sue you need to call as.data.frame()? The epichains objects are data frames, too, so I would have thought it's possible to use tidyverse functions directly (I think).

jamesmbaazam commented 1 week ago

Here's my initial assessment.

library(epichains)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
set.seed(32)
chains_pois_offspring <- simulate_chains(
    n_chains = 10,
    statistic = "size",
    offspring_dist = rpois,
    stat_threshold = 10,
    generation_time = function(n) rep(3, n),
    lambda = 2
)

# Print fresh object
chains_pois_offspring
#> `<epichains>` object
#> 
#> < epichains head (from first known infector) >
#> 
#>    chain infector infectee generation time
#> 11     1        1        2          2    3
#> 12     1        1        3          2    3
#> 13     2        1        2          2    3
#> 14     2        1        3          2    3
#> 15     3        1        2          2    3
#> 16     3        1        3          2    3
#> 
#> 
#> Number of chains: 10
#> Number of infectors (known): 9
#> Number of generations: 4
#> Use `as.data.frame(<object_name>)` to view the full output in the console.
# attributes of the fresh object
attributes(chains_pois_offspring)
#> $names
#> [1] "chain"      "infector"   "infectee"   "generation" "time"      
#> 
#> $class
#> [1] "epichains"  "data.frame"
#> 
#> $row.names
#>   [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
#>  [19]  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
#>  [37]  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
#>  [55]  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
#>  [73]  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
#>  [91]  91  92  93  94  95  96  97  98  99 100 101 102 103 104 105 106 107 108
#> [109] 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126
#> [127] 127 128 129
#> 
#> $n_chains
#> [1] 10
#> 
#> $statistic
#> [1] "size"
#> 
#> $offspring_dist
#> function (n, lambda) 
#> .Call(C_rpois, n, lambda)
#> <bytecode: 0x1113f18e8>
#> <environment: namespace:stats>
#> 
#> $stat_threshold
#> [1] 10
#> 
#> $track_pop
#> [1] FALSE

# change it to a data.frame and inspect the attributes
chains_df <- as.data.frame(chains_pois_offspring)
attributes(chains_df)
#> $names
#> [1] "chain"      "infector"   "infectee"   "generation" "time"      
#> 
#> $class
#> [1] "data.frame"
#> 
#> $row.names
#>   [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
#>  [19]  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
#>  [37]  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
#>  [55]  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
#>  [73]  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
#>  [91]  91  92  93  94  95  96  97  98  99 100 101 102 103 104 105 106 107 108
#> [109] 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126
#> [127] 127 128 129
#> 
#> $n_chains
#> [1] 10
#> 
#> $statistic
#> [1] "size"
#> 
#> $offspring_dist
#> function (n, lambda) 
#> .Call(C_rpois, n, lambda)
#> <bytecode: 0x1113f18e8>
#> <environment: namespace:stats>
#> 
#> $stat_threshold
#> [1] 10
#> 
#> $track_pop
#> [1] FALSE

# which attributes are lost?
setdiff(names(attributes(chains_pois_offspring)), names(attributes(chains_df)))
#> character(0)

## Use original epichains object in a tidyverse pipeline and remove a protected column
obj_modified <- chains_pois_offspring |> 
    mutate(isolated = ifelse(generation > 2, TRUE, FALSE)) |> 
    select(-c(generation, infector))

# what is the class?
class(obj_modified)
#> [1] "epichains"  "data.frame"

# Which attributes are lost?
setdiff(names(attributes(chains_pois_offspring)), names(attributes(obj_modified)))
#> [1] "n_chains"       "statistic"      "offspring_dist" "stat_threshold"
#> [5] "track_pop"

# Can we print the modified object?
try(obj_modified)
#> Error in .validate_epichains(x): object does not contain the correct columns

Created on 2024-09-03 with reprex v2.1.1