gertjanssenswillen / edeaR

!! repository moved to https://github.com/bupaverse/edeaR !! This repo is read-only from now on.

Be consistent about row ordering for resource metrics #20

Closed: richierocks closed this issue 5 years ago

richierocks commented 5 years ago

As an example of inconsistency, number_of_repetitions() returns values ordered by resource whereas resource_frequency() returns values ordered by count.

library(edeaR)
data(sepsis, package = "eventdataR")
number_of_repetitions(sepsis, level = "resource")
Using default type: all
## # resource_metric [26 × 3]
##    first_resource absolute relative
##    <fct>             <dbl>    <dbl>
##  1 ?                     0  0
##  2 A                     0  0
##  3 B                  1536  0.189
##  4 C                     3  0.00285
##  5 D                     0  0
##  6 E                     0  0
##  7 F                    16  0.0741
##  8 G                    67  0.453
##  9 H                     6  0.109
## 10 I                    12  0.0952
resource_frequency(sepsis, level = "resource")
## # A tibble: 26 x 3
##    resource absolute relative
##    <fct>       <int>    <dbl>
##  1 B            8111  0.533
##  2 A            3462  0.228
##  3 C            1053  0.0692
##  4 E             782  0.0514
##  5 ?             294  0.0193
##  6 F             216  0.0142
##  7 L             213  0.0140
##  8 O             186  0.0122
##  9 G             148  0.00973
## 10 I             126  0.00828
## # ... with 16 more rows

I think it makes sense for every resource metric to return its rows in the same order. You could take the approach of dplyr::count() and add a sort argument that determines whether or not the rows are sorted by count.
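A minimal base-R sketch of the proposed design, assuming a metric table with a resource column and an `absolute` count column like the outputs above. `order_metric()` is a hypothetical helper, not part of edeaR: it applies one shared `sort` rule, so that with `sort = TRUE` rows are ordered by decreasing count (as `dplyr::count(sort = TRUE)` does) and with `sort = FALSE` they are ordered by resource.

```r
# Hypothetical helper (not part of edeaR): order any resource-metric table
# with the same rule. sort = TRUE -> decreasing absolute count, ties broken by
# resource; sort = FALSE -> ordered by the resource column only.
order_metric <- function(metric, sort = TRUE, key = "resource") {
  ord <- if (sort) {
    order(-metric$absolute, metric[[key]])
  } else {
    order(metric[[key]])
  }
  out <- metric[ord, , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Toy table reusing four absolute counts quoted above for resource_frequency()
freq <- data.frame(
  resource = factor(c("A", "C", "B", "E")),
  absolute = c(3462L, 1053L, 8111L, 782L)
)

order_metric(freq, sort = TRUE)   # rows: B, A, C, E
order_metric(freq, sort = FALSE)  # rows: A, B, C, E
```

The `key` argument is there because the tables above do not share a column name (`first_resource` vs `resource`); a metric such as number_of_repetitions() would call `order_metric(x, key = "first_resource")`.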

gertjanssenswillen commented 5 years ago

Sort argument added for all metrics. It defaults to TRUE.