gertjanssenswillen / edeaR

!! Repository moved to https://github.com/bupaverse/edeaR !! This repo is read-only from now on.

throughput_time with strange output. Bug? #22

Closed frkbr closed 5 years ago

frkbr commented 5 years ago

The throughput_time function shows some strange behaviour. I would expect the following three code examples to produce the same output. However, the resulting quartiles and means differ considerably; only Min. and Max. agree.

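library(bupaR)       # event log classes and the select() method used below
library(edeaR)       # throughput_time()
library(eventdataR)  # sepsis example log

sepsis %>% throughput_time(level = "case") %>% summary()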

sepsis %>% throughput_time(level = "case", append = TRUE) %>% 
    select(throughput_time_case, force_df = TRUE) %>% summary()

sepsis %>% throughput_time(level = "log") %>% summary()

Could there be a bug or do I misunderstand the function?

frkbr commented 5 years ago

When I use my own data and validate the result using dplyr only, it seems that the first example above produces the correct result.
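A check of this kind could look roughly like the sketch below. It assumes throughput time means the time between the first and the last event of a case; the toy data frame and the column names `case_id`, `timestamp` and `throughput_hours` are made up for illustration and are not taken from the sepsis log.

```r
library(dplyr)

# Hypothetical toy event data: one row per event
events <- data.frame(
  case_id   = c("A", "A", "A", "B", "B", "C"),
  timestamp = as.POSIXct(
    c("2020-01-01 08:00:00", "2020-01-01 12:00:00", "2020-01-03 08:00:00",
      "2020-01-02 09:00:00", "2020-01-02 10:00:00",
      "2020-01-05 00:00:00"),
    tz = "UTC"
  )
)

# Throughput time per case: last timestamp minus first timestamp, in hours
per_case <- events %>%
  group_by(case_id) %>%
  summarise(
    throughput_hours = as.numeric(
      difftime(max(timestamp), min(timestamp), units = "hours")
    )
  )

# One value per case, so this summary is comparable to level = "case"
summary(per_case$throughput_hours)
```

The summary here is taken over exactly one row per case, which is why it should line up with the first example rather than the other two.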

frkbr commented 5 years ago

OK, I found out that I misunderstood the second example. Calling summary() there doesn't make sense, because with append = TRUE the throughput time gets appended at the event level, so each case's value appears once per event and cases with many events weigh more heavily. Naturally the resulting quartiles must differ from the first example.
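The effect can be reproduced without edeaR at all. The following is only a base R sketch with invented numbers: `throughput` and `n_events` are hypothetical per-case values, and `rep()` stands in for what appending a case-level value to every event row amounts to.

```r
# Hypothetical per-case throughput times (hours) and number of events per case
throughput <- c(A = 2, B = 10, C = 100)
n_events   <- c(A = 1, B = 2, C = 7)

# Case level: one value per case
case_level <- unname(throughput)

# Event level: each case's value repeated once per event
event_level <- rep(unname(throughput), times = n_events)

summary(case_level)   # every case counts once
summary(event_level)  # the 7-event case dominates the quartiles and the mean
```

Min. and Max. stay the same because repeating values cannot introduce new extremes, while the quartiles and the mean shift towards the cases with the most events.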

I guess I don't understand the third example. What does the argument level = "log" actually produce? Thanks for any clarification.

gertjanssenswillen commented 5 years ago

The output of

sepsis %>% throughput_time(level = "log")

is itself already a summary, which is equivalent to

sepsis %>% throughput_time(level = "case") %>% summary()

The level = "log" typically provides a summary of the log. Taking the summary of it

sepsis %>% throughput_time(level = "log") %>% summary()

would thus be meaningless. The mean you get is the mean of the summary values, etc.
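To illustrate the point, here is a small base R sketch with invented numbers. Base R's summary() is used as a stand-in for the log-level output; the exact set of statistics returned by level = "log" may differ, and `case_times` is hypothetical.

```r
# Hypothetical per-case throughput times
case_times <- c(1, 2, 3, 4, 10, 100)

# Stand-in for the log-level output: already a set of summary statistics
log_level <- summary(case_times)

# Summarising the summary treats min, quartiles, mean and max as if they
# were observations, so the resulting "mean" is not the mean of the cases
summary(as.numeric(log_level))

mean(case_times)             # mean over the cases
mean(as.numeric(log_level))  # mean over the summary statistics
```

In this stand-in, only Min. and Max. survive the second summary() unchanged, since the smallest and largest summary statistics are the original minimum and maximum.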

Hope this clears it up.

frkbr commented 5 years ago

Thanks, that made it clear!