Closed prockenschaub closed 2 years ago
Sure, sounds like a reasonable suggestion. There are a couple of CI failures, unrelated to your PR, I'll have to check out. This PR is still WIP? From the quick look I had, it doesn't seem this gives you the desired behavior, right?
I had added a call to your function rm_na_val_var
on line 314. As far as I understand it, this first removes all NA
s and then also prints a message if any where removed. Then the boundaries are applied to this new x
without missing data and another message is generated as per your original code if any rows were removed this way. Both message express the number of removed rows as a percentage of the original number of rows, now renamed to n_total
.
This isn't the most beautiful solution and was mostly a proof of concept, can see if I can refactor it.
Problem
ricu
currently provides a joint warning when out-of-range values orNA
are removed during loading of a concept. For example, when loadingtemp
(body temperature) in the eICU demo we obtain the following message:The current behaviour makes it hard to judge whether the rows were primarily removed because of missing values or because of out-of-range values. While missing values may indicate a sparsely filled matrix (as is the case here), many out-of-range values may instead suggest unit conversion errors. Currently, one needs to delve into concept definitions and source tables to identify which ones the cause.
Proposed solution
Separating these cases into distinct messages may make it clearer on a first glance why rows were excluded. While this adds an extra row to the output of a (verbose)
load_concept
call, I believe it would be worth it for clarity. The result for the above example is:Note
This currently merges into
main
, as it seems to be the most up-to-date. Please let me know if you'd like me to merge intodev
or another branch instead.