ymer closed this issue 10 months ago
We will add a replace_missing() function that behaves similarly to the tidyverse replace_na() function. The name will be slightly different in Tidier.jl because NA has no special meaning in Julia, whereas missing does.
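To illustrate the naming point, here is a small sketch using only Base Julia (no Tidier.jl required). The variable names are made up for the example.

```julia
# `missing` is a built-in value in Julia (the only instance of type `Missing`).
x = [1, missing, 3]

# Arithmetic involving `missing` propagates it instead of raising an error.
y = x .+ 1

# `ismissing` tests for it element by element.
mask = ismissing.(x)

# Comparing with `==` yields `missing` rather than `true` or `false`.
cmp = (missing == missing)

# `NA`, by contrast, is just an undefined name unless you define it yourself.
na_defined = @isdefined(NA)
```

This is why a name built on missing reads more naturally in Julia than one built on NA.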
The @fill_missing() macro is equivalent to the tidyverse fill() function.
Hi @ymer, here is an initial implementation of a replace_missing() macro that works. It wraps the mutate macro, but has slightly different syntax than the tidyverse replace_na(). Until I can sort out the syntax difference and it becomes available, please feel free to use this in your work:
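macro replace_missing(df, kwargs...)
    expressions = []
    for kwarg in kwargs
        # Each argument must be an assignment of the form `column = replacement`.
        if kwarg isa Expr && kwarg.head == :(=)
            key = kwarg.args[1]    # column name
            value = kwarg.args[2]  # value to substitute for missing
            # Builds `column = coalesce(column, replacement)`.
            push!(expressions, :($(key) = coalesce($(key), $value)))
        else
            throw(ArgumentError("Invalid argument: $kwarg"))
        end
    end
    # Hand the generated assignments to @mutate.
    return quote
        @mutate($(esc(df)), $(expressions...))
    end
end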
If you had a df with columns such as a, b, and c, you could use it as follows, where the left side of the = is the column and the right side is the value that replaces missing.
@replace_missing(df, a = 0, b = 2, c = "wow")
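For readers curious what the macro builds for each argument, the sketch below reproduces the expression construction with plain Base Julia and then applies coalesce to an ordinary vector. It does not call Tidier.jl; the vector a is a stand-in for a data frame column, and the explicit broadcast dot is an assumption about how the call ends up being applied element-wise.

```julia
# The macro receives each `column = replacement` argument as an expression.
kwarg = :(a = 0)
key, value = kwarg.args

# This mirrors the expression pushed inside the macro's loop.
generated = :($(key) = coalesce($(key), $value))

# `coalesce` returns its first non-missing argument. Broadcasting it over a
# vector replaces every missing element with the fallback value.
a = [1, missing, 3]
filled = coalesce.(a, 0)
```

Here generated is the expression a = coalesce(a, 0), and filled contains no missing values.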
Thanks @drizk1. In this case, I believe replace_missing() should be a function rather than a macro since it works with vectors rather than data frames.
Oh ok. In that case, below is the adjusted implementation that now matches the tidy syntax.
function replace_missing(vec, replacement)
    return map(x -> ismissing(x) ? replacement : x, vec)
end

@chain df begin
    @mutate(a = ~replace_missing(a, 0), b = ~replace_missing(b, 2), c = ~replace_missing(c, 'w'))
end
I might even simplify it further to ismissing(x) ? replacement : x and let it get vectorized to make it work on vectors.
Wow. This might be one of the shortest functions I will ever write:
replace_missing(x, replacement) = ismissing(x) ? replacement : x
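As a quick check of the scalar definition outside of Tidier.jl, the sketch below applies it to single values and then broadcasts it over a vector with the dot syntax. The sample data is invented for illustration.

```julia
# Scalar definition, as in the thread.
replace_missing(x, replacement) = ismissing(x) ? replacement : x

# Works on single values...
single_missing = replace_missing(missing, 0.0)
single_present = replace_missing(2.5, 0.0)

# ...and on whole vectors once broadcast with the dot syntax.
distance = [1.5, missing, 3.0]
filled = replace_missing.(distance, 0.0)
```

Keeping the function scalar leaves the element-wise application to broadcasting, so the same definition covers both the single-value and vector cases.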
There is also the reverse function missing_if (na_if in tidyverse).
missing_if(x, value) = isequal(x, value) ? missing : x  # isequal avoids an error when x is already missing
@mutate(df, i = missing_if(i, "N/A"))
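One edge case is worth checking: when x is already missing, x == value evaluates to missing, which cannot serve as the condition of a ternary. The sketch below uses isequal for the comparison, which always returns true or false. It relies only on Base Julia, and the sample data is invented.

```julia
# Comparison via `isequal`, which never returns `missing`.
missing_if(x, value) = isequal(x, value) ? missing : x

# `==` against a missing value yields `missing`, not a Bool.
eq_result = (missing == "N/A")

# The column already contains a missing entry alongside the "N/A" marker.
i = ["a", "N/A", missing, "b"]
cleaned = missing_if.(i, "N/A")

# `coalesce` reverses the operation by mapping missing back to the marker.
restored = coalesce.(cleaned, "N/A")
```

After the call, both the "N/A" entry and the entry that was already missing are missing in cleaned.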
Love it! We will get these added soon.
replace_missing(), @fill_missing(), and missing_if() are all implemented.
In Tidyverse I can replace NA values in this way:
mutate(distance = replace_na(distance, 0))
In Tidier, it seems that I should do it like this:
@mutate(distance = if_else(ismissing(distance), 0, distance))
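For comparison, the same replacement can be sketched on a plain vector in Base Julia, once with an explicit conditional and once with coalesce. Neither line uses Tidier.jl, and the vector is made-up data standing in for the distance column.

```julia
distance = [10, missing, 25]

# Explicit conditional, analogous to the if_else form above.
via_ifelse = ifelse.(ismissing.(distance), 0, distance)

# Base Julia's `coalesce` expresses the same replacement more compactly.
via_coalesce = coalesce.(distance, 0)
```

Both forms give the same result, which suggests a dedicated helper is a convenience rather than new functionality.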
Not crucial, but it is a functionality that is used often. It could be called replace_missing. Or possibly it could be called with the already existing function fill_missing(distance, 0). Tidyverse doesn't do it like that though.