TidierOrg / TidierData.jl

Tidier data transformations in Julia, modeled after the dplyr/tidyr R packages.
MIT License

replace_na function from tidyverse #37

Closed (ymer closed 10 months ago)

ymer commented 1 year ago

In Tidyverse I can replace NA values in this way: mutate(distance = replace_na(distance, 0))

In Tidier, it seems that I should do it like this: @mutate(distance = if_else(ismissing(distance), 0, distance))

Not crucial, but it is functionality that is used often. It could be called replace_missing.

Or possibly it could be called with the already existing function fill_missing(distance, 0). Tidyverse doesn't do it like that though.
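For comparison outside Tidier, here is a minimal plain-Julia sketch of the same replacement. The `distance` vector is made-up example data; `coalesce` is part of Base Julia and returns its first non-missing argument.

```julia
# Made-up example data standing in for the `distance` column.
distance = [1.5, missing, 3.0, missing]

# The dot broadcasts `coalesce` over each element, so every `missing`
# is swapped for the fallback value 0.0 and other values pass through.
filled = coalesce.(distance, 0.0)

println(filled)
```

The fallback is written as 0.0 rather than 0 so that the result keeps a single floating-point element type.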

kdpsingh commented 1 year ago

We will add a replace_missing() function that behaves similarly to the tidyverse replace_na() function. The reason the name will be slightly different in Tidier.jl is that the word NA has no special meaning in Julia, whereas the value missing does.

The @fill_missing() macro is equivalent to the tidyr fill() function.

drizk1 commented 1 year ago

Hi @ymer, here is an initial implementation of a replace_missing() macro that works. It wraps the @mutate macro but has slightly different syntax from the tidyverse replace_na(). Until I can sort out the syntax difference and it becomes available, please feel free to use this in your work.

macro replace_missing(df, kwargs...)
    expressions = []
    for kwarg in kwargs
        if kwarg isa Expr && kwarg.head == :(=)  # guard: a bare symbol or literal has no .head
            key = kwarg.args[1]
            value = kwarg.args[2]
            push!(expressions, :($(key) = coalesce($(key), $value)))
        else
            throw(ArgumentError("Invalid argument: $kwarg"))
        end
    end
    return quote
        @mutate($(esc(df)), $(expressions...))
    end
end

If you had a df with columns a, b, and c, you could use it as follows, where the left side of the = is the column and the right side is the value that replaces missing.

@replace_missing(df, a = 0, b = 2, c = "wow")

kdpsingh commented 1 year ago

Thanks @drizk1. In this case, I believe replace_missing() should be a function rather than a macro since it works with vectors rather than data frames.

drizk1 commented 1 year ago

Oh ok. In that case, below is the adjusted implementation that now matches the tidy syntax.

function replace_missing(vec, replacement)
    return map(x -> ismissing(x) ? replacement : x, vec)
end

@chain df begin
    @mutate(a = ~replace_missing(a, 0), b = ~replace_missing(b, 2), c = ~replace_missing(c, 'w'))
end

kdpsingh commented 1 year ago

I might even simplify it further to ismissing(x) ? replacement : x and let it get vectorized to make it work on vectors.
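To illustrate why the vectorization step matters, here is a small plain-Julia sketch. The name `scalar_replace` and the vector `v` are hypothetical, chosen only for this example; outside of @mutate, Julia's dot syntax performs the vectorization explicitly.

```julia
# Hypothetical scalar helper: it handles one value at a time.
scalar_replace(x, replacement) = ismissing(x) ? replacement : x

v = [1, missing, 3]

# Without the dot, the whole vector is tested at once. A vector is not
# `missing`, so it is returned unchanged and the gap remains.
unchanged = scalar_replace(v, 0)

# With the dot, the helper is applied element by element.
filled = scalar_replace.(v, 0)

println(filled)
```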

drizk1 commented 1 year ago

Wow. This might be one of the shortest functions I will ever write:

replace_missing(x, replacement) = ismissing(x) ? replacement : x

ymer commented 1 year ago

There is also the reverse function missing_if (na_if in tidyverse).

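# isequal always returns a Bool; with ==, an already-missing x yields missing and the ternary errors
missing_if(x, value) = isequal(x, value) ? missing : x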

@mutate(df, i = missing_if(i, "N/A"))

kdpsingh commented 1 year ago

Love it! We will get these added soon.

kdpsingh commented 10 months ago

replace_missing(), @fill_missing(), and missing_if() are all implemented.
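As a rough plain-Julia sketch of how the two scalar helpers discussed in this thread compose: the `_sketch` definitions below are local stand-ins written for illustration, not TidierData's actual source, and the `raw` vector is made-up data. `isequal` is used in the comparison so that an input that is already missing does not raise an error.

```julia
# Local stand-ins (hypothetical names), not the library's implementations.
replace_missing_sketch(x, replacement) = ismissing(x) ? replacement : x
missing_if_sketch(x, value) = isequal(x, value) ? missing : x

# Made-up column mixing a sentinel string with a real missing value.
raw = ["12", "N/A", missing, "7"]

# Step 1: turn the "N/A" sentinel into `missing`.
cleaned = missing_if_sketch.(raw, "N/A")

# Step 2: replace every `missing` with a default value.
filled = replace_missing_sketch.(cleaned, "0")

println(filled)
```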