codedthinking / Kezdi.jl

Julia package for data manipulation and analysis
https://codedthinking.github.io/Kezdi.jl/

bug: `@egen` and `@collapse` cannot summarize over missing values in columns? #190

Open gergelyattilakiss opened 2 months ago

gergelyattilakiss commented 2 months ago

When I use columns that contain missing values to compute something by group, it throws an error. For example, given this data frame: [image]

On the above `df`, if I run

@with df @egen birt_year = minimum(birth_year), by(person_id)

I get

ERROR: MethodError: reducing over an empty collection is not allowed; consider supplying `init` to the reducer

But if I run

@with df begin
       @mvencode birth_year, mv(99999)
       @egen byear = minimum(birth_year), by(person_id)
end

It throws no error.

korenmiklos commented 2 months ago

The issue is that all values in a group are missing. When we apply `skipmissing` to such a group, the result is an empty collection, and `minimum` cannot reduce over an empty collection. We may need to special-case this. I think we need to return `missing` in all these cases.
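A minimal sketch of the special case described above, in plain Julia without Kezdi.jl. The helper name `reduce_skipmissing` and the sample vectors are hypothetical and only illustrate the idea; this is not how Kezdi.jl implements `@egen` or `@collapse`.

```julia
# Hypothetical helper (not part of Kezdi.jl): apply a reducer `f` to the
# non-missing values of `xs`, and return `missing` when none remain.
function reduce_skipmissing(f, xs)
    vals = collect(skipmissing(xs))
    return isempty(vals) ? missing : f(vals)
end

# Illustrative data: one group where every value is missing, one mixed group.
all_missing = Union{Int,Missing}[missing, missing]
mixed = Union{Int,Missing}[1990, missing, 1985]

# Reducing the all-missing group directly throws, because `skipmissing`
# leaves an empty collection and `minimum` has no value to return for it.
threw = try
    minimum(skipmissing(all_missing))
    false
catch
    true
end

result_all_missing = reduce_skipmissing(minimum, all_missing)  # missing
result_mixed = reduce_skipmissing(minimum, mixed)              # 1985
```

The helper always returns `missing` for an all-missing group, whatever the reducer. That matches the suggestion to return `missing` in all these cases, though a real fix might treat reducers differently.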