choonghyunryu / dlookr

Tools for Data Diagnosis, Exploration, Transformation
https://choonghyunryu.github.io/dlookr/

add group_by() functionality #90

Closed jyk closed 1 year ago

jyk commented 1 year ago

Hi, maybe I am missing something, but I currently cannot call group_by() and then diagnose() and related functions. Would it be possible to add this functionality ("diagnose for groups")?

choonghyunryu commented 1 year ago

@jyk, thank you for your suggestion.

Currently, diagnose() does not support group_by().

The EDA functions describe(), correlate(), and normality() support group_by(), but the functions for the Data Diagnosis task do not. This is because they were designed for diagnosing data when you first encounter it.

That said, I think your request is meaningful, because data quality can be problematic in specific categories.

I will modify diagnose() to work with group_by(). Just give me some time.

jyk commented 1 year ago

Thanks. Yes, I would like to use group_by() over different time periods (for example, weekly and monthly) in order to track data quality issues over time during model scoring.
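
As an illustration of the per-period tracking described above, here is a minimal sketch that uses plain dplyr rather than dlookr. The `scores` data frame and its `scored_on` and `income` columns are hypothetical, and the missing-value summary only stands in for a fuller diagnosis; it is not what diagnose() computes.

```r
library(dplyr)

# Hypothetical scoring log: one row per scored record.
scores <- tibble(
  scored_on = as.Date(c("2023-01-05", "2023-01-20", "2023-02-03", "2023-02-17")),
  income    = c(52000, NA, NA, NA)
)

# Derive a monthly period key, then summarise missingness per period.
monthly_quality <- scores |>
  mutate(period = format(scored_on, "%Y-%m")) |>
  group_by(period) |>
  summarise(
    n_rows        = n(),
    missing_count = sum(is.na(income)),
    missing_pct   = 100 * mean(is.na(income)),
    .groups       = "drop"
  )

print(monthly_quality)
```

A weekly key could be derived the same way, for example with format(scored_on, "%G-W%V") for ISO weeks.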

choonghyunryu commented 1 year ago

@jyk,

I implemented your suggestion in the GitHub development version 0.6.2.9000.

Functions that support group_by() are as follows:

jyk commented 1 year ago

Thank you very much! Now I am fine.

choonghyunryu commented 1 year ago

Dear Lorenzo Fabbri,

Pass arguments to the group_by() function with the across() function, as in the following example.

library(dplyr)  # provides group_by(), across(), all_of()
grouping_var <- "death_event"  # grouping column name held as a string
dlookr::heartfailure |>
  group_by(across(all_of(grouping_var))) |>
  dlookr::diagnose_category()
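
To see why the string needs across(all_of()), here is a small sketch using only dplyr. The toy data frame `dat` and the two wrapper functions are hypothetical and exist only for illustration. When the argument is a string, `{{ }}` hands group_by() the constant "cohort", so every row falls into one group, whereas across(all_of()) treats the string as a column name.

```r
library(dplyr)

# Toy data standing in for the real dataset (hypothetical).
dat <- tibble(
  cohort = c("a", "a", "b"),
  value  = c(1, 2, 3)
)

# Hypothetical wrapper using {{ }}: given a string, group_by() receives
# the constant "cohort" and builds a single group from it.
count_embraced <- function(data, grouping_var) {
  data |>
    group_by({{ grouping_var }}) |>
    summarise(n = n(), .groups = "drop")
}

# Hypothetical wrapper using across(all_of()): the string is looked up
# as a column name, so the groups are the levels of that column.
count_by_name <- function(data, grouping_var) {
  data |>
    group_by(across(all_of(grouping_var))) |>
    summarise(n = n(), .groups = "drop")
}

embraced <- count_embraced(dat, "cohort")  # one row for the constant group
by_name  <- count_by_name(dat, "cohort")   # one row per level of cohort

print(embraced)
print(by_name)
```

The `{{ }}` form is intended for wrappers whose callers pass a bare column name, for example count_embraced(dat, cohort).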

Regards, choonghyun

-----Original Message-----
From: Lorenzo
Sent: 2023-06-03 (Sat) 00:09:47 (GMT+09:00)
Subject: Re: [choonghyunryu/dlookr] add group_by() functionality (Issue #90)

I wrote a function that takes a string as input (in this specific case "cohort") naming a factor to pass to dplyr::group_by(), which is called before diagnose_category():

dlookr::diagnose_category(dat |> dplyr::group_by({{ grouping_var }}))

It does not produce any error, but in the resulting tibble the column cohort contains only the value cohort rather than its levels. I guess it is related to the use of {{ }}, but I have not found a solution.