`subset` and `na.action` arguments

danielwilhelm / R-CS-ranks

R package implementing confidence sets for ranks

https://danielwilhelm.github.io/R-CS-ranks/

GNU General Public License v3.0

5 stars 1 forks source link

`subset` and `na.action` arguments #29

Open MrDomani opened 1 year ago

MrDomani commented 1 year ago

lm enables user to supply subset and na.action arguments. The first filters out data based a certain condition, and the second treats NA values. Both (most of time) drop some observations. Now the ranking r function is called before this happens. Which means, that some ECDF values might not be present in the final model matrix. Should we: 1) Raise an error whenever this happens and prompt the user to deal with it himself 2) Try to handle it ourselves (could be difficult) 3) Do nothing (or just raise a warning), because it does not interfere with our theory (I doubt that, but I don't know for sure)

danielwilhelm commented 1 year ago

Removing values of the cdf is not a good idea because it might change the distribution of the ranks. So, it seems better to me to apply the ranking r function after subset and na.action.

Sent from Proton Mail for iOS

On Tue, May 23, 2023 at 15:19, MrDomani @.***(mailto:On Tue, May 23, 2023 at 15:19, MrDomani < wrote:

lm enables user to supply subset and na.action arguments. The first filters out data based a certain condition, and the second treats NA values. Both (most of time) drop some observations. Now the ranking r function is called before this happens. Which means, that some ECDF values might not be present in the final model matrix. Should we:

Raise an error whenever this happens and prompt the user to deal with it himself

Try to handle it ourselves (could be difficult)

Do nothing (or just raise a warning), because it does not interfere with our theory (I doubt that, but I don't know for sure)

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you are subscribed to this thread.Message ID: @.***>

MrDomani commented 1 year ago

Currently an error is thrown if user supplies na.action or subset or an NA value is present anywhere (cause lmby default removes rows containing NA anywhere, and that affects calculation of ranks).

MrDomani commented 1 year ago

Turns out, that it is more complicated, than I expected. As I mentioned, subsetting and handling of NA values occurs after evaluation of ranking function. I do not see an easy, quick way to handle this. Some ways that I see is

a) copy paste a lot of code from lm() (and calculate ranks after model.frame, and remember about handling r() correctly in other places) and, indeed, fit linear model ourselves (not by calling lm) or b) evaluate get_all_vars, subset it and handle NAs (which could? be inferred from model.frame), and supply it as data argument to lm.

On the other hand, those functionalities are far from being critical, and can be done by user without much work (for example with subset function from base R and drop_na from tidyr package.

A lot of work for not so much gain. I would assign this issue a low priority and work on other matters.

danielwilhelm commented 1 year ago

That sounds good to me.

Sent from Proton Mail for iOS

On Fri, Jun 23, 2023 at 11:28, Pawel Morgen @.***(mailto:On Fri, Jun 23, 2023 at 11:28, Pawel Morgen < wrote:

Turns out, that it is more complicated, than I expected. As I mentioned, subsetting and handling of NA values occurs after evaluation of ranking function. I do not see an easy, quick way to handle this. Some ways that I see is

a) copy paste a lot of code from lm() (and calculate ranks after model.frame, and remember about handling r() correctly in other places) and, indeed, fit linear model ourselves (not by calling lm) or b) evaluate get_all_vars, subset it and handle NAs (which could? be inferred from model.frame), and supply it as data argument to lm.

On the other hand, those functionalities are far from being critical, and can be done by user without much work (for example with subset function from base R and drop_na from tidyr package.

A lot of work for not so much gain. I would assign this issue a low priority and work on other matters.

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you commented.Message ID: @.***>