Open MrDomani opened 1 year ago
Removing values of the CDF is not a good idea, because it might change the distribution of the ranks. So it seems better to me to apply the ranking function `r()` after `subset` and `na.action`.
`lm` lets the user supply `subset` and `na.action` arguments. The first filters the data based on a certain condition, and the second handles NA values. Both (most of the time) drop some observations. Currently the ranking function `r()` is called before this happens, which means that some ECDF values might not be present in the final model matrix. Should we:
- Raise an error whenever this happens and prompt the user to deal with it themselves
- Try to handle it ourselves (could be difficult)
- Do nothing (or just raise a warning), because it does not interfere with our theory (I doubt that, but I don't know for sure)
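The ordering problem described above can be reproduced in a few lines. This is only a sketch: `r_sketch()` is a hypothetical stand-in for the package's `r()` (assumed here to return ECDF values), and the data are made up.

```r
# Hypothetical stand-in for the package's r(): ECDF value of each observation.
r_sketch <- function(x) ecdf(x)(x)

d <- data.frame(x = c(1, 2, 3, 4, 5, 6),
                y = c(2.1, 3.9, 6.2, 8.1, 9.8, 12.2))

# lm() evaluates r_sketch(x) on the full data first, then applies `subset`.
fit <- lm(y ~ r_sketch(x), data = d, subset = x > 3)

# Ranks that actually ended up in the model frame (computed on all 6 rows).
ranks_in_model <- as.numeric(model.frame(fit)[["r_sketch(x)"]])

# Ranks we would get if ranking happened after subsetting.
ranks_after_subset <- r_sketch(d$x[d$x > 3])

print(ranks_in_model)
print(ranks_after_subset)
```

The two vectors differ: the model only sees the upper part of the ECDF computed on the full data, rather than ranks spanning the rows that were kept.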
Currently an error is thrown if the user supplies `na.action` or `subset`, or if an NA value is present anywhere (because `lm` by default removes rows containing an NA anywhere, and that affects the calculation of ranks).
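A guard of this kind could look roughly like the sketch below. The function name `rank_lm_guard_sketch()` and the error messages are assumptions made for illustration, not the package's actual code.

```r
# Hypothetical wrapper: refuse anything that would drop rows after ranking.
rank_lm_guard_sketch <- function(formula, data, subset, na.action) {
  # missing() does not evaluate the arguments, so `subset = x > 1` is safe here.
  if (!missing(subset)) {
    stop("`subset` is not supported; filter the data before fitting.")
  }
  if (!missing(na.action)) {
    stop("`na.action` is not supported; remove NA values before fitting.")
  }
  if (any(is.na(data))) {
    stop("`data` contains NA values; remove them before fitting.")
  }
  lm(formula, data = data)
}

d <- data.frame(x = c(1, 2, 3, 4), y = c(1.1, 2.3, 2.9, 4.2))
d_na <- data.frame(x = c(1, NA, 3, 4), y = c(1.1, 2.3, 2.9, 4.2))

fit_ok <- rank_lm_guard_sketch(y ~ x, d)
msg_subset <- tryCatch(rank_lm_guard_sketch(y ~ x, d, subset = x > 1),
                       error = function(e) conditionMessage(e))
msg_na <- tryCatch(rank_lm_guard_sketch(y ~ x, d_na),
                   error = function(e) conditionMessage(e))
```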
It turns out that this is more complicated than I expected. As I mentioned, subsetting and handling of NA values occur after the ranking function is evaluated. I do not see an easy, quick way to handle this. The options I see are:
a) copy-paste a lot of code from `lm()` (calculating ranks after `model.frame`, and remembering to handle `r()` correctly in other places) and, indeed, fit the linear model ourselves (not by calling `lm`), or
b) evaluate `get_all_vars`, subset the result and handle NAs (which could perhaps be inferred from `model.frame`), and supply it as the `data` argument to `lm`.
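Option b) might be sketched as follows. This is an assumption-laden illustration: `r_sketch()` stands in for `r()`, `fit_after_filter_sketch()` and its `keep` argument are hypothetical, and NA handling is reduced to dropping incomplete rows.

```r
# Hypothetical stand-in for the package's r(): ECDF value of each observation.
r_sketch <- function(x) ecdf(x)(x)

# Filter the raw variables first, then let lm() evaluate the ranking term.
fit_after_filter_sketch <- function(formula, data, keep = NULL) {
  vars <- get_all_vars(formula, data = data)  # raw y and x, no r_sketch() yet
  if (is.null(keep)) keep <- rep(TRUE, nrow(vars))
  keep <- !is.na(keep) & keep                 # treat NA conditions as FALSE
  vars <- vars[keep & complete.cases(vars), , drop = FALSE]
  lm(formula, data = vars)                    # ranks now use only kept rows
}

d <- data.frame(x = c(1, 2, 3, NA, 5, 6),
                y = c(2.0, 4.1, 5.9, 8.0, NA, 12.1))

fit <- fit_after_filter_sketch(y ~ r_sketch(x), d, keep = d$x > 1)
ranks_in_model <- as.numeric(model.frame(fit)[["r_sketch(x)"]])
print(ranks_in_model)
```

Because the ranking term is only evaluated on the already-filtered data, the ECDF values in the model frame cover the kept rows without gaps.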
On the other hand, these features are far from critical, and the user can do this without much work (for example, with the `subset` function from base R and `drop_na` from the `tidyr` package).
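On the user side, the workaround could look like the sketch below. To stay dependency-free it uses base R's `na.omit()` where `tidyr::drop_na()` would also work; `r_sketch()` is again a hypothetical stand-in for `r()`, and the data are made up.

```r
# Hypothetical stand-in for the package's r(): ECDF value of each observation.
r_sketch <- function(x) ecdf(x)(x)

d <- data.frame(x = c(4, 1, NA, 3, 2, 6),
                y = c(8.2, 1.9, 5.0, NA, 4.1, 12.3))

# Filter and drop NA rows up front, so no rows are removed after ranking.
d_clean <- na.omit(subset(d, x > 1))

fit <- lm(y ~ r_sketch(x), data = d_clean)
ranks_in_model <- as.numeric(model.frame(fit)[["r_sketch(x)"]])
print(ranks_in_model)
```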
A lot of work for not so much gain. I would assign this issue a low priority and work on other matters.
That sounds good to me.