anujkhare / iregnet

7 stars 12 forks source link

NAN Error #62

Closed theadityasam closed 5 years ago

theadityasam commented 5 years ago

So currently I believe the NAN issue occurs when the data is either completely right censored or left truncated. I'll be introducing an error message in the get_censoring_types function of iregnet_fit.cpp whenever this condition is met and then will be testing it in various cases to see if the NAN error still occurs. If it does then I need to discuss if we have missed any other cause for the error.

theadityasam commented 5 years ago

Hii @anujkhare @tdhock I have written the error message and it's working great for the main iregnet function. For cv, as I had previously explained, in case we end up with a fold which produces the NAN error, we need to shuffle the folds again. Today i tried out many methods like counting the total number of left, right and interval censored and then splitting it into the folds accordingly but this will seriously cost the runtime of the function. Can we plan for this over a call whenever you are free?

tdhock commented 5 years ago

hi @theadityasam it would be great to have discussions like this one in the context of a PR where you can make references to specific commits / lines of code

anyway that is great that you got an error message working... can you please add a unit test using expect_error?

I think it would be better to stop with an error in R code before calling the C++ routine.

same thing for the cv function ... please do NOT re-shuffle folds if there is one that has either all left censored outputs or all right censored outputs. in that case please just stop with an informative error that tells the user they need to manually specify the foldid argument. here is some similar code in one of my other packages, https://github.com/tdhock/penaltyLearning/blob/master/R/IntervalRegression.R#L189

theadityasam commented 5 years ago

Hiii Toby, I created a new branch called nan wherein first I'll fix the nan error and test it for various dataset https://github.com/theadityasam/iregnet/tree/nan I had made a pretty simple change to fix the issue, this is temporary and will make the change to the R code rather than the C++ code as you said - Link. I'll include the unit test soon. As for cv function, I'll include the error message as you said, but is there any problem in reshuffling to try and fit the fold?

tdhock commented 5 years ago

great to see that you are comitting to the nan branch -- please open a PR for that and let's move the discussion there.

yes I think there is a problem in reshuffling in the cv function. to me if the first random fold assignment does not work then there may be a problem with the data, and we should inform the user. do you agree @anujkhare ?

theadityasam commented 5 years ago

@tdhock @anujkhare I've create a pull request #63 We can move the conversation over there

tdhock commented 5 years ago

great