anujkhare / iregnet


summary of calls with aditya #60

Open tdhock opened 5 years ago

tdhock commented 5 years ago

16 May 2019

linear regression: y_i ~ N(w^T x_i, \sigma^2)

log t_i = y_i

y_i ~ N(f(x_i), \sigma^2)

f(x_i) = w^T x_i

f(x_i) = b_0 + b^T x_i

uncensored y_i => likelihood is density

censored y_i => likelihood is cumulative distribution function
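Written out, the model and the two likelihood cases from these notes look roughly like the following. This is a sketch under the Gaussian model above; the symbols \underline{y}_i and \overline{y}_i for the lower and upper limits of a censored label, and \phi and \Phi for the standard normal density and CDF, are notation assumed here, not taken from the call.

```latex
\begin{align*}
y_i &= \log t_i, \qquad y_i \sim N\big(f(x_i), \sigma^2\big), \qquad f(x_i) = b_0 + b^T x_i \\
\text{uncensored } y_i: \quad L_i &= \frac{1}{\sigma}\,\phi\!\left(\frac{y_i - f(x_i)}{\sigma}\right) \\
\text{censored } y_i \in (\underline{y}_i, \overline{y}_i): \quad
L_i &= \Phi\!\left(\frac{\overline{y}_i - f(x_i)}{\sigma}\right)
     - \Phi\!\left(\frac{\underline{y}_i - f(x_i)}{\sigma}\right)
\end{align*}
```

With \underline{y}_i = -\infty (left-censored) the second term is 0, and with \overline{y}_i = +\infty (right-censored) the first term is 1, so both one-sided cases reduce to a single CDF evaluation.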

Aditya said that the NaN error in iregnet only happens when the train labels have no lower limits or no upper limits. In this case the maximum likelihood model is undefined, so you should stop with an error that tells the user that at least one upper limit and one lower limit are required.

cv.iregnet: you should check that each train set/split has at least one upper limit and one lower limit; otherwise stop with an informative error before running the optimization/iregnet. E.g. here is the error message I use in penaltyLearning::IntervalRegressionCV, which tells the user which https://github.com/tdhock/penaltyLearning/blob/master/R/IntervalRegression.R#L189
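A minimal sketch of the two checks described above, written in C++ since the thread discusses adding the check in the C++ code. Everything here is hypothetical and not part of iregnet: the `IntervalLabel` struct, the function names, the error wording, and the assumption that a missing limit is encoded as -infinity or +infinity. For cross-validation, the train set of a fold is taken to be every observation whose fold ID differs from that fold.

```cpp
#include <cmath>
#include <cstddef>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical label type (not from iregnet). Assumption: a missing
// limit is encoded as -infinity (lower) or +infinity (upper).
struct IntervalLabel {
  double lower;
  double upper;
};

// Number of finite lower and upper limits in a set of labels.
struct LimitCounts {
  std::size_t n_lower;
  std::size_t n_upper;
};

LimitCounts count_finite_limits(const std::vector<IntervalLabel>& labels) {
  LimitCounts counts{0, 0};
  for (const IntervalLabel& label : labels) {
    if (std::isfinite(label.lower)) ++counts.n_lower;
    if (std::isfinite(label.upper)) ++counts.n_upper;
  }
  return counts;
}

// Throws unless the labels contain at least one finite lower limit
// and at least one finite upper limit. `what` names the data set in
// the error message so the user can tell which split failed.
void check_limits(const std::vector<IntervalLabel>& labels,
                  const std::string& what) {
  const LimitCounts c = count_finite_limits(labels);
  if (c.n_lower == 0 || c.n_upper == 0) {
    throw std::invalid_argument(
        what + " has " + std::to_string(c.n_lower) + " lower limits and " +
        std::to_string(c.n_upper) +
        " upper limits; at least one of each is required");
  }
}

// Runs check_limits on the train set of every fold, where the train
// set for fold k is every label whose fold id is not k.
void check_cv_splits(const std::vector<IntervalLabel>& labels,
                     const std::vector<int>& fold_ids) {
  if (labels.size() != fold_ids.size()) {
    throw std::invalid_argument("labels and fold_ids must have equal length");
  }
  const std::set<int> folds(fold_ids.begin(), fold_ids.end());
  for (int k : folds) {
    std::vector<IntervalLabel> train;
    for (std::size_t i = 0; i < labels.size(); ++i) {
      if (fold_ids[i] != k) train.push_back(labels[i]);
    }
    check_limits(train, "train set for validation fold " + std::to_string(k));
  }
}
```

Running both checks before the optimization starts means the user sees a message naming the offending split instead of a NaN from the solver. In an R package the exception would still need to be turned into an R error, which is not shown here.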

tdhock commented 5 years ago

tell the user that they can specify manually the vector of fold ID numbers in cv.iregnet

tdhock commented 5 years ago

that is the foldid argument in my PR https://github.com/anujkhare/iregnet/pull/54/files#diff-028243aa200892f66756ed85c2d1ede7R82

theadityasam commented 5 years ago

We need to come up with a time each week when everyone is available, so that we can discuss what has been accomplished and what needs to be done next.

> tell the user that they can specify manually the vector of fold ID numbers in cv.iregnet

Okay, noted

anujkhare commented 5 years ago

Wednesday, 22nd May 2019 (Aditya and Anuj)

  1. Introduction and such
  2. What is likelihood?
  3. Need to discuss the optimization branch (Anuj to review), walk through the package code in detail.

Monday, 27th May 2019 (Aditya and Anuj)

  1. Had a long (!) line-by-line code walk-through of iregnet.R, fit_iregnet.cpp
  2. Discussed where (in the C++ code) to add the check for the case where the entire data set is either left-censored or right-censored
  3. Optimization branch: there is a lot of duplicate code in there, we'd need to refactor it before we can merge it - but let's get Toby's view
  4. First, finish the NaN issue and the CV implementation, then come back to optimization.
  5. Need to set up a regular call schedule

To-dos:

  1. (Aditya) create issue and MR for NaN issue

tdhock commented 5 years ago

I agree with @anujkhare about the optimization branch, let's not merge it until the code is clean / easy to maintain / no duplication.

anujkhare commented 5 years ago

Tuesday, 4th June 2019 (Aditya and Anuj)

  1. The NaN error persists on some data sets despite the all-left/right-censored checks. survreg gives a "did not converge" warning on the same data. Aditya to post the sample data/test cases.
  2. Aditya to push test cases and the latest code that he has so that we can review and merge that.
  3. Look at the optimization branch next and get that merged. Aditya and Anuj to work on an estimate for that.
  4. Get on a call with Toby and decide on the milestones through the end of June.