aldro61 / mmit

Regression trees for interval censored output data
https://aldro61.github.io/mmit/
GNU General Public License v3.0
7 stars 7 forks source link

Infinity splits #12

Closed aldro61 closed 7 years ago

aldro61 commented 7 years ago

Changed the predicted value for the case where the function only contains one type of bound.

Also updated the unit tests to account for this change.

tdhock commented 7 years ago

another thing I was thinking -- in many data sets, some input features are monotonic transforms of others, e.g. first column is x, second columns is log(x). Do you think that would confuse the learning algo? Do you think we should add a pre-processing step that filters any features that are monotonic transforms of other features? we could do that by comparing the sort order indices of the features...

tdhock commented 7 years ago

I would suggest to avoid adding arbitrary constants (like 1e-4) wherever possible. Is it really necessary? When there are all lower bounds, then the value of the cost function at the last breakpoint is zero, right? In fact all predictions in [b, \infty) result in zero cost, right? (where b is the last breakpoint)

aldro61 commented 7 years ago

Yes that’s right. However, as per our definition, b is not part of the minimum segment, which is ]b, \infty). I added an arbitrary constant because any value > b is in the minimum segment and leads to zero cost. I know that arbitrary constants are ugly, but at least this makes the code consistant with the theory. On the other hand, maybe we could make an exception to the theory, since we are guaranteed that “b” leads to zero cost. What’s your take on this: stick to the theory or exception?

tdhock commented 7 years ago

You are right that b is not part of the minimum segment, but we do know that b is also a minimum. I would recommend to delete the constant, and just use the last breakpoint. It won't make any difference in terms of learning, but I think it will clarify the code (it is always better to avoid introducing arbitrary constants in my opinion).

Another option is to go back to reporting Infty as the minimum (rather than b).

Another option is to report both the lower and upper bounds of optimal solutions, here is would be the two double precision numbers b and INFINITY.

On Tue, May 30, 2017 at 1:03 PM, Alexandre Drouin notifications@github.com wrote:

Yes that’s right. However, as per our definition, b is not part of the minimum segment, which is ]b, \infty). I added an arbitrary constant because any value > b is in the minimum segment and leads to zero cost. I know that arbitrary constants are ugly, but at least this makes the code consistant with the theory. On the other hand, maybe we could make an exception to the theory, since we are guaranteed that “b” leads to zero cost. What’s your take on this: stick to the theory or exception?

On May 30, 2017 at 12:48:58 PM, Toby Dylan Hocking ( notifications@github.com) wrote:

I would suggest to avoid adding arbitrary constants (like 1e-4) wherever possible. Is it really necessary? When there are all lower bounds, then the value of the cost function at the last breakpoint is zero, right? In fact all predictions in [b, \infty) result in zero cost, right? (where b is the last breakpoint)

On Tue, May 30, 2017 at 12:40 PM, Alexandre Drouin < notifications@github.com

wrote:

@aldro61 commented on this pull request.

In mmit/core/piecewise_function.cpp https://github.com/aldro61/mmit/pull/12#discussion_r119152240:

@@ -101,11 +101,7 @@ double PiecewiseFunction::get_minimum_position() { else if(equal(this->min_coefficients.quadratic, 0) && equal(this->min_coefficients.linear, 0)){ if(is_end(this->min_ptr)){ // Case: ___x lower bounds only

  • /* TODO: I don't think this is correct. The position of the previous breakpoint is not in the min
    • segment. So by returning that value, we are not returning a true function minimum. We should return
    • that value + some offset.
    • */
  • return get_breakpoint_position(std::prev(this->min_ptr));
  • return get_breakpoint_position(std::prev(this->min_ptr)) + 1e-4; // Add a small value because the previous breakpoint is not included in the minimum segment ]lower, upper]

@tdhock https://github.com/tdhock I made this change because b{t, i-1} is not included in the minimum segment, which is defined over the range ]b{t, i-1}, b_{t, i}]. Should not change much in practice, but at least it's consistent. I updated the unit tests in R and Python.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/aldro61/mmit/pull/12#pullrequestreview-41006193, or mute the thread < https://github.com/notifications/unsubscribe-auth/ AA478urE4OibZruTTk56l4a4KjumANNMks5r_EaIgaJpZM4NA1-M

.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/aldro61/mmit/pull/12#issuecomment-304939068, or mute the thread https://github.com/notifications/unsubscribe-auth/ ACQ9RKNTIZ6dgpxqFPMkBfVwjwtBY-RMks5r_Eh3gaJpZM4NA1-M

.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/aldro61/mmit/pull/12#issuecomment-304943502, or mute the thread https://github.com/notifications/unsubscribe-auth/AA478qea7dbqGRJrJf7glAmqN_geJejYks5r_EvKgaJpZM4NA1-M .

aldro61 commented 7 years ago

Thanks for your opinion! I’ll go with option 1 and roll back to what we were doing before: return “b” as the predicted value. It’s the simplest solution and we do know that it is a minimum.

aldro61 commented 7 years ago

My code review is done and I'm ready to merge into Master. Ok for you @tdhock?

tdhock commented 7 years ago

great, please merge +1