dmlc / XGBoost.jl

XGBoost Julia Package

Add functionality for early stopping rounds. #193

Closed. david-sun-1 closed this pull request 8 months ago.

david-sun-1 commented 9 months ago

I had a need for early stopping rounds and followed the discussion in https://github.com/dmlc/XGBoost.jl/issues/158 to implement this version of it.
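
For reference, here is a rough sketch of the kind of usage this is aiming to enable. The `early_stopping_rounds` keyword name and the exact call shown here are illustrative rather than a settled interface:

```julia
using XGBoost, OrderedCollections

# Toy regression data with a train/validation split (illustrative only).
Xtrain, ytrain = randn(200, 4), randn(200)
Xvalid, yvalid = randn(50, 4),  randn(50)

dtrain = DMatrix(Xtrain, ytrain)
dvalid = DMatrix(Xvalid, yvalid)

# Hypothetical call: stop once the watched validation metric has not
# improved for `early_stopping_rounds` consecutive rounds.
bst = xgboost(dtrain;
              num_round=500,
              watchlist=OrderedDict("train" => dtrain, "valid" => dvalid),
              early_stopping_rounds=10,
              eta=0.1,
              max_depth=4)
```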

ExpandingMan commented 9 months ago

Thanks for this.

I have a few concerns with this implementation.

The first is that we have so far carefully avoided parsing the output messages, for obvious reasons. There is apparently no way around the fact that this kind of functionality requires it, so I don't think we can disallow it. However, if we are going to do this, we should at the very least have dedicated log-message parsing functions and expose that useful functionality in the API rather than cramming it into update!.

Ideally, we should probably provide the ability to insert arbitrary callbacks between calls to updateone!. That doesn't have to come in this PR, but we should think carefully about what the interface should look like, to see whether we can do it without breaking (I have not thought it through carefully myself yet).

I also don't love the idea of removing the evaluations from the calls to updateone! itself, since they could be useful to someone writing a custom loop around it, though I can certainly see a good argument that they are getting in the way here.

Lastly, shouldn't we have some kind of tolerance for the score comparison? It would be really nice to be able to do a simple convergence check and stop once it is satisfied. In most of my cases that is how I would be likely to use this feature, and I think it is a good use case.
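
To illustrate what I mean by a tolerance/convergence check, here is a rough sketch of a custom loop around updateone!. The metric computation and the exact updateone! call are assumptions for illustration, not a proposal for the package's interface:

```julia
using XGBoost, Statistics

# Illustrative convergence loop: stop when the validation RMSE changes by
# less than `tol` between consecutive rounds.
function train_until_converged(dtrain::DMatrix, Xvalid, yvalid;
                               max_rounds=1000, tol=1e-4, params...)
    bst = Booster(dtrain; params...)
    prev = Inf
    for _ in 1:max_rounds
        XGBoost.updateone!(bst, dtrain)   # one boosting round (assumed call)
        rmse = sqrt(mean((predict(bst, Xvalid) .- yvalid) .^ 2))
        abs(prev - rmse) < tol && break
        prev = rmse
    end
    return bst
end
```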

david-sun-1 commented 9 months ago

Hi @ExpandingMan , thanks for the feedback.

For the tolerance on score comparison: that is extra functionality beyond what is currently available. The early stopping criteria in both the R and Python implementations are based on the number of rounds in which no improvement has been made, and that is how convergence is tested for.

This is generally fine so long as you have an evaluation set, which guards against overfitting. If early stopping is only being checked against the training set, it would take much longer to reach a point where there is no improvement. I think your suggestion is a good one and would be valuable there, since the only other alternative is to define the number of rounds upfront.
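
To make that concrete, the R/Python-style criterion amounts to roughly the following. This is just a sketch of the standard logic, not the code in this PR, and `metric_for_round` is a hypothetical callback:

```julia
# Sketch of the rounds-without-improvement criterion used by the R and Python
# packages: track the best score seen so far and stop once it has not improved
# for `early_stopping_rounds` consecutive rounds.  `metric_for_round` is a
# hypothetical callback returning the evaluation metric after boosting round i.
function early_stop_loop(metric_for_round; num_round=100, early_stopping_rounds=10)
    best_score, best_round = Inf, 0   # assuming lower is better
    for i in 1:num_round
        score = metric_for_round(i)
        if score < best_score
            best_score, best_round = score, i
        elseif i - best_round >= early_stopping_rounds
            @info "early stopping" best_round best_score
            break
        end
    end
    return best_round, best_score
end
```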

I suggest we tackle that particular enhancement in a separate PR.

ExpandingMan commented 9 months ago

Thanks. I have to take a more careful look at this later; right now I'm mostly concerned with making sure we can add a tolerance without breaking changes, though hopefully a keyword argument will suffice.

One quick comment I wanted to make is that we can't have updateone! return this pair, as that would be breaking (I also don't love the idea of it returning a message, just because it seems like a bit of a weird interface). I realize there is now an issue in that there is a log inside updateone! that we don't want to evaluate twice, so it's not clear what to do. If we are going to have a function for parsing messages, it's probably fine to move the info logs out of updateone! so that that function on its own is silent. Technically not breaking, but also not ideal :shrug: As you can tell, I don't know exactly how this should be done either, but we should try to do it without breaking.
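
For what it's worth, a dedicated parsing helper along the lines I mentioned could look something like this. The line format follows libxgboost's usual tab-separated evaluation string (e.g. `[1]  train-rmse:0.123  valid-rmse:0.456`); the function name and where it would live in the API are hypothetical:

```julia
# Hypothetical helper: parse an evaluation log line such as
#   "[3]\ttrain-rmse:0.123456\tvalid-rmse:0.234567"
# into (round_number, Dict(metric_name => value)).
function parse_eval_log(msg::AbstractString)
    round_number = parse(Int, match(r"^\[(\d+)\]", msg).captures[1])
    metrics = Dict{String,Float64}()
    for m in eachmatch(r"(\S+):(-?\d+(?:\.\d+)?(?:[eE]-?\d+)?)", msg)
        metrics[m.captures[1]] = parse(Float64, m.captures[2])
    end
    return round_number, metrics
end
```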

ExpandingMan commented 9 months ago

Ok, comments inline. The gist is that, in order not to return a message string from updateone! (if anything I think it should return the actual metric, not a string, but even that would technically be breaking), logging will have to be moved into update!. Not ideal, but I don't see a lot of options, so I think I'm OK with that.

wilan-wong-1 commented 9 months ago

@ExpandingMan I've updated the early stopping rounds branch to incorporate some fixes. Most importantly, the watchlist previously defaulted to a Dict, which is an unordered data structure, so relying on the ordering of its elements could cause unintentional behaviour. Modifications have been made to ensure that order is preserved based on user input (if an OrderedDict is provided).
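
As a quick illustration of the ordering issue (generic Julia, not code from this branch):

```julia
using OrderedCollections

# Dict iteration order is an implementation detail of the hash table...
d = Dict("train" => 1, "valid" => 2, "test" => 3)
collect(keys(d))    # not guaranteed to match insertion order

# ...whereas OrderedDict preserves insertion order, so the watchlist entry
# the early-stopping check relies on stays where the user put it.
od = OrderedDict("train" => 1, "valid" => 2, "test" => 3)
collect(keys(od))   # ["train", "valid", "test"]
```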

To ensure backwards compatibility, I've kept the same xgboost interface but added a warning if a user passes a Dict watchlist with more than one element while using the early stopping rounds functionality.

To support early stopping rounds, I've also extended the Booster type with additional attributes that capture the best round (iteration) and best score, as in the other XGBoost language implementations. These can then be used directly with the already defined ntree_limit parameter of the predict() method.
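
In other words, something like the following should be possible once those attributes exist. The field names here are illustrative stand-ins for whatever this PR defines, and `bst`/`Xvalid` are assumed from a typical training setup like the sketch earlier in the thread:

```julia
# After fitting with early stopping, the best iteration can be fed to the
# existing ntree_limit keyword of predict().  `best_round` / `best_score`
# are hypothetical field names.
@info "early stopping summary" bst.best_round bst.best_score

ŷ = predict(bst, Xvalid; ntree_limit=bst.best_round)
```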

I've also added test cases to give coverage to these enhancements.

ExpandingMan commented 8 months ago

Thanks, especially for all the testing. The way I see it, the major issue here is the return of the message string, as commented above. Even if it were not breaking, it's a bizarre interface, and there are plenty of other ways of handling this. (See the inline comments above; they are still there.)

ExpandingMan commented 8 months ago

This looks good to me, thanks! Let me give it a more thorough look when I get a chance, and hopefully I can merge.

ExpandingMan commented 8 months ago

I haven't forgotten this but I haven't gone through it yet... feel free to ping me if I don't get back to it this week.

ExpandingMan commented 8 months ago

Looks good to me, thanks for this.