JuliaAI / MLJModelInterface.jl

Lightweight package to interface with MLJ
MIT License
37 stars 8 forks source link

invalid model parameters should either error or keep user arguments #162

Open ExpandingMan opened 2 years ago

ExpandingMan commented 2 years ago

I noticed today that if a model constructor fails parameter validation it reverts to the default with a warning.

I find this behavior confusing. Normally invalid constructors lead to an error. Alternatively, I can imagine getting frustrated if an overzealous maintainer improperly restricted a parameter range, in which case the thing to do would be to allow the user-provided argument with an error. Either way, reverting to the default seems like a very strange behavior, thoughts?

ablaom commented 2 years ago

This makes sense. As far as changing the behaviour of the @mlj_model macro, that will have to wait for the next breaking release, which won't be for a while.

ExpandingMan commented 1 year ago

We hit another issue today which is related to this but not the same, see here and here.

I propose that always storing all parameters, while perhaps having noble intention, is likely to eternally be fraught with issues in practice. It just requires too much maintenance. Ideally, every model would ultimately provide access to its parameters, but sadly, this is not reality.

Note that, in addition to whatever issues that might have come up, there is probably a lot of extra effort spent in all the interfaces trying to keep things consistent that hasn't necessarily resulted in any issues being opened. Certainly I felt there were a lot of parameter acrobatics involved in MLJXGBoostInterface.jl as is evidenced in its current master.

It's not clear to me whether this issue has caused frustration for many others, but it's something I've been burned by a number of times as evidenced by this thread. I'm also not sure whether any decision should be made here without serious breakage, but it's probably worth thinking about.

ablaom commented 1 year ago

Thanks @ExpandingMan for these further thoughts. Sorry to hear about the parameter woes. Gradient boosters have a lot of them, that's for sure.

In my experience maintaining the MLJ ecosystem, this maintenance concern is particular to XGBoost, because it has so many parameters and because it wraps an algorithm in another language. In EvoTrees.jl the core algorithm struct, and the MLJ struct are the same, so there's no problem there. Most non booster models rarely have more than half a dozen parameters - maybe 10 or 11 at tops. Generally, when we've had to change the parameters or documentation it's been because of a breaking change to the core package that we had to deal with anyway.

On balance, I think the advantages of storing all the parameters in the stuct outweigh the maintenance concerns, painful as they are in you own case.

ExpandingMan commented 1 year ago

Fair enough. Seems like one of those things for which there is no good option, other than maybe raising awareness in packages that implement models to please be more transparent with model parameters as other interfaces probably want to know about them.