Open cimentadaj opened 4 years ago
Thinking about it even more, the current implementation is error-prone. For example, `mtry` is estimated from the columns in the data. If the user specifies `tune` in the recipe but at the same time removes some columns in the recipe, `dials::finalize` will be applied on the data without the recipe. This means that `mtry` will be updated using the old data rather than the prepped data.
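
To make the `mtry` mismatch concrete, here is a minimal sketch. It assumes `mtcars` as stand-in data and `step_rm()` as the column-removing step; both are chosen only for illustration and are not the original failing case. It finalizes `mtry()` once on the raw predictors and once on the prepped predictors, so the two upper bounds can be compared.

```r
library(dials)
library(recipes)

# Illustrative recipe: drops two predictors (assumed example, not from the issue)
rec <- recipe(mpg ~ ., data = mtcars) %>%
  step_rm(cyl, disp)

# Predictors as they exist before the recipe is applied
raw_predictors <- mtcars[, setdiff(names(mtcars), "mpg")]

# Predictors after the recipe has been prepped and applied to the training data
prepped_predictors <- bake(prep(rec), new_data = NULL, all_predictors())

# finalize() sets the upper bound of mtry to the number of columns it is given
mtry_raw <- finalize(mtry(), raw_predictors)
mtry_prepped <- finalize(mtry(), prepped_predictors)

range_get(mtry_raw)$upper      # equals ncol(raw_predictors)
range_get(mtry_prepped)$upper  # equals ncol(prepped_predictors), which is smaller
```

If finalization only ever sees the raw data, the grid can include `mtry` values larger than the number of predictors the model actually receives.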
Note that `dials::finalize` should work fine as long as there are no `tune` values in the recipe because then the mold will have the prepped data. The only problem arises when the recipe cannot be prepped due to having a `tune` specification.
There is a small error coming from `dials::finalize` when attempting to finalize `rbf_sigma` with the code below:

I've identified the problem but can't come up with an elegant solution right now. The problem is this:
1. `fit.action_grid` extracts the parameters from the recipe and model as a `parameters` object.
2. It then passes that `parameters` object to `dials::finalize` together with the data from the mold.
3. `dials::finalize` requires the data to be entirely numeric to estimate values of `rbf_sigma`. The problem is that the column `V1` is a factor, which raises the error. Converting `V1` to numeric in the recipe won't fix this, because the recipe cannot be prepped/juiced in `tidyflow` given that it has a `tune` placeholder. On the other hand, if the user converts `V1` to numeric outside the recipe, the recipe raises another error because `step_dummy` requires `V1` to be a factor.

After understanding this problem, there is a fix, but I don't like it because it's not intuitive at all. The solution is to convert `V1` to numeric outside the recipe (this is the data that will be passed to `dials::finalize`) and then convert `V1` back to a factor with `step_mutate` before passing it to `step_dummy`.

Possible solutions that I've thought about but discarded:
- Inside `fit.action_grid`, convert all columns to numeric automatically (if there are characters, convert to factors and then to numeric). This is the simplest solution, but I don't like it because it alters the user's data to calculate a value without the user knowing.
- Specifying the range of `rbf_sigma()` in the parameters of `plug_grid` should override any `dials::finalize`. This is also an elegant solution and it works with all tuning parameters except `rbf_sigma`. This is because, for example, `mtry()` is really empty and needs finalization. `rbf_sigma()` is not empty (it can work without finalization), but if run through `dials::finalize` it estimates the range from the data again.
- Catch the specific error and raise a more informative error such as: `All the predictors need to be numeric to finalize the tuning grid. Make sure you convert all predictors to numeric in the raw data`. Again, this is not elegant and doesn't really fix the problem.

At this point, I'm putting it here so I can organize my ideas, but I don't know how to fix this elegantly without adding an exception for `finalize`
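
The difference between `mtry()` and `rbf_sigma()` described in the second option can be checked with `dials::has_unknowns()`. The sketch below is only one way the range-override idea might be expressed. `finalize_if_unknown()` is a hypothetical helper that exists in neither `dials` nor `tidyflow`, and the small data frame with a factor column `V1` is made up for illustration.

```r
library(dials)

# mtry() ships with an unknown upper bound; rbf_sigma() ships with a full default range
has_unknowns(mtry())
has_unknowns(rbf_sigma())

# Hypothetical helper (an assumption, not an existing API):
# only finalize parameters that still contain unknown() values
finalize_if_unknown <- function(param, x) {
  if (has_unknowns(param)) finalize(param, x) else param
}

# Made-up predictors with a factor column, mirroring the V1 situation
predictors <- data.frame(
  V1 = factor(c("a", "b", "a")),
  V2 = c(1.5, 2.5, 3.5)
)

# A user-supplied range (on the log10 scale) has no unknowns, so it is left untouched
# and the numeric-only requirement of the rbf_sigma finalizer is never hit
user_sigma <- rbf_sigma(range = c(-3, 0))
kept_sigma <- finalize_if_unknown(user_sigma, predictors)
range_get(kept_sigma, original = FALSE)

# mtry() still has an unknown upper bound, so it is finalized from the data
finalized_mtry <- finalize_if_unknown(mtry(), predictors)
range_get(finalized_mtry)
```

This leaves a default `rbf_sigma()` unfinalized as well, so whether skipping the data-driven estimate is acceptable remains a design decision.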