smithcommajoseph opened 8 months ago
Ok, that makes sense. A couple of questions...

Does `configs["data_input"]["predictor_columns"]` get automatically populated/updated when the model is first trained? I see `predictors_target_config.json` here, after training. We discussed having a trained model that has an existing `predictors_target_config.json` lock the predictor columns for all future predictions to the set used for the initial training. Would that require updating `configs["data_input"]["predictor_columns"]`? If so, when (and where) should that happen?

In the meantime, it sounds like the fix would be to go through and copy the predictor column names from `predictors_target_config.json` into `configs["data_input"]["predictor_columns"]`, either manually or by having SkySpark do that automatically. Do I have that right?
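As a rough illustration of that copy step, here is a minimal Python-side sketch. It assumes `predictors_target_config.json` stores the trained model's column specification as a mapping whose values are the column names (the same assumption the SkySpark fix below makes); the file names and layout may differ in practice.

```python
import json

# Load the runtime config and the predictor/target spec written at training time.
# File names follow the thread; actual paths will differ on a real deployment.
with open("configs.json") as f:
    configs = json.load(f)

with open("predictors_target_config.json") as f:
    column_specification = json.load(f)

# Mirror the trained model's predictor columns into the runtime config so that
# prediction uses exactly the feature set from the initial training.
configs["data_input"]["predictor_columns"] = list(column_specification.values())
```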
So I added the following code block to the SkySpark function that handles prediction and it works!
```
// Update configs with stored predictor columns
session
  .pyExec("configs['data_input']['predictor_columns'] = list(column_specification.values())")
```
I guess that answers the short-term fix. I'll get this fix into the nrelWattileExt codebase tomorrow sometime. (Right now it is running in a local version of the function on the SkySpark server. By design, the local copy overrides what is included in nrelWattileExt.)
I would still like to discuss the long-term fix of how to "lock in" the predictor columns after initial training. We can do it via this thread or at our next meeting.
> Ok, that makes sense. A couple of questions...
To frame this differently, I think in the longer term we talked about `configs.json` being used only for initial training, after which all of the model's configuration parameters are stored embedded within the model as metadata.

How exactly that happens, I don't have strong feelings about, except to say it should not require end users to modify the configuration settings themselves to make everything line up.
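One way to picture that, as a non-binding sketch only: embed the training-time configs in the torch checkpoint and always prefer them at load time. The helper names `save_trained_model`/`load_trained_model` are hypothetical, and the library's actual save/load code will differ.

```python
import torch

def save_trained_model(model, configs, path):
    # Persist the training-time configuration alongside the weights, so the
    # predictor columns stay locked to the set used for initial training.
    torch.save({"state_dict": model.state_dict(), "configs": configs}, path)

def load_trained_model(model, path):
    checkpoint = torch.load(path)
    model.load_state_dict(checkpoint["state_dict"])
    # The embedded configs take precedence; the end user never has to edit
    # configs.json to make prediction line up with training.
    return model, checkpoint["configs"]
```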
The re-emergence of this bug has kicked off a new round of work, and here is the current state of things after a productive debugging session with @haneslinger. I think this ticket is more of a conversation starter, and we'll implement whatever we agree upon.
**Background:** `get_input_dim` uses the given config's `data_input.predictor_columns` as a starting point for its calculations. Therefore, loading a pre-trained model whose config contains empty `predictor_columns` (more specifically, an empty array) leads to a mismatch between the shapes the underlying torch model expects and the shapes it receives.
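To make the failure mode concrete, here is a toy illustration (not the actual `get_input_dim` code path): a layer sized for the training-time feature count receives data shaped from a config whose `predictor_columns` is empty.

```python
import torch

trained_input_dim = 12                    # feature count when the model was first trained
stale_predictor_columns = []              # what the reloaded config contains
runtime_input_dim = len(stale_predictor_columns)  # 0

model = torch.nn.Linear(trained_input_dim, 8)
batch = torch.randn(4, runtime_input_dim)  # data windowed from the empty column list

try:
    model(batch)
except RuntimeError as err:
    # e.g. "mat1 and mat2 shapes cannot be multiplied (4x0 and 12x8)"
    print(err)
```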
**The recommended short-term fix:** We should ensure that the configurations of pre-trained models contain predictor columns.
**The long-term fix:** The team should discuss this, and we'll settle on an approach that works for everyone.