smithcommajoseph opened 8 months ago
Ok, that makes sense. A couple of questions...

Does `configs["data_input"]["predictor_columns"]` get automatically populated/updated when the model is first trained? I see `predictors_target_config.json` here, after training. We discussed having a trained model that has an existing `predictors_target_config.json` lock the predictor columns for all future predictions to the set used for the initial training. Would that require updating `configs["data_input"]["predictor_columns"]`? If so, when (and where) should that happen?

In the meantime, it sounds like the fix would be to go through and copy the predictor column names from `predictors_target_config.json` into `configs["data_input"]["predictor_columns"]`, either manually or by having SkySpark do that automatically. Do I have that right?
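As a rough illustration of that copy step, here is a minimal Python-side sketch. It assumes `predictors_target_config.json` stores the trained model's column specification as a mapping whose values are the column names (the same assumption the SkySpark fix below makes); the file names and layout may differ in practice.

```python
import json

# Load the runtime config and the predictor/target spec written at training time.
# File names follow the thread; actual paths will differ on a real deployment.
with open("configs.json") as f:
    configs = json.load(f)

with open("predictors_target_config.json") as f:
    column_specification = json.load(f)

# Mirror the trained model's predictor columns into the runtime config so that
# prediction uses exactly the feature set from the initial training.
configs["data_input"]["predictor_columns"] = list(column_specification.values())
```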
So I added the following code block to the SkySpark function that handles prediction and it works!
```
// Update configs with stored predictor columns
session
  .pyExec("configs['data_input']['predictor_columns'] = list(column_specification.values())")
```
I guess that answers the short-term fix. I'll get this fix into the nrelWattileExt codebase tomorrow sometime. (Right now it is running in a local version of the function on the SkySpark server. By design, the local copy overrides what is included in nrelWattileExt.)
I would still like to discuss the long-term fix of how to "lock in" the predictor columns after initial training. We can do it via this thread or at our next meeting.
> Ok, that makes sense. A couple of questions...
To frame this differently, I think in the longer term we talked about `configs.json` being used only for initial training, after which all of the model's configuration parameters are stored embedded within the model as metadata.

How exactly that happens, I don't have strong feelings about, except to say it should not require end users to modify the configuration settings themselves to make everything line up.
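One way to picture that, as a non-binding sketch only: embed the training-time configs in the torch checkpoint and always prefer them at load time. The helper names `save_trained_model`/`load_trained_model` are hypothetical, and the library's actual save/load code will differ.

```python
import torch

def save_trained_model(model, configs, path):
    # Persist the training-time configuration alongside the weights, so the
    # predictor columns stay locked to the set used for initial training.
    torch.save({"state_dict": model.state_dict(), "configs": configs}, path)

def load_trained_model(model, path):
    checkpoint = torch.load(path)
    model.load_state_dict(checkpoint["state_dict"])
    # The embedded configs take precedence; the end user never has to edit
    # configs.json to make prediction line up with training.
    return model, checkpoint["configs"]
```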
The re-emergence of this bug has kicked off a new round of work, and here is the current state of things after a productive debugging session with @haneslinger. I think this ticket is more of a conversation starter, and we'll implement whatever we agree upon.
**Background:** `get_input_dim` uses the given config's `data_input.predictor_columns` as a starting point for its calculations. Therefore, loading a pre-trained model whose config contains empty `predictor_columns` (more specifically, an empty array) leads to a mismatch between the shapes the underlying torch model expects and the shapes it receives.
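To make the failure mode concrete, here is a toy illustration (not the actual `get_input_dim` code path): a layer sized for the training-time feature count receives data shaped from a config whose `predictor_columns` is empty.

```python
import torch

trained_input_dim = 12                    # feature count when the model was first trained
stale_predictor_columns = []              # what the reloaded config contains
runtime_input_dim = len(stale_predictor_columns)  # 0

model = torch.nn.Linear(trained_input_dim, 8)
batch = torch.randn(4, runtime_input_dim)  # data windowed from the empty column list

try:
    model(batch)
except RuntimeError as err:
    # e.g. "mat1 and mat2 shapes cannot be multiplied (4x0 and 12x8)"
    print(err)
```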
**The recommended short-term fix:** We should ensure that the configurations of pre-trained models contain predictor columns.
**The long-term fix:** The team should discuss this, and we'll settle on an approach that works for everyone.