Open nv-alaiacano opened 2 years ago
@nv-alaiacano Seems like this would solve a number of issues for us on the systems side. Is this beneficial for session-based or should we postpone until after that work is done? Trying to figure out where to slot this in.
Since this work isn't a customer-facing Merlin-level feature that makes sense for product to prioritize, we'll likely go ahead and do this work whenever it's necessary. I expect that at least the schemas part will be beneficial for sequence models, so we're having a chat about that part this morning.
Problem:
In Systems, we need to know information about the various ops that go into an Ensemble, primarily the input/output schemas for the data and models that make up the ensemble.
We are able to infer some of this from, e.g., a TensorFlow model, but not from more flexible frameworks like PyTorch or XGBoost.
Saving an NVTabular workflow produces a small amount of info in `metadata.json`, and I propose we expand that concept to record any expected metadata about models, NVT workflows, and other components that we expect to load in a Systems ensemble.
Goal:
Constraints:
`Schema` Python class
Starting Point:
Core repo:
Models repo:
`Model.save` method in Models to include a metadata file including the required fields.
Systems repo:
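To make the proposal a bit more concrete, here is a minimal sketch of the round trip it implies: a save step writes a metadata file next to the model artifact, and a loader on the Systems side reads the input/output schemas back without inspecting the framework object. Everything in it is an assumption for illustration: `ColumnSpec`, `write_metadata`, `read_metadata`, and the JSON layout are hypothetical, and `ColumnSpec` is only a stand-in for the real `Schema` class, not its API. Only the `metadata.json` file name comes from the existing NVTabular behavior.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path

# File name NVTabular already uses; the layout written below is assumed.
METADATA_FILE = "metadata.json"


@dataclass
class ColumnSpec:
    """Hypothetical stand-in for one column of a schema."""
    name: str
    dtype: str
    is_list: bool = False


def write_metadata(model_dir, framework, inputs, outputs):
    """Write a sidecar metadata file into the directory holding a saved model."""
    path = Path(model_dir) / METADATA_FILE
    payload = {
        "framework": framework,
        "input_schema": [asdict(col) for col in inputs],
        "output_schema": [asdict(col) for col in outputs],
    }
    path.write_text(json.dumps(payload, indent=2))
    return path


def read_metadata(model_dir):
    """Read the sidecar file back into (framework, inputs, outputs)."""
    raw = json.loads((Path(model_dir) / METADATA_FILE).read_text())
    inputs = [ColumnSpec(**col) for col in raw["input_schema"]]
    outputs = [ColumnSpec(**col) for col in raw["output_schema"]]
    return raw["framework"], inputs, outputs
```

With a file like this, an ensemble builder could call `read_metadata(model_dir)` for a PyTorch or XGBoost artifact and get schemas in the same way it would for a model whose signature can be inferred directly.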