Feature Description
Add a helper API for the training cycle so that users don't need to code the training loop from scratch each time.
Add new methods to Job:
Job.request()
Does what the .start() method currently does internally (authentication, download of the model and plan).
trainingProcess = Job.train(trainingPlan, parameters)
Helper for the training loop.
trainingProcess - object that would contain the current epoch, batch, and modelParameters
trainingPlan - string
parameters - dict of values:
planInputs: list of PlanInputSpec
planOutputs: list of PlanOutputSpec
data: tensor
target: (optional) tensor
epochs: number - how many epochs to train
batchSize: number
stepsPerEpoch: (optional) number - max number of steps per epoch
events: list of handlers: 'start', 'end', 'epochStart', 'epochEnd', 'batchStart', 'batchEnd', 'error'
PlanInputSpec: object that describes plan input argument
PlanOutputSpec: object that describes plan output
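The proposed surface can be written down as types. This is only a sketch of the shapes listed above, in TypeScript: the `kind` and `name` fields on the specs, the `Tensor` stand-in, the Promise return type of `request()`, and modelling `events` as a map from event name to handler are all assumptions made for illustration, not part of the proposal.

```typescript
// Stand-in for a real tensor type (assumption for illustration).
type Tensor = number[][];

// Hypothetical spec shapes: the issue only says they "describe" a plan
// input or output, so `kind` and `name` are invented here.
interface PlanInputSpec { kind: string; name?: string }
interface PlanOutputSpec { kind: string; name?: string }

// Event names as listed in the proposal.
type TrainingEvent =
  | "start" | "end"
  | "epochStart" | "epochEnd"
  | "batchStart" | "batchEnd"
  | "error";

// Object returned by Job.train(): current epoch, batch, modelParameters.
interface TrainingProcess {
  epoch: number;
  batch: number;
  modelParameters: Tensor[];
}

interface TrainParameters {
  planInputs: PlanInputSpec[];
  planOutputs: PlanOutputSpec[];
  data: Tensor;
  target?: Tensor;
  epochs: number;
  batchSize: number;
  stepsPerEpoch?: number;
  // Assumption: handlers keyed by event name.
  events?: Partial<Record<TrainingEvent, (process: TrainingProcess) => void>>;
}

// The two proposed methods; the return type of request() is an assumption.
interface Job {
  request(): Promise<void>;
  train(trainingPlan: string, parameters: TrainParameters): TrainingProcess;
}

// Example parameters object a caller might pass to Job.train().
const seenEpochs: number[] = [];
const parameters: TrainParameters = {
  planInputs: [{ kind: "data" }, { kind: "target" }],
  planOutputs: [{ kind: "loss" }],
  data: [[0, 1], [1, 0]],
  target: [[1], [0]],
  epochs: 5,
  batchSize: 64,
  stepsPerEpoch: 10,
  events: {
    epochEnd: (tp) => { seenEpochs.push(tp.epoch); },
  },
};
```

With a real `job: Job`, the call would then read `job.train("training_plan", parameters)`, where the plan name is a placeholder.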
Pseudo code:
Training loop:
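A minimal sketch of how the loop inside `Job.train` could be organized, assuming the plan can be called as a plain function and tensors are replaced by number arrays. The `PlanFn` signature, the `train` free function, and the toy plan are hypothetical and stand in for plan execution and batch slicing on real tensors.

```typescript
// Stand-in types: a real implementation would use tensors, not number arrays.
type Batch = number[][];
type EventName =
  | "start" | "end" | "epochStart" | "epochEnd"
  | "batchStart" | "batchEnd" | "error";

interface TrainingProcess {
  epoch: number;
  batch: number;
  modelParameters: number[];
}

interface LoopParameters {
  data: Batch;
  epochs: number;
  batchSize: number;
  stepsPerEpoch?: number;
  events?: Partial<Record<EventName, (process: TrainingProcess) => void>>;
}

// Hypothetical plan signature: takes a batch and the current parameters,
// returns the updated parameters.
type PlanFn = (batch: Batch, params: number[]) => number[];

function train(plan: PlanFn, initialParams: number[], p: LoopParameters): TrainingProcess {
  const state: TrainingProcess = { epoch: 0, batch: 0, modelParameters: initialParams };
  const emit = (name: EventName): void => {
    const handler = p.events?.[name];
    if (handler) handler(state);
  };
  // stepsPerEpoch caps the number of batches processed in each epoch.
  const totalBatches = Math.ceil(p.data.length / p.batchSize);
  const steps = Math.min(totalBatches, p.stepsPerEpoch ?? totalBatches);
  try {
    emit("start");
    for (let epoch = 0; epoch < p.epochs; epoch++) {
      state.epoch = epoch;
      emit("epochStart");
      for (let batch = 0; batch < steps; batch++) {
        state.batch = batch;
        emit("batchStart");
        const chunk = p.data.slice(batch * p.batchSize, (batch + 1) * p.batchSize);
        state.modelParameters = plan(chunk, state.modelParameters);
        emit("batchEnd");
      }
      emit("epochEnd");
    }
    emit("end");
  } catch (err) {
    emit("error");
    throw err;
  }
  return state;
}

// Usage with a toy "plan" that nudges every parameter toward the batch mean.
const log: string[] = [];
const result = train(
  (batch, params) => {
    const sum = batch.reduce((s, row) => s + row.reduce((a, b) => a + b, 0), 0);
    const mean = sum / batch.length;
    return params.map((w) => w + 0.5 * (mean - w));
  },
  [0],
  {
    data: [[1], [2], [3], [4], [5]],
    epochs: 2,
    batchSize: 2,
    events: {
      batchEnd: (tp) => log.push(`batch ${tp.epoch}/${tp.batch}`),
      epochEnd: (tp) => log.push(`epoch ${tp.epoch} done`),
    },
  }
);
```

Mutating one `state` object and handing it to every handler keeps `trainingProcess` current for the caller; a real implementation might instead expose it as an event emitter.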
Resolving plan inputs/outputs from specs:
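One way this resolution could work is sketched below: each input spec is mapped to a value taken from the current step, and each plan output is routed back by its spec. The `kind`, `name`, and `index` fields, the `StepContext` and `StepResult` shapes, and both function names are assumptions for illustration only.

```typescript
// Hypothetical spec shapes; `kind`, `name`, and `index` are assumptions.
type InputKind = "data" | "target" | "batchSize" | "clientConfigParam" | "modelParam";
type OutputKind = "loss" | "metric" | "modelParam";

interface PlanInputSpec { kind: InputKind; name?: string; index?: number }
interface PlanOutputSpec { kind: OutputKind; name?: string; index?: number }

type Value = number | number[]; // stand-in for tensors

interface StepContext {
  data: Value;
  target?: Value;
  batchSize: number;
  clientConfig: Record<string, number>;
  modelParameters: Value[];
}

interface StepResult {
  loss?: Value;
  metrics: Record<string, Value>;
  modelParameters: Value[];
}

// Build the positional argument list for the plan from the input specs.
function resolveInputs(specs: PlanInputSpec[], ctx: StepContext): Value[] {
  return specs.map((spec): Value => {
    switch (spec.kind) {
      case "data":
        return ctx.data;
      case "target":
        if (ctx.target === undefined) throw new Error("plan expects a target");
        return ctx.target;
      case "batchSize":
        return ctx.batchSize;
      case "clientConfigParam": {
        const v = spec.name === undefined ? undefined : ctx.clientConfig[spec.name];
        if (v === undefined) throw new Error(`missing client config param: ${spec.name}`);
        return v;
      }
      case "modelParam": {
        const v = spec.index === undefined ? undefined : ctx.modelParameters[spec.index];
        if (v === undefined) throw new Error(`missing model param at index: ${spec.index}`);
        return v;
      }
    }
  });
}

// Route the plan's positional outputs to loss, metrics, and updated parameters.
function resolveOutputs(specs: PlanOutputSpec[], outputs: Value[], previous: Value[]): StepResult {
  if (outputs.length !== specs.length) {
    throw new Error("plan returned a different number of outputs than specified");
  }
  const result: StepResult = { metrics: {}, modelParameters: previous.slice() };
  specs.forEach((spec, i) => {
    const value = outputs[i];
    switch (spec.kind) {
      case "loss":
        result.loss = value;
        break;
      case "metric":
        result.metrics[spec.name ?? `metric${i}`] = value;
        break;
      case "modelParam":
        result.modelParameters[spec.index ?? i] = value;
        break;
    }
  });
  return result;
}

// Usage with toy values.
const args = resolveInputs(
  [
    { kind: "data" },
    { kind: "target" },
    { kind: "batchSize" },
    { kind: "clientConfigParam", name: "lr" },
    { kind: "modelParam", index: 0 },
  ],
  { data: [1, 2], target: [0, 1], batchSize: 2, clientConfig: { lr: 0.01 }, modelParameters: [[0.5, 0.5]] }
);
const step = resolveOutputs(
  [{ kind: "loss" }, { kind: "metric", name: "accuracy" }, { kind: "modelParam", index: 0 }],
  [0.3, 0.9, [0.4, 0.6]],
  [[0.5, 0.5]]
);
```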
Example for input/output specs for MNIST training plan:
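As an illustration, the specs for an MNIST plan might look like the lists below. This assumes a two-layer network whose plan takes a data batch, targets, batch size, a learning rate `lr` from the client config, and four model parameters (`W1`, `b1`, `W2`, `b2`), and returns loss, accuracy, and the updated parameters. The spec shape and that argument order are assumptions, not taken from an actual plan.

```typescript
// Hypothetical spec shape; `kind`, `name`, and `index` are assumptions.
interface Spec { kind: string; name?: string; index?: number }

// Inputs in the order the (assumed) MNIST training plan expects them.
const mnistInputs: Spec[] = [
  { kind: "data" },                          // batch of flattened images
  { kind: "target" },                        // labels for the batch
  { kind: "batchSize" },
  { kind: "clientConfigParam", name: "lr" }, // learning rate from client config
  { kind: "modelParam", name: "W1", index: 0 },
  { kind: "modelParam", name: "b1", index: 1 },
  { kind: "modelParam", name: "W2", index: 2 },
  { kind: "modelParam", name: "b2", index: 3 },
];

// Outputs in the order the (assumed) plan returns them.
const mnistOutputs: Spec[] = [
  { kind: "loss" },
  { kind: "metric", name: "accuracy" },
  { kind: "modelParam", name: "W1", index: 0 },
  { kind: "modelParam", name: "b1", index: 1 },
  { kind: "modelParam", name: "W2", index: 2 },
  { kind: "modelParam", name: "b2", index: 3 },
];

// Sanity check: every model parameter passed in should come back updated.
const paramsIn = mnistInputs.filter((s) => s.kind === "modelParam").length;
const paramsOut = mnistOutputs.filter((s) => s.kind === "modelParam").length;
```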
What alternatives have you considered?
The API was discussed within the FL team.
Additional Context
n/a