Open damccorm opened 2 years ago
Should we make returning just the inference the default and handle this with a composable ModelHandler that produces PredictionResult instances, similar to KeyedModelHandler?
Unfortunately that couldn't be done in a backwards compatible way.
CC: @robertwb
Subtask of https://github.com/apache/beam/issues/22117
> Should we make returning just the inference the default and handle this with a composable ModelHandler that produces PredictionResult instances, similar to KeyedModelHandler? Unfortunately that couldn't be done in a backwards compatible way.
My instinct is that the large majority of users are going to want both passed through and we should prioritize that need (and not making a breaking change) over making things a little cleaner for the just inference case.
The other advantage of always returning a PredictionResult is that it gives users more flexibility if, for example, they initially only need the inferences but later want to do something with the input itself. If we just return the inferences, then they need to insert some intermediate mapping step to make their existing logic work; if we always return a PredictionResult, then they can flip the flag, grab the initial input wherever they need it, and not touch anything else.
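The flexibility argument above can be illustrated with a minimal sketch. This uses a plain NamedTuple as a stand-in for Beam's actual PredictionResult (which bundles the input example with its inference); the data and names are illustrative:

```python
from typing import Any, NamedTuple

# Stand-in for apache_beam.ml.inference.base.PredictionResult.
class PredictionResult(NamedTuple):
    example: Any
    inference: Any

results = [
    PredictionResult(example=[1.0, 2.0], inference=0.9),
    PredictionResult(example=[3.0, 4.0], inference=0.1),
]

# If the pipeline only ever returned bare inferences, downstream code that
# later needs the input would have to be rewired. Because PredictionResult
# keeps the example around, both views are available without restructuring:
inferences = [r.inference for r in results]          # just the predictions
pairs = [(r.example, r.inference) for r in results]  # input recoverable later
```

Downstream logic that starts out consuming only `inferences` can later switch to `pairs` without any change to the inference step itself.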
There's also a larger philosophical question of when we should introduce a new model handler vs. parameterizing a new behavior option. I'd argue that because in this case omitting the input is essentially a subset of the current output (with no changes to the input), and we're not changing the graph structure, we're better off treating this as an option (a parameter to the RunInference call) than introducing a new handler.
What would be the plan of action on this one? Looking at the comments, it seems PredictionResult will remain the same. Are we adding an optional flag to exclude the input from the PredictionResult?
I guess if someone wants to exclude the outputs, they can do it using beam.Map() or a PostProcessor.

Edit: I guess if someone wants to exclude the inputs, they can do it using beam.Map() or a PostProcessor.
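The beam.Map() approach mentioned here can be sketched as follows. A plain NamedTuple stands in for Beam's PredictionResult, and plain `map` stands in for the pipeline step, so the snippet runs without a Beam installation:

```python
from typing import Any, NamedTuple

class PredictionResult(NamedTuple):  # stand-in for Beam's PredictionResult
    example: Any
    inference: Any

def drop_example(result: PredictionResult) -> Any:
    """The function one would pass to beam.Map to keep only the inference."""
    return result.inference

# In a real pipeline this step would read:
#   predictions | beam.Map(drop_example)
results = [
    PredictionResult(example="input-a", inference=1),
    PredictionResult(example="input-b", inference=2),
]
bare = list(map(drop_example, results))
```

This is the "intermediate mapping step" referred to earlier in the thread: cheap to write, but something every user wanting inference-only output would have to add themselves.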
> Are we adding an optional flag to exclude the input from the PredictionResult?

Yeah, I think that's the best option here.

> I guess if someone wants to exclude the outputs, they can do it using beam.Map() or a PostProcessor.

I don't quite follow this - why would someone want to exclude the outputs? Do you just mean unnesting the outputs from the PredictionResult?
I am sorry. It's inputs, not outputs.
.take-issue
I ran a few experiments on the Dataflow Runner using the PyTorch model handler and didn't notice a significant change in resource usage. @damccorm what do you think?
First image: default; second image: PredictionResult with input = None.
I think the difference largely will depend on workload. For our examples it is probably not significant because we're pretty much just writing the inference results immediately, but I can imagine if (a) your input size is significantly larger or (b) you're doing more than simple maps post-inference (especially if you're doing something that breaks pipeline fusion and causes work to be distributed to new workers), then the costs would be higher.
I'm generally still in favor of doing this as a small convenience for users since it's cheap.
drop_example could be passed via the ModelHandler or via beam.RunInference. As of now, I added it to the RunInference PTransform, since that is what outputs the PredictionResult. But in the process method of the transform, it calls self._model_handler.run_inference(), which returns the PredictionResult, so it could be passed via the ModelHandler as well.
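The two placements being discussed can be sketched roughly as below. These are plain Python stand-ins, not Beam's actual classes; the names and structure are illustrative only. The sketch follows the transform-level placement described in the comment above, with the input set to None when the flag is on (matching the "input = None" experiment earlier in the thread):

```python
from typing import Any, NamedTuple

class PredictionResult(NamedTuple):  # stand-in for Beam's PredictionResult
    example: Any
    inference: Any

class ModelHandler:
    """Handler-level placement would mean run_inference itself honors
    the flag. Here it always builds the full result."""
    def run_inference(self, batch, model):
        return [PredictionResult(x, model(x)) for x in batch]

class RunInference:
    """Transform-level placement: the flag is applied after the handler
    returns PredictionResults."""
    def __init__(self, model_handler, drop_example=False):
        self._model_handler = model_handler
        self._drop_example = drop_example

    def process(self, batch, model):
        results = self._model_handler.run_inference(batch, model)
        if self._drop_example:
            # Keep the PredictionResult shape, but drop the input payload.
            return [r._replace(example=None) for r in results]
        return results

transform = RunInference(ModelHandler(), drop_example=True)
out = transform.process([1, 2], lambda x: x + 1)
```

Either placement produces the same output; the difference is whether each handler must implement the flag or whether the transform applies it uniformly after the handler returns.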
@AnandInguva I'm reopening this one since we're trying https://github.com/apache/beam/pull/23356 instead.
RunInference currently returns both the example/input and prediction.
Users may want the ability to only return the inference to minimize potential memory/serialization issues later in the pipeline. They can do this with a flag, and the return value may look like
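The description trails off before showing the return value. As a hedged sketch of what the two shapes could look like (a stand-in NamedTuple and a hypothetical flag name, not the actual API):

```python
from typing import Any, NamedTuple

class PredictionResult(NamedTuple):  # stand-in for Beam's PredictionResult
    example: Any
    inference: Any

def run_inference(batch, model, return_full_result=True):
    """Hypothetical flag-controlled output: PredictionResults by default,
    bare inferences when the flag is flipped."""
    inferences = [model(x) for x in batch]
    if return_full_result:
        return [PredictionResult(x, y) for x, y in zip(batch, inferences)]
    return inferences

double = lambda x: 2 * x
full = run_inference([1, 2], double)                            # PredictionResults
bare = run_inference([1, 2], double, return_full_result=False)  # inferences only
```

With the flag off, only the inferences flow downstream, avoiding the cost of carrying (and serializing) the input examples through the rest of the pipeline.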
Imported from Jira BEAM-14362. Original Jira may contain additional context. Reported by: yeandy. Subtask of issue #21435