Closed Llannelongue closed 4 years ago
Hi! Thanks for your contribution, great first issue!
Sounds really useful. My random thoughts:
I needed something similar recently. It was fairly easy to implement for my purposes, but due to issue #1243 I had to hack the on_test_step_end method in my model. The way I wanted to implement it was to create a new Trainer with a Callback that gathers prediction results in its state. Then one just needs to call the test method and collect the results.
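The gathering pattern described above can be sketched as follows; this is a minimal stand-in that uses a plain PyTorch loop in place of Trainer.test, and the names (PredictionCollector, on_test_batch_end) are illustrative, not a Lightning API:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

class PredictionCollector:
    """Gathers per-batch model outputs in its own state,
    mimicking a Lightning Callback hooked into the test loop."""
    def __init__(self):
        self.predictions = []

    def on_test_batch_end(self, outputs):
        # detach and move to CPU so collected tensors don't hold the graph or GPU memory
        self.predictions.append(outputs.detach().cpu())

    def result(self):
        return torch.cat(self.predictions, dim=0)

model = torch.nn.Linear(4, 2)
loader = DataLoader(TensorDataset(torch.randn(10, 4)), batch_size=4)

collector = PredictionCollector()
model.eval()
with torch.no_grad():
    for (batch,) in loader:  # stand-in for the trainer's test loop
        collector.on_test_batch_end(model(batch))

preds = collector.result()
print(preds.shape)  # torch.Size([10, 2])
```

With a real Trainer, the same state-gathering logic would live in a Callback passed via `Trainer(callbacks=[...])`.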
My $0.02 on:

> in certain contexts, "predict" may not be a good name

From my limited point of view, I can use the word "predict" for all my use-cases, despite the fact that it may sometimes be slightly inaccurate. Use-cases:
These are all the use-cases I can think of when having a single input and wanting the corresponding output from the model.
"estimate" vs predict:
I consider to name the function "estimate" instead of "predict". It make sense to write m.estimate(x) instead of predict(x) e.g. for reinforcement value function to estimate the random parameter of the RL model.
However, I concluded I can always say I predict random variable distribution Y if I ignore how it is further use in more complicated model M. If I talk with respect to model M - I will say I estimated its parameter Y That's what I understood from https://stats.stackexchange.com/a/17789/79340
"infer" vs predict: I just feel that infer is much more vague word than predict. I do not like it.
Another benefit of using "predict" is consistency with machine learning frameworks like sklearn.
One small concern with "estimate" (for stats users): it could suggest that we are estimating the parameters, i.e. training the model again.
Just checking in here: what is the status / best practice for using the trainer for prediction? It would be nice not to have to write code outside of the trainer loop to handle multi-GPU, since all that functionality is there. What is the normal workflow? Use the trainer.test hook?
> Just checking in here, what is the status/best practices for using the trainer for prediction?
PL 1.2.0 came with the predict method, which handles multi-GPU:
To perform inference at scale, you can use trainer.predict with the LightningModule predict function. By default, LightningModule predict calls forward, but it can be overridden to add any processing logic.
class LitMNISTDreamer(LightningModule):

    def forward(self, z):
        imgs = self.decoder(z)
        return imgs

    def predict(self, batch, batch_idx: int, dataloader_idx: int = None):
        return self(batch)

model = LitMNISTDreamer()
trainer.predict(model, datamodule)
@adriantre How can I gather all predictions when using BasePredictionWriter in a multi-GPU setting? Thanks!
What is best practice for running predict on a larger dataset that doesn't fit into memory? Is it possible to get the outputs as an iterator instead of a list?
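One workaround outside the Trainer (a plain-PyTorch sketch, not a Lightning API) is to wrap prediction in a generator, so outputs are yielded batch by batch and consumed immediately instead of being accumulated in a list:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def predict_iter(model, dataloader, device="cpu"):
    """Yield predictions one batch at a time so the full
    output never has to fit in memory."""
    model.eval().to(device)
    with torch.no_grad():
        for (batch,) in dataloader:
            yield model(batch.to(device)).cpu()

model = torch.nn.Linear(8, 3)
loader = DataLoader(TensorDataset(torch.randn(100, 8)), batch_size=32)

n_rows = 0
for preds in predict_iter(model, loader):
    n_rows += preds.shape[0]  # consume each batch here, e.g. write it to disk
print(n_rows)  # 100
```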
🚀 Feature
A method to make predictions on new data.
Motivation
In a machine learning project, once the model has been trained and evaluated (using validation_step and test_step), it would be useful to have a method to make predictions on new, unlabelled data.

Making predictions for just one observation is straightforward by calling model(new_data). However, to predict on a large dataset, we need to create a dataloader and loop through it while concatenating the outputs. It would be great to integrate that into PyTorch Lightning to take advantage of the ease of implementation, especially regarding multi-GPU.

Pitch
Load a pre-trained model and use it for prediction by calling something like:
Alternatives
The standard PyTorch way to do it, with the usual issues with managing devices and parallel processing:
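For reference, a minimal version of that manual loop might look like the sketch below (device handling shown for a single device; multi-GPU would additionally need DataParallel or DistributedDataParallel, which is exactly the boilerplate the feature request wants to avoid):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(4, 2).to(device)
loader = DataLoader(TensorDataset(torch.randn(10, 4)), batch_size=4)

model.eval()
outputs = []
with torch.no_grad():
    for (batch,) in loader:
        batch = batch.to(device)            # manual device management
        outputs.append(model(batch).cpu())  # move back to CPU to avoid GPU OOM
predictions = torch.cat(outputs)
print(predictions.shape)  # torch.Size([10, 2])
```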
Additional context