dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License
9.03k stars · 1.88k forks

Potential memory leak in loading ml models and creating prediction engine. #5897

Open LittleLittleCloud opened 3 years ago

LittleLittleCloud commented 3 years ago

This is an issue raised by a Model Builder customer, who encountered an OOM exception after creating an image classification model and calling the Predict method hundreds of times. The generated Predict method takes a ModelInput, creates a new MLContext and prediction engine, and computes the ModelOutput. So by calling Predict a hundred times, that user was creating a hundred MLContexts and prediction engines, and eventually the memory on his system ran out.
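A minimal sketch of what the generated consumption code presumably looked like (names like `MLModel.zip`, `ModelInput`, and `ModelOutput` are illustrative, not the exact Model Builder output); the point is that every call allocates a fresh `MLContext`, reloads the model, and builds a new `PredictionEngine`:

```csharp
using Microsoft.ML;

public static class ConsumeModel
{
    // Anti-pattern: everything below runs on EVERY prediction call.
    public static ModelOutput Predict(ModelInput input)
    {
        // New context per call.
        var mlContext = new MLContext();

        // Model deserialized from disk per call.
        ITransformer mlModel = mlContext.Model.Load("MLModel.zip", out _);

        // New prediction engine (and, for TF-backed models, native
        // resources) allocated per call and never released.
        var predEngine = mlContext.Model
            .CreatePredictionEngine<ModelInput, ModelOutput>(mlModel);

        return predEngine.Predict(input);
    }
}
```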

There are actually two questions here

Link to original issue https://github.com/dotnet/machinelearning-modelbuilder/issues/1666

michaelgsharp commented 3 years ago

Is that behavior what you are generating, or part of ML.NET itself? It seems like it's part of what Model Builder is generating.

LittleLittleCloud commented 3 years ago

It's the ML.NET code generated by Model Builder, so I would say the behaviour is still ML.NET's behaviour.

michaelgsharp commented 3 years ago

Sorry @LittleLittleCloud, I've been OOF. Back now. You said the behavior is to create a new context and prediction engine each time, correct? That's not ML.NET, that's Model Builder generating that code. The question is: is that what's causing the OOM, or is there something in ML.NET that is doing it? Do you know what trainer he was using, or what his pipeline looked like?

michaelgsharp commented 3 years ago

Ah, I just saw your second point. Only happens with TF.NET.

If you guys aren't calling Dispose on the pipeline, or using a using statement, then the native session won't be cleaned up correctly. How hard would that be to change in your code generation? Let's sync on this sometime. @JakeRadMSFT as well.
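A sketch of the cleanup being suggested, assuming a recent ML.NET version in which `PredictionEngine` implements `IDisposable` (older versions may only expose disposal on the underlying transformer, hence the defensive cast):

```csharp
using System;
using Microsoft.ML;

var mlContext = new MLContext();
ITransformer mlModel = mlContext.Model.Load("MLModel.zip", out _);

// `using` ensures the engine (and any native buffers it owns) is
// released when the scope ends.
using (var predEngine = mlContext.Model
    .CreatePredictionEngine<ModelInput, ModelOutput>(mlModel))
{
    var output = predEngine.Predict(input);
}

// TensorFlow-backed transformers hold a native session; dispose the
// model itself when you are done with it so that session is freed.
(mlModel as IDisposable)?.Dispose();
```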

And no, creating new contexts/prediction engines per call is NOT the approach we recommend. It's a huge waste of time/resources. Much better to avoid that if possible.
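The recommended shape, sketched under the same illustrative names as above: create the `MLContext`, load the model, and build the `PredictionEngine` once, then reuse them for every call. Note that `PredictionEngine` is not thread-safe; for concurrent callers, `PredictionEnginePool` from the Microsoft.Extensions.ML package is the usual alternative.

```csharp
using System;
using Microsoft.ML;

public static class ConsumeModel
{
    // Lazily create a single engine, shared by all subsequent calls.
    private static readonly Lazy<PredictionEngine<ModelInput, ModelOutput>> Engine =
        new Lazy<PredictionEngine<ModelInput, ModelOutput>>(CreateEngine);

    private static PredictionEngine<ModelInput, ModelOutput> CreateEngine()
    {
        // One context, one model load, one engine for the process lifetime.
        var mlContext = new MLContext();
        ITransformer model = mlContext.Model.Load("MLModel.zip", out _);
        return mlContext.Model
            .CreatePredictionEngine<ModelInput, ModelOutput>(model);
    }

    public static ModelOutput Predict(ModelInput input)
        => Engine.Value.Predict(input);
}
```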