deepjavalibrary / djl

An Engine-Agnostic Deep Learning Framework in Java
https://djl.ai
Apache License 2.0
4.05k stars · 648 forks

Memory leak when running a YOLOv5 model (memory is not reclaimed when running asynchronously or in a Spring environment, but the main thread is fine) #2821

Open 90600 opened 10 months ago

90600 commented 10 months ago

Description

I integrated DJL YOLOv5 inference into a Spring Boot project. Under load testing, memory is never reclaimed and keeps growing as the load test continues. Note: this is not the same problem as https://github.com/deepjavalibrary/djl/issues/2800 — running on the main thread is indeed fine, but asynchronous execution leaks memory. In the Spring project the memory is not reclaimed. The project does not use OpenCV.

Expected Behavior

With 100 prediction threads, memory usage grows and is never reclaimed. [screenshot]

How to Reproduce?

import ai.djl.Device;
import ai.djl.MalformedModelException;
import ai.djl.ModelException;
import ai.djl.inference.Predictor;
import ai.djl.modality.cv.Image;
import ai.djl.modality.cv.ImageFactory;
import ai.djl.modality.cv.output.DetectedObjects;
import ai.djl.modality.cv.transform.Resize;
import ai.djl.modality.cv.translator.YoloV5Translator;
import ai.djl.repository.zoo.Criteria;
import ai.djl.repository.zoo.ModelNotFoundException;
import ai.djl.repository.zoo.ZooModel;
import ai.djl.translate.TranslateException;
import com.fpi.bmp.algorithm.management.translator.YoloV5RelativeTranslator;
import com.fpi.bmp.algorithm.management.util.ModelUrlUtil;

import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
public class Main {
    private Predictor<Image, DetectedObjects> predictor;
    private ZooModel<Image, DetectedObjects> model;
    private Criteria<Image, DetectedObjects> criteria;

    public static void main(String[] args) throws ModelException, IOException, TranslateException {
        Main p = new Main();
        ExecutorService executorService = Executors.newFixedThreadPool(100);
        for (int i = 0; i < 100; i++) {
            executorService.execute(() -> {
                try {
                    p.detect("http://hangzhou-test.fpi-inc.site/file-base-server/api/v1/sys/download/10e7d36d8c03499ba904e53df39e1eb0");
                } catch (IOException | TranslateException e) {
                    e.printStackTrace();
                }
                System.out.println("iiii===" + Thread.currentThread().getName());
            });
        }
        executorService.shutdown(); // let the JVM exit once all tasks complete
        System.out.println("main=" + Thread.currentThread().getName());
    }

    public Main() throws ModelNotFoundException, MalformedModelException, IOException {
        YoloV5Translator translator = YoloV5Translator.builder()
                .addTransform(new Resize(640, 640))
                .optRescaleSize(640, 640)
                .optApplyRatio(true)
                .optThreshold(0.4f)
//                .optSynset(Arrays.asList("smoke"))
                .build();

        YoloV5RelativeTranslator myTranslator = new YoloV5RelativeTranslator(translator, 640, 640);

        criteria = Criteria.builder()
                .setTypes(Image.class, DetectedObjects.class)
                .optDevice(Device.cpu())
                .optModelUrls(ModelUrlUtil.getRealUrl("/model/smoke/smoke.zip"))
                .optTranslator(myTranslator)
                .optEngine("OnnxRuntime")
                .build();
        model = criteria.loadModel();
        predictor = model.newPredictor();
    }

    public void detect(String imgPath) throws IOException, TranslateException {
        Image img = ImageFactory.getInstance().fromUrl(imgPath);
        long startTime = System.currentTimeMillis();
        DetectedObjects predict = predictor.predict(img);
        long endTime = System.currentTimeMillis();
        System.out.println(Thread.currentThread().getName() + " inference time=" + (endTime - startTime));
        System.out.println(predict);

    }
}

Environment Info

<dependency>
    <groupId>ai.djl.mxnet</groupId>
    <artifactId>mxnet-model-zoo</artifactId>
</dependency>
<dependency>
    <groupId>ai.djl.mxnet</groupId>
    <artifactId>mxnet-engine</artifactId>
    <scope>runtime</scope>
</dependency>
<dependency>
    <groupId>ai.djl.onnxruntime</groupId>
    <artifactId>onnxruntime-engine</artifactId>
    <scope>runtime</scope>
</dependency>

I hope you can spare some time to look into this; it is very important to me.

Today I switched to the PyTorch engine. Memory is reclaimed, but about 3 GB remains in use after the load test finishes, which makes me suspect this is a DJL problem.

frankfliu commented 10 months ago

We have many customers using the PyTorch engine, and we have not observed a memory leak. PyTorch uses a memory pool to manage native memory and does not release it back to the system. As long as usage stabilizes at the peak, it should be fine.

90600 commented 10 months ago

> We have many customers using the PyTorch engine, and we have not observed a memory leak. PyTorch uses a memory pool to manage native memory and does not release it back to the system. As long as usage stabilizes at the peak, it should be fine.

Have you tried running it with the OnnxRuntime engine? It does not reclaim memory.

90600 commented 10 months ago


Run my code above with the OnnxRuntime engine and you will see the problem.

90600 commented 10 months ago


The PyTorch engine does keep memory stable within a range, but with the OnnxRuntime engine it keeps growing, with the same code.

kkangert commented 5 months ago

I have the same problem.

frankfliu commented 5 months ago

@kkangert

We run the onnxruntime benchmark nightly, and we have never observed a memory leak. I created a project similar to the code above, and it does not have a memory leak: https://github.com/frankfliu/djl/tree/yolo5/yolo

Here are a few common mistakes that cause memory issues:

  1. Using too many threads: the engine takes a large amount of memory and causes an OOM (this is not a memory leak). My example uses 100 threads; on a Mac it uses around 6 GB of RAM and stabilizes at the peak.
  2. Creating more than one Model or Predictor object without closing them: this leaks native memory and eventually causes an OOM. Make sure you load the model only once and all threads use the same Predictor.
  3. Using a ThreadLocal for each Predictor but not closing the Predictor when the thread terminates: this causes a memory leak.
  4. Using OpenCV to read images: there is a known memory leak if you use your own build of OpenCV (DJL's OpenCV extension works fine).
  5. A Translator implementation that leaks memory. My example uses our built-in Translator.
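The lifecycle rules above boil down to one pattern: load the model exactly once, make every predictor's `close()` unavoidable, and shut the pool down. The sketch below illustrates that pattern with plain-Java stand-in `Model`/`Predictor` classes (hypothetical placeholders, not the real `ai.djl` types, so it runs without an engine on the classpath); the counter simulates native memory held until `close()`.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PredictorLifecycle {
    // Stand-in for DJL's ZooModel: loaded once, closed once.
    static class Model implements AutoCloseable {
        final AtomicInteger openPredictors = new AtomicInteger();
        Predictor newPredictor() {
            openPredictors.incrementAndGet();
            return new Predictor(this);
        }
        @Override public void close() {}
    }

    // Stand-in for DJL's Predictor: "holds native memory" until close() runs.
    static class Predictor implements AutoCloseable {
        final Model model;
        Predictor(Model m) { model = m; }
        String predict(String input) { return "detected:" + input; }
        @Override public void close() { model.openPredictors.decrementAndGet(); }
    }

    public static void main(String[] args) throws InterruptedException {
        try (Model model = new Model()) {                 // load the model exactly once
            ExecutorService pool = Executors.newFixedThreadPool(4);
            for (int i = 0; i < 100; i++) {
                final int id = i;
                pool.execute(() -> {
                    // try-with-resources guarantees close() even if predict() throws,
                    // so no predictor (and no native memory) outlives its task
                    try (Predictor p = model.newPredictor()) {
                        p.predict("img-" + id);
                    }
                });
            }
            pool.shutdown();                              // stop accepting new tasks
            pool.awaitTermination(1, TimeUnit.MINUTES);   // wait for in-flight work
            System.out.println("open predictors: " + model.openPredictors.get());
        }
    }
}
```

After the pool drains, the open-predictor count is back to zero; the repro code at the top of this issue, by contrast, never closes its single `Predictor` or `ZooModel`, which is exactly mistake 2 in the list above.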