deepjavalibrary / djl

An Engine-Agnostic Deep Learning Framework in Java
https://djl.ai
Apache License 2.0

MXNet NLP GloVe model causes error "Native resource has been released already" #3286

Closed: tadayosi closed this issue 4 months ago

tadayosi commented 5 months ago

Description

When I try to use the MXNet zoo model ai.djl.mxnet/glove/0.0.2/glove, it throws the following error and I cannot use it for NLP word embedding:

Exception in thread "main" java.lang.IllegalStateException: Native resource has been released already.
    at ai.djl.util.NativeResource.getHandle(NativeResource.java:64)
    at ai.djl.mxnet.engine.MxNDArray.getShape(MxNDArray.java:148)
    at ai.djl.ndarray.NDList.toString(NDList.java:538)
    at java.base/java.lang.String.valueOf(String.java:4465)
    at java.base/java.io.PrintStream.println(PrintStream.java:1187)
    at word_embedding.main(word_embedding.java:31)

Expected Behavior

It should not throw the error.

Error Message

See the description.

How to Reproduce?

I use this simple code:

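import ai.djl.Application;
import ai.djl.ndarray.NDList;
import ai.djl.repository.zoo.Criteria;
import ai.djl.training.util.ProgressBar;

public class WordEmbedding {
    public static void main(String... args) throws Exception {
        // Select the GloVe word-embedding model from the model zoo.
        var criteria = Criteria.builder()
                .optApplication(Application.NLP.WORD_EMBEDDING)
                .setTypes(String.class, NDList.class)
                .optArtifactId("glove")
                .optProgress(new ProgressBar())
                .build();
        var model = criteria.loadModel();

        var input = "test";
        try (var predictor = model.newPredictor()) {
            var ndlist = predictor.predict(input);
            ndlist.detach();
            // The IllegalStateException is thrown here, when the NDList is printed.
            System.out.println(ndlist);
        }
    }
}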

Steps to reproduce

Run the above sample code.

What have you tried to solve it?

I searched the Issues and found this: https://github.com/deepjavalibrary/djl/issues/1064#issuecomment-871054042.

According to @frankfliu, a Translator should generally not return an NDList, and if it does, this error is expected. In this case, however, it is the ai.djl.mxnet/glove/0.0.2/glove model's own translator that returns an NDList.

NLP.WORD_EMBEDDING :: ai.djl.mxnet/glove/0.0.2/glove {"dimensions":"50"}
  - Args: {dimensions=50, unknownToken=<unk>, translatorFactory=ai.djl.mxnet.zoo.nlp.embedding.GloveWordEmbeddingTranslatorFactory, blockFactory=ai.djl.mxnet.zoo.nlp.embedding.GloveWordEmbeddingBlockFactory}
  - Factory: ai.djl.mxnet.zoo.nlp.embedding.GloveWordEmbeddingTranslatorFactory
  - In/Out: java.lang.String => ai.djl.ndarray.NDList

That gives me the impression that this is not something I can fix on my end, but rather a bug in the model or its translator.

Thanks.

tadayosi commented 5 months ago

By the way, the Apache MXNet project has already been moved to the Attic. Does that mean this model (and the other MXNet models) is no longer recommended for use with DJL either? The MXNet engine appears to be somewhat outdated already, and it has no support for AArch64.

frankfliu commented 5 months ago

@tadayosi

I created a PR to address this issue. However, the caller must manually close the returned NDList; otherwise it will cause a memory leak.
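
To make the ownership trade-off concrete, here is a minimal, self-contained sketch of the pattern under discussion. It does not use DJL at all: `Handle`, `Scope`, and `predict` are hypothetical stand-ins for a native-backed array, a per-prediction manager, and a predictor call. It assumes that the fix lets the returned arrays outlive the prediction call, which is what the requirement to close them suggests; the actual PR may work differently.

```java
import java.util.ArrayList;
import java.util.List;

public class OwnershipSketch {

    /** Hypothetical stand-in for a native-backed array (not DJL's NDArray). */
    static final class Handle implements AutoCloseable {
        private final String name;
        private Scope owner;
        private boolean released;

        Handle(String name, Scope owner) {
            this.name = name;
            this.owner = owner;
            owner.attached.add(this);
        }

        /** Removes this handle from its scope; the caller becomes responsible for closing it. */
        void detach() {
            if (owner != null) {
                owner.attached.remove(this);
                owner = null;
            }
        }

        /** Any access after release fails, mirroring the error message in the report. */
        String describe() {
            if (released) {
                throw new IllegalStateException("Native resource has been released already.");
            }
            return name;
        }

        boolean isReleased() {
            return released;
        }

        @Override
        public void close() {
            detach();
            released = true;
        }
    }

    /** Hypothetical stand-in for a per-prediction manager: closing it releases what is still attached. */
    static final class Scope implements AutoCloseable {
        final List<Handle> attached = new ArrayList<>();

        @Override
        public void close() {
            // Iterate over a copy because Handle.close() removes itself from the list.
            for (Handle h : new ArrayList<>(attached)) {
                h.close();
            }
        }
    }

    /** Hypothetical predict call: the scope is closed before the result reaches the caller. */
    static Handle predict(boolean detachInside) {
        try (Scope scope = new Scope()) {
            Handle out = new Handle("embedding", scope);
            if (detachInside) {
                out.detach(); // detached before the scope closes, so it survives
            }
            return out;
        }
    }

    public static void main(String[] args) {
        // Without detaching inside the call, the result is already released on return.
        Handle stale = predict(false);
        try {
            stale.describe();
        } catch (IllegalStateException e) {
            System.out.println("stale: " + e.getMessage());
        }

        // With detaching inside the call, the caller owns the result and must close it.
        try (Handle owned = predict(true)) {
            System.out.println("owned: " + owned.describe());
        }
    }
}
```

In this sketch, detaching only helps if it happens before the scope closes, which is consistent with the report, where calling `detach()` after `predict` returned did not prevent the exception at `println`. Once the result is detached, nothing else releases it, so the try-with-resources block in `main` is what prevents the leak.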

tadayosi commented 5 months ago

@frankfliu Thanks, great!