deepjavalibrary / djl

An Engine-Agnostic Deep Learning Framework in Java
https://djl.ai

Got some problems when loading a model trained by gluonts #1127

Closed · cui-jia-hua closed this issue 3 years ago

cui-jia-hua commented 3 years ago

I want to train a time series forecasting model with GluonTS and run inference in DJL, but I am getting errors like:

ai.djl.translate.TranslateException: ai.djl.engine.EngineException: MXNet engine call failed: MXNetError: Error in operator deeparpredictionnetwork0_lstm0_t0_plus0: [13:51:36] ../src/ndarray/../operator/tensor/../elemwise_op_common.h:135: Check failed: assign(&dattr, vec.at(i)): Incompatible attr in node deeparpredictionnetwork0_lstm0_t0_plus0 at 1-th input: expected [32,160], got [0,160]

    at ai.djl.inference.Predictor.batchPredict(Predictor.java:170)
    at ai.djl.inference.Predictor.predict(Predictor.java:118)
    at ai.djl.examples.inference.my_test.main(my_test.java:59)
Caused by: ai.djl.engine.EngineException: MXNet engine call failed: MXNetError: Error in operator deeparpredictionnetwork0_lstm0_t0_plus0: [13:51:36] ../src/ndarray/../operator/tensor/../elemwise_op_common.h:135: Check failed: assign(&dattr, vec.at(i)): Incompatible attr in node deeparpredictionnetwork0_lstm0_t0_plus0 at 1-th input: expected [32,160], got [0,160]

    at ai.djl.mxnet.jna.JnaUtils.checkCall(JnaUtils.java:1930)
    at ai.djl.mxnet.jna.JnaUtils.cachedOpInvoke(JnaUtils.java:1908)
    at ai.djl.mxnet.engine.CachedOp.forward(CachedOp.java:133)
    at ai.djl.mxnet.engine.MxSymbolBlock.forwardInternal(MxSymbolBlock.java:189)
    at ai.djl.nn.AbstractBlock.forward(AbstractBlock.java:121)
    at ai.djl.nn.Block.forward(Block.java:122)
    at ai.djl.inference.Predictor.predict(Predictor.java:123)
    at ai.djl.inference.Predictor.batchPredict(Predictor.java:150)
    ... 2 more

Here is my code, model and data: https://github.com/cui-jia-hua/djlTest. The model (DeepAR) was trained in Python with GluonTS. It seems like some shape is incompatible, but when I load the same model and run inference in Gluon it works without any error message:

import mxnet.gluon as gluon
import mxnet.ndarray as nd

# Load the six input arrays stored in the test repo
name_list = ['data0', 'data1', 'data2', 'data3', 'data4', 'data5']
data = nd.load('data/testdata')

# Import the exported symbol/params pair and run a forward pass
net = gluon.nn.SymbolBlock.imports('sy_model/prediction_net/prediction_net-symbol.json',
                                   name_list,
                                   'sy_model/prediction_net/prediction_net-0000.params')
out = net(*data)
print(out)

By the way, there are six NDArrays in the testdata file; their shapes are (32,1), (32,1), (32,745,5), (32,745), (32,745), (32,24,5). I checked my MXNet version and it is 1.7.0, so I chose the 1.7.0-b version of mxnet-native-auto.

Is there anything I missed when loading the model with DJL?

lanking520 commented 3 years ago

Corresponding Java code:

[screenshot of the equivalent DJL Java code, not preserved]
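Since the screenshot itself is not preserved in this copy of the thread, here is a rough sketch (not the code from the image) of how the model could be loaded and run through DJL's Criteria API; the MyTest class name, the NoopTranslator choice, and the NDManager.load call used to read the MXNet-saved testdata file are assumptions:

import ai.djl.inference.Predictor;
import ai.djl.ndarray.NDList;
import ai.djl.repository.zoo.Criteria;
import ai.djl.repository.zoo.ZooModel;
import ai.djl.translate.NoopTranslator;

import java.nio.file.Paths;

public class MyTest {

    public static void main(String[] args) throws Exception {
        // Point DJL at the exported pair:
        //   sy_model/prediction_net/prediction_net-symbol.json
        //   sy_model/prediction_net/prediction_net-0000.params
        Criteria<NDList, NDList> criteria =
                Criteria.builder()
                        .setTypes(NDList.class, NDList.class)   // raw NDList in, NDList out
                        .optModelPath(Paths.get("sy_model/prediction_net"))
                        .optModelName("prediction_net")
                        .optEngine("MXNet")
                        .optTranslator(new NoopTranslator())    // no pre/post-processing
                        .build();

        try (ZooModel<NDList, NDList> model = criteria.loadModel();
                Predictor<NDList, NDList> predictor = model.newPredictor()) {

            // Assumption: NDManager.load can read the MXNet-saved testdata file
            // containing the six arrays of shapes (32,1), (32,1), (32,745,5),
            // (32,745), (32,745), (32,24,5); otherwise build the NDList by hand.
            NDList input = model.getNDManager().load(Paths.get("data/testdata"));

            NDList output = predictor.predict(input);
            System.out.println(output);
        }
    }
}

With a NoopTranslator the predictor receives the raw NDList of six input arrays directly, mirroring the Gluon snippet above.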

frankfliu commented 3 years ago

@cui-jia-hua Your model was saved with NumpyMode.OFF, which was the default behavior in MXNet 1.5.0; we expect new models to all be in numpy mode. MXNet is supposed to handle the conversion on model loading, so it seems there is a bug on the MXNet side.

DJL by default sets NumpyMode.GLOBAL (this should work for all models saved with MXNet 1.7.0+).

However, I am able to verify this by adding the following at the beginning of the code:

        Engine engine = Engine.getInstance(); // Make sure engine is loaded first, otherwise the flag will be lost
        JnaUtils.setNumpyMode(JnaUtils.NumpyMode.OFF);
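
For reference, Engine here is ai.djl.engine.Engine and JnaUtils is ai.djl.mxnet.jna.JnaUtils (the same class that appears in the stack trace above). These two lines need to run before the model is loaded so the flag is already in effect when the symbol block is created.
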
cui-jia-hua commented 3 years ago

Thank you! I added this code and got the correct results. It really helps me a lot.