alibaba / x-deeplearning

An industrial deep learning framework for high-dimension sparse data
Apache License 2.0

TDMServing run failed #255

Open better629 opened 5 years ago

better629 commented 5 years ago

Hi @woso @lovickie, using the userbehavior data, I compiled TDM and got the converted dense models model_blaze and model_blaze_optimized (model.dat). But when I use them in the TDMServing tdm_example, I get the error below.

run log:
I0821 09:38:44.779547 18774 blaze_model.cpp:58] [item_tree_model] blaze load sparse model weight succ, file: test_data/model/blaze_model/sparse_qed
I0821 09:38:44.795930 18774 blaze_model.cpp:70] [item_tree_model] blaze load model succ file: test_data/model/blaze_model/model.dat
E0821 09:38:44.804941 18774 blaze_model.cpp:79] [item_tree_model] blaze can not get predictor
E0821 09:38:44.804960 18774 model_unit.cpp:120] [item_tree_model] init model failed

error log:
[ERROR] [2019-08-16 07:16:33] [4586] [/home/work/open-code/x-deeplearning/blaze/blaze/api/cpp_api/predictor_manager_impl.cc:157] Create Model Predictor failed, test_data/model/blaze_model/model.dat msg=[failed at workspace.h:90]. [/home/work/open-code/x-deeplearning/blaze/blaze/graph/workspace.h:90] input_name: %s data_type not definedblockgrad1

It seems that loading model.dat failed, yet blockgrad1 does exist in model.dat and also in graph.txt, at:

node {
  name: "/MxnetBackendOp"
  op: "MxnetBackendOp"
  ... {\n      \"op\": \"BlockGrad\", \n      \"name\": \"blockgrad1\", \n      \"inputs\": [[179, 0, 0]]\n    } ...
}

and the related ops in the text-format model.dat are:

op {
  type: "Slice"
  name: "slice_axis10"
  input: "blockgrad1"
  output: "slice_axis10"
  arg {
    name: "axis"
    i: 1
  }
  arg {
    name: "start"
    i: 1
  }
  arg {
    name: "end"
    i: 2
  }
}
op {
  type: "ReduceSum"
  name: "sum0"
  input: "blockgrad1"
  output: "sum0"
  arg {
    name: "axis"
    i: 1
  }
  arg {
    name: "keepdims"
    i: 0
  }
}
op {
  type: "Mul"
  name: "_mul2"
  input: "log0"
  input: "blockgrad1"
  output: "_mul2"
}

external_input {
  name: "blockgrad1"
  dtype: kFloat
}

No op in model.dat is named blockgrad1. In the excerpts above it appears only as an input to other ops and as an external_input.

How can I fix these problems?
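One way to double-check this over the whole file is a small script that lists every tensor name consumed by an op but never produced by one. This is only a sketch: it assumes model.dat has been dumped to the text format shown above, it uses plain regular expressions rather than any blaze API, and the function name `undefined_inputs` and the `SAMPLE` string are made up for illustration.

```python
import re

# Shortened excerpt in the same text format as the ops quoted above.
SAMPLE = '''
op {
  type: "Slice"
  name: "slice_axis10"
  input: "blockgrad1"
  output: "slice_axis10"
}
op {
  type: "Mul"
  name: "_mul2"
  input: "log0"
  input: "blockgrad1"
  output: "_mul2"
}
external_input {
  name: "blockgrad1"
  dtype: kFloat
}
'''

def undefined_inputs(model_text):
    """Return (dangling, dangling_external) as two sorted lists.

    dangling: names used as an op input that no op lists as an output.
    dangling_external: the subset of those also declared as external_input.
    """
    # Names declared in external_input { name: "..." } blocks.
    external = set(
        re.findall(r'external_input\s*\{[^{}]*?name:\s*"([^"]+)"', model_text)
    )
    produced, consumed = set(), set()
    # Collect every input: / output: field found in the dump.
    for kind, name in re.findall(r'\b(input|output):\s*"([^"]+)"', model_text):
        (consumed if kind == "input" else produced).add(name)
    dangling = consumed - produced
    return sorted(dangling), sorted(dangling & external)

if __name__ == "__main__":
    dangling, dangling_external = undefined_inputs(SAMPLE)
    print("inputs with no producing op:", dangling)
    print("of those, declared as external_input:", dangling_external)
```

On the shortened excerpt, log0 is also reported, but only because the op that produces it was left out of the excerpt. Run against the complete text dump, the first list should narrow down to the names that no op in the file produces.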

Thank you!

better629 commented 5 years ago

@woso @lovickie @songyue1104 @yiling-dc Please have a look at the above problem.

WellsRevive commented 5 years ago

@better629 Can you provide me with the complete model.dat file?

better629 commented 5 years ago

> @better629 Can you provide me with the complete model.dat file?

@WellsRevive Can you send an email to the address listed in my profile?