!!!Do not merge until xnnpack backend llama is runnable!!!
How to use the xnnpack backend in MLLM
The xnnpack backend in MLLM provides a convenient wrapper function, wrap2xnn, that converts a standard CPU-based MLLM module into one that runs on the xnnpack backend. It accepts inputs_nums, outputs_nums, and any additional arguments needed to construct the wrapped module (here, a LinearModule). For example:
class LinearModule : public Module {
    Layer linear;

public:
    LinearModule() {
        linear = Linear(1024, 2048, true, "linear");
    }

    vector<Tensor> Forward(vector<Tensor> inputs, vector<std::any> args) override {
        auto x = inputs[0];
        auto out = linear(x);
        return {out};
    }
};
TEST(XpLinearTest, LinearModule) {
    mllm::xnnpack::Log::log_level = mllm::xnnpack::Log::ERROR;

    // Wrap the CPU-defined module so it runs on the xnnpack backend (1 input, 1 output).
    auto model = ::mllm::xnnpack::wrap2xnn<LinearModule>(1, 1);
    model.setNoLoadWeightsDtype(DataType::MLLM_TYPE_F32);

    EXPECT_EQ(Backend::global_backends[MLLM_XNNPACK] != nullptr, true);

    Tensor x(1, 1, 256, 1024, Backend::global_backends[MLLM_XNNPACK], true);
    x.setTtype(TensorType::INPUT_TENSOR);
    for (int i = 0; i < 256 * 1024; ++i) {
        *(x.hostPtr<float>() + i) = 1024.f;
    }

    auto out = model({x})[0];

    // No real weights are loaded, so every output element is expected to be (numerically) zero.
    for (int i = 0; i < 256 * 2048; ++i) {
        EXPECT_EQ(*(out.hostPtr<float>() + i) < 1e-18, true);
    }

    out.printShape();
}
Unlike MLLM's dynamic graph mode, xnnpack operates on a static graph, so a mechanism is needed to convert the dynamic graph into a static one. The xnnpack backend wrapper in MLLM adds several layers on top of the LinearModule to register external input and external output Tensors. The final wrapped module is shown in the following pseudocode:
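The sketch below is a rough reconstruction of that idea: the wrapper surrounds the user's module with XpDirect layers that tag tensors as external inputs and external outputs of the static graph. The class name XpWrapperModule and the way the XpDirect layers are invoked here are illustrative assumptions, not the exact MLLM API.

// Pseudocode sketch (illustrative names, not the real wrapper implementation).
class XpWrapperModule : public Module {
    Layer direct_in;    // XpDirect op: tags its tensor as an external input
    LinearModule inner; // the original CPU-style module
    Layer direct_out;   // XpDirect op: tags its tensor as an external output

public:
    vector<Tensor> Forward(vector<Tensor> inputs, vector<std::any> args) override {
        auto x = direct_in(inputs[0]);         // register external input
        auto y = inner.Forward({x}, args)[0];  // ops only record nodes into the static subgraph
        auto out = direct_out(y);              // register external output
        return {out};
    }
};

You can find more use cases in https://github.com/chenghuaWang/mllm/blob/main/test/xnnpack/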
How are the operators in MLLM's xnnpack backend implemented?
Take the XpAdd operation as an example:
XpAdd's reshape function is identical to that of CPUAdd; the main differences lie in the setUp and execute functions.
When execute is called, XpAdd registers a static graph node in the xnnpack subgraph. XpAdd performs no work during setUp: that stage is reserved for the XpDirect op to determine whether each Tensor is an external input, an external output, or a regular tensor.
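As a concrete illustration, here is a minimal sketch of what execute and setUp could look like for XpAdd. It uses the real XNNPACK subgraph call xnn_define_add2, but the MLLM-side details (the getXnnSubgraph() accessor, the uuid() value-id field on Tensor, and the error codes) are assumptions for illustration, not the actual implementation.

// Minimal sketch, not the actual MLLM code. Assumes the backend exposes the
// subgraph under construction (getXnnSubgraph()) and that each Tensor already
// carries the id of its xnnpack value (uuid()); both names are illustrative.
#include <limits>
#include <xnnpack.h>

ErrorCode XpAdd::execute(vector<shared_ptr<Tensor>> inputs, vector<shared_ptr<Tensor>> outputs) {
    xnn_subgraph_t subgraph = xnnbackend_->getXnnSubgraph();  // assumed accessor

    // Record an element-wise add node in the static graph; nothing is computed here.
    auto status = xnn_define_add2(
        subgraph,
        /*output_min=*/-std::numeric_limits<float>::infinity(),
        /*output_max=*/std::numeric_limits<float>::infinity(),
        inputs[0]->uuid(),   // xnnpack value id of the first operand (assumed field)
        inputs[1]->uuid(),   // xnnpack value id of the second operand
        outputs[0]->uuid(),  // xnnpack value id of the result
        /*flags=*/0);

    if (status != xnn_status_success) return ErrorCode::NOT_SUPPORT;  // illustrative error code
    return ErrorCode::MLLM_NO_ERROR;
}

// setUp intentionally does nothing: at this stage the XpDirect op decides whether
// each Tensor is an external input, an external output, or a regular tensor.
ErrorCode XpAdd::setUp(vector<shared_ptr<Tensor>> inputs, vector<shared_ptr<Tensor>> outputs) {
    return ErrorCode::MLLM_NO_ERROR;
}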