Samsung / walrus

WebAssembly Lightweight RUntime
Apache License 2.0

[JIT] assertion failed in emitFloatCompare #184

Closed (clover2123 closed this issue 11 months ago)

clover2123 commented 11 months ago

When I ran a sample tfjs model (imagenet) on an aarch64 platform, it failed in the following part.

In emitFloatCompare method https://github.com/Samsung/walrus/blob/6f000941bc76de44ccc69a42466652bd3c637473/src/jit/FloatMathInl.h#L441-L444

I found that, under a certain condition, instr->next() is a CodeLabel, which leads to an assertion failure in instr->next()->asInstruction(), because it seems that a CodeLabel is not an instruction.

I'm briefly sharing the issue here first.

zherczeg commented 11 months ago

It looks like I forgot about this case. A compare cannot be the last instruction of a code block, because a block always ends with a return. I silently included labels in this rule, but a label is not considered an instruction by the compiler. We just need to wrap the code block in an isInstruction() check.
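
A minimal, self-contained sketch of that guard (the class and function names below only mirror what is mentioned in this thread; this is an illustration, not the actual walrus code):

#include <cassert>
#include <cstdio>

struct Instruction;

struct InstructionListItem {
    virtual ~InstructionListItem() {}
    virtual bool isInstruction() const { return false; }
    Instruction* asInstruction();            // asserts isInstruction(), as in the report
    InstructionListItem* next() const { return m_next; }
    InstructionListItem* m_next = nullptr;
};

struct Instruction : InstructionListItem {
    bool isInstruction() const override { return true; }
};

struct CodeLabel : InstructionListItem {};

Instruction* InstructionListItem::asInstruction()
{
    assert(isInstruction());                 // the assertion that fired in emitFloatCompare
    return static_cast<Instruction*>(this);
}

// Only peek at the next instruction (e.g. to fuse the compare with a following
// branch) when the next list item really is an instruction; a CodeLabel after
// the compare takes the plain path instead.
static void emitCompareSketch(InstructionListItem* instr)
{
    if (instr->next() != nullptr && instr->next()->isInstruction()) {
        Instruction* nextInstr = instr->next()->asInstruction();
        (void)nextInstr;                     // a real emitter would inspect the branch here
        std::puts("fused compare-and-branch");
        return;
    }
    std::puts("plain compare (next item is a label or end of block)");
}

int main()
{
    Instruction compare;
    CodeLabel label;
    compare.m_next = &label;                 // the failing case: a label follows the compare
    emitCompareSketch(&compare);
    return 0;
}

With such a check in place, a compare followed directly by a CodeLabel takes the plain path instead of tripping the assertion in asInstruction().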

zherczeg commented 11 months ago

We did it correctly for the math instructions: https://github.com/Samsung/walrus/blob/6f000941bc76de44ccc69a42466652bd3c637473/src/jit/IntMath32Inl.h#L1001

clover2123 commented 11 months ago

@zherczeg Thanks! I fixed it by wrapping the code in a new if (instr->next()->isInstruction()) block, but another problem has occurred.

Now it fails at the check below: https://github.com/zherczeg/sljit/blob/44cee7aa5d6732de8f5eba892d52c734f1d0fe2c/sljit_src/sljitLir.c#L2700

Could you guess the reason? Otherwise, I can give you a detailed description of how to reproduce this error. (I tested on an aarch64 environment because walrus currently supports the SIMD JIT only on ARM platforms.)

zherczeg commented 11 months ago

These are usually caused by passing wrong parameters to the compiler. I would need a bit of backtrace showing where it is called (a sljit_emit_simd_lane_mov call somewhere in src/jit) and what the arguments were. I can check a detailed report as well.

clover2123 commented 11 months ago

The following is the whole backtrace when the assertion fails. I ran a sample model through lwnode-escargot-walrus, with the change from your comment applied.

#1  0x0000fffff71faaac in __GI_abort () at abort.c:79
#2  0x0000fffff6fe8f64 in check_sljit_emit_simd_lane_mov (compiler=0xaaaaac9b4640, type=541697, freg=1, lane_index=1, srcdst=127, srcdstw=0) at /home/ubuntu/hwpark/lwnode_wasm_jit/deps/escargot/third_party/walrus/src/jit/../../third_party/sljit/sljit_src/sljitLir.c:2700
#3  0x0000fffff6ff3984 in sljit_emit_simd_lane_mov (compiler=0xaaaaac9b4640, type=541697, freg=1, lane_index=1, srcdst=127, srcdstw=0) at /home/ubuntu/hwpark/lwnode_wasm_jit/deps/escargot/third_party/walrus/src/jit/../../third_party/sljit/sljit_src/sljitNativeARM_64.c:2818
#4  0x0000fffff7002560 in Walrus::emitExtractLaneSIMD (compiler=0xaaaaac9b4640, instr=0xaaaaac8f6da0) at /home/ubuntu/hwpark/lwnode_wasm_jit/deps/escargot/third_party/walrus/src/jit/SimdInl.h:91
#5  0x0000fffff7004678 in Walrus::JITCompiler::compile (this=0xffffffffde10) at /home/ubuntu/hwpark/lwnode_wasm_jit/deps/escargot/third_party/walrus/src/jit/Backend.cpp:812
#6  0x0000fffff7014104 in Walrus::Module::jitCompile (this=0xaaaaabe836f0, functions=0x0, functionsLength=0, verboseLevel=1) at /home/ubuntu/hwpark/lwnode_wasm_jit/deps/escargot/third_party/walrus/src/jit/ByteCodeParser.cpp:1289
#7  0x0000fffff7037d54 in Walrus::WASMParser::parseBinary (store=0xaaaaabe327b0, filename="", data=0xfffff4394010 "", len=424594) at /home/ubuntu/hwpark/lwnode_wasm_jit/deps/escargot/third_party/walrus/src/parser/WASMParser.cpp:2510
#8  0x0000fffff707d378 in wasm_module_new (store=0xaaaaabe32810, binary=0xffffffffe2a8) at /home/ubuntu/hwpark/lwnode_wasm_jit/deps/escargot/third_party/walrus/src/api/wasm.cpp:796
#9  0x0000fffff7bfed84 in Escargot::WASMOperations::compileModule (state=..., thisValue=..., argc=1, argv=0xffffffffe490, newTarget=...) at /home/ubuntu/hwpark/lwnode_wasm_jit/deps/escargot/src/wasm/WASMOperations.cpp:237
#10 0x0000fffff7b6132c in Escargot::NativeFunctionObject::processNativeFunctionCall<false, true> (this=0x126e610, state=..., receiverSrc=..., argc=1, argv=0xffffffffe490, newTarget=...) at /home/ubuntu/hwpark/lwnode_wasm_jit/deps/escargot/src/runtime/FunctionObjectInlines.h:301
#11 0x0000fffff7b60ec4 in Escargot::NativeFunctionObject::call (this=0x126e610, state=..., thisValue=..., argc=1, argv=0xffffffffe490) at /home/ubuntu/hwpark/lwnode_wasm_jit/deps/escargot/src/runtime/NativeFunctionObject.cpp:78
#12 0x0000fffff7b6bc88 in Escargot::Object::call (state=..., callee=..., thisValue=..., argc=1, argv=0xffffffffe490) at /home/ubuntu/hwpark/lwnode_wasm_jit/deps/escargot/src/runtime/Object.cpp:1280
#13 0x0000fffff7b57e88 in Escargot::PromiseReactionJob::<lambda()>::operator()(void) const (__closure=0xffffffffe618) at /home/ubuntu/hwpark/lwnode_wasm_jit/deps/escargot/src/runtime/Job.cpp:70
#14 0x0000fffff7b597e8 in std::_Function_handler<Escargot::Value(), Escargot::PromiseReactionJob::run()::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/9/bits/std_function.h:285
#15 0x0000fffff7ba61c4 in std::function<Escargot::Value ()>::operator()() const (this=0xffffffffe618) at /usr/include/c++/9/bits/std_function.h:688
#16 0x0000fffff7ba460c in Escargot::SandBox::run(std::function<Escargot::Value ()> const&) (this=0xffffffffe5b0, scriptRunner=...) at /home/ubuntu/hwpark/lwnode_wasm_jit/deps/escargot/src/runtime/SandBox.cpp:123
#17 0x0000fffff7b58138 in Escargot::PromiseReactionJob::run (this=0x17bda10) at /home/ubuntu/hwpark/lwnode_wasm_jit/deps/escargot/src/runtime/Job.cpp:86
#18 0x0000fffff7be4640 in Escargot::VMInstance::executePendingJob (this=0x57070) at /home/ubuntu/hwpark/lwnode_wasm_jit/deps/escargot/src/runtime/VMInstance.cpp:743
#19 0x0000fffff788d0a8 in Escargot::VMInstanceRef::executePendingJob (this=0x57070) at /home/ubuntu/hwpark/lwnode_wasm_jit/deps/escargot/src/api/EscargotPublic.cpp:1194
#20 0x0000aaaaab451528 in v8::MicrotasksScope::PerformCheckpoint (v8_isolate=0x4ed90) at ../../../src/api-environment.cc:2532
#21 0x0000aaaaab768d70 in node::InternalCallbackScope::Close (this=0xffffffffec20) at ../../../deps/node/src/api/callback.cc:114
#22 0x0000aaaaab768b28 in node::InternalCallbackScope::~InternalCallbackScope (this=0xffffffffec20, __in_chrg=<optimized out>) at ../../../deps/node/src/api/callback.cc:77
#23 0x0000aaaaab4d0c84 in node::StartExecution(node::Environment*, std::function<v8::MaybeLocal<v8::Value> (node::StartExecutionCallbackInfo const&)>) (env=0xaaaaabe48e20, cb=...) at ../../../deps/node/src/node.cc:449
#24 0x0000aaaaab76df68 in node::LoadEnvironment(node::Environment*, std::function<v8::MaybeLocal<v8::Value> (node::StartExecutionCallbackInfo const&)>, std::unique_ptr<node::InspectorParentHandle, std::default_delete<node::InspectorParentHandle> >) (env=0xaaaaabe48e20, cb=...,
    removeme=std::unique_ptr<struct node::InspectorParentHandle> = {...}) at ../../../deps/node/src/api/environment.cc:488
#25 0x0000aaaaab76dea4 in node::LoadEnvironment (env=0xaaaaabe48e20) at ../../../deps/node/src/api/environment.cc:478
#26 0x0000aaaaab4cf0c4 in LWNode::LWNodeMainRunner::Run (this=0xffffffffee50, nodeMainInstance=...) at ../../../deps/node/src/node_main_lw_runner-inl.h:127
#27 0x0000aaaaab4d3108 in node::Start (argc=3, argv=0xfffffffff1f8) at ../../../deps/node/src/node.cc:1107
#28 0x0000aaaaab3ae360 in main (argc=3, argv=0xfffffffff1f8) at ../../../deps/node/src/node_main_lw.cc:83
clover2123 commented 11 months ago

Below is the procedure to build and reproduce the error.

async function useWasm() {
    tf = require('@tensorflow/tfjs');
    const {
        getThreadsCount,
        setThreadsCount,
    } = require('@tensorflow/tfjs-backend-wasm');
    await tf.setBackend('wasm');
    setThreadsCount(2);
    console.log('step 0 - set backend', tf.getBackend(), getThreadsCount());
}

async function useNode() {
    tf = require('@tensorflow/tfjs-node');
    console.log('step 0 - set backend', 'node adapter');
}

async function run() {
    try {
        console.time('tf');

        if (process.argv[2] === 'wasm') {
            await useWasm();
        } else if (process.argv[2] === 'node') {
            useNode();
        } else {
            tf = require('@tensorflow/tfjs');
        }

        const img = tf.ones([1, 224, 224, 3]).toFloat();

        console.log('step 1 - download model');
        const model = await tf.loadGraphModel(
            'https://tfhub.dev/google/tfjs-model/imagenet/mobilenet_v2_130_224/classification/3/default/1',
            { fromTFHub: true },
        );

        console.log('step 2 - start prediction');
        const prediction = await model.predict(img);
        console.log(prediction);
        console.timeEnd('tf');
    } catch (error) {
        console.log('error', error);
    }
}

run();
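
(A usage note inferred from the script rather than stated in the thread: it branches on process.argv[2], so running it as node <script>.js wasm selects the tfjs wasm backend, which is the path that ends up in the walrus JIT.)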

clover2123 commented 11 months ago

Oops, I forgot to add the package.json file below. To run the above sample code, you need to install the modules defined in the package.json file.

zherczeg commented 11 months ago

Thank you for all the info. I was able to set up the project on x86. I hoped the error would be present there too, because working with ARM64 hardware is a bit difficult. Fortunately it was (after applying PR 179).

It looks like this bug is a typo: https://github.com/Samsung/walrus/blob/main/src/jit/ByteCodeParser.cpp#L594

Instruction* instr = compiler->append(byteCode, Instruction::ExtractLaneSIMD, opcode, 1, 1);

The extract lane operation has 1 input and 1 output argument, not two inputs and zero outputs.
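
(A hedged reading, since the append() signature is not quoted in this thread: the two trailing arguments appear to be the operand count and the result count, so the typo had registered extract_lane as an instruction that consumes two values and produces none.)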

I got this afterwards:

step 2 - start prediction
Tensor {
  kept: false,
  isDisposedInternal: false,
  shape: [ 1, 1001 ],
  dtype: 'float32',
  size: 1001,
  strides: [ 1001 ],
  dataId: { id: 423 },
  id: 212,
  rankType: '2',
  scopeId: 1
}
tf: 53.845s
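
(For reference: the [ 1, 1001 ] output shape matches the 1,001-way imagenet classification head of this mobilenet model, so the prediction at least produced a tensor of the expected shape.)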

I hope this is correct.

zherczeg commented 11 months ago

Two bugs are fixed in #185

clover2123 commented 11 months ago

@zherczeg Thank you for the quick patch!