Provide a reproducible test case that is the bare minimum necessary to generate the problem.
Other info / logs
Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.
System information
Describe the current behavior
[1,2]:[n193-019-222:14623] [ 1] /opt/tiger/jdk/jdk1.8/jre/lib/amd64/server/libjvm.so(JVM_handle_linux_signal+0xb6)[0x7fb6a01cf826]
[1,2]:[n193-019-222:14623] [ 2] /opt/tiger/jdk/jdk1.8/jre/lib/amd64/server/libjvm.so(+0x921e13)[0x7fb6a01c5e13]
[1,2]:[n193-019-222:14623] [ 3] /usr/lib/x86_64-linux-gnu/libc.so.6(+0x43090)[0x7fb789fe2090]
[1,2]:[n193-019-222:14623] [ 4] /usr/lib/python3.8/site-packages/merlin_sok-1.1.4-py3.8-linux-x86_64.egg/sparse_operation_kit/lib/libcore.so(_ZNSt8__detail9_Map_baseIN4core6DeviceESt4pairIKS2_St10shared_ptrINS1_12IStorageImplEEESaIS8_ENS_10_Select1stESt8equal_toIS2_ESt4hashIS2_ENS_18_Mod_range_hashingENS_20_Default_ranged_hashENS_20_Prime_rehash_policyENS_17_HashtabletraitsILb1ELb0ELb1EEELb1EEixERS4+0x173)[0x7fb5a5025e43]
[1,2]:[n193-019-222:14623] [ 5] /usr/lib/python3.8/site-packages/merlin_sok-1.1.4-py3.8-linux-x86_64.egg/sparse_operation_kit/lib/libcore.so(_ZN4core10BufferImpl7reserveERKNS_5ShapeENS_6DeviceENS_8DataTypeEm+0x313)[0x7fb5a5025143]
[1,2]:[n193-019-222:14623] [ 6] /usr/lib/python3.8/site-packages/merlin_sok-1.1.4-py3.8-linux-x86_64.egg/sparse_operation_kit/lib/libembedding.so(_ZN9embedding33UniformModelParallelEmbeddingMetaC1ESt10shared_ptrIN4core19CoreResourceManagerEERKNS_24EmbeddingCollectionParamEm+0x2559)[0x7fb5a3627879]
[1,2]:[n193-019-222:14623] [ 7] /usr/lib/python3.8/site-packages/merlin_sok-1.1.4-py3.8-linux-x86_64.egg/sparse_operation_kit/lib/libsok_experiment.so(_ZN10tensorflow23EmbeddingCollectionBaseIxxfE11update_metaESt10shared_ptrIN4core19CoreResourceManagerEEiRSt6vectorIiSaIiEE+0x131)[0x7fb5a30162e1]
[1,2]:[n193-019-222:14623] [ 8] /usr/lib/python3.8/site-packages/merlin_sok-1.1.4-py3.8-linux-x86_64.egg/sparse_operation_kit/lib/libsok_experiment.so(_ZN10tensorflow30LookupForwardEmbeddingVarGPUOpIxxfE7ComputeEPNS_15OpKernelContextE+0x891)[0x7fb5a303d9f1]
[1,2]:[n193-019-222:14623] [ 9] /usr/local/lib/python3.8/dist-packages/tensorflow_core/python/../libtensorflow_framework.so.1(_ZN10tensorflow13BaseGPUDevice7ComputeEPNS_8OpKernelEPNS_15OpKernelContextE+0xdc)[0x7fb6a1fa3bbc]
[1,2]:[n193-019-222:14623] [10] [n193-019-222:14623] [ 0] [1,4]:[n193-019-222:14625] Process received signal
Describe the expected behavior
Code to reproduce the issue
modelzoo/deepfm, with no code modifications.
Provide a reproducible test case that is the bare minimum necessary to generate the problem.
Other info / logs
Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.