DynamoRIO / dynamorio

Dynamic Instrumentation Tool Platform

-trace_after_instrs unable to skip ~trillion instructions #3743

Open rkgithubs opened 5 years ago

rkgithubs commented 5 years ago

I'm running the NCF deep learning workload (https://github.com/hexiangnan/neural_collaborative_filtering) and trying to collect offline traces for the Neural Matrix Factorization (NeuMF) Python application. The workload initially loads data, and I'd like to skip that part when tracing, so I'm specifying -trace_after_instrs 1000000M (loading the data takes roughly that many instructions according to perf stat). While plain drrun without the cache-simulator tool and drrun -t drcachesim with -trace_after_instrs 1000M both run successfully, the run with -t drcachesim -trace_after_instrs 1000000M crashes. I decided not to turn on debug mode since it might skew the instruction count; we want to start tracing at the correct region. It seems that with DR each iteration takes 75 s (44 s without DR).
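
For context, here is a minimal sketch of how such an estimate can be obtained with perf (the exact measurement method isn't specified above; interrupting once the load phase completes is one plausible approach):

```
# Count retired instructions; Ctrl-C once "Load data done" prints, and
# perf stat reports the counters accumulated up to that point.
perf stat -e instructions -- /home/daenerys/tools/ncf/bin/python NeuMF.py --dataset ml-1m ...
```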

a) after skipping 1000M instructions

```
root@r2d2-hal5000:/home/daenerys/workloads/neural_collaborative_filtering# /home/daenerys/tools/DynamoRIO/DynamoRIO-x86_64-Linux-7.90.17998-0/bin64/drrun -t drcachesim -offline -trace_after_instrs 1000M -exit_after_tracing 1000M -- /home/daenerys/tools/ncf/bin/python /home/daenerys/workloads/neural_collaborative_filtering/NeuMF.py --dataset ml-1m --epochs 20 --batch_size 256 --num_factors 8 --layers [64,32,16,8] --num_neg 4 --lr 0.001 --learner adam --verbose 1 --out 1 --mf_pretrain /home/daenerys/workloads/neural_collaborative_filtering/Pretrain/ml-1m_GMF_8_1501651698.h5 --mlp_pretrain /home/daenerys/workloads/neural_collaborative_filtering/Pretrain/ml-1mMLP[64,32,16,8]_1501652038.h5
WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
Hit delay threshold: enabling tracing.
Using Theano backend.
NeuMF arguments: Namespace(batch_size=256, dataset='ml-1m', epochs=20, layers='[64,32,16,8]', learner='adam', lr=0.001, mf_pretrain='/home/daenerys/workloads/neural_collaborative_filtering/Pretrain/ml-1m_GMF_8_1501651698.h5', mlp_pretrain='/home/daenerys/workloads/neural_collaborative_filtering/Pretrain/ml-1mMLP[64,32,16,8]_1501652038.h5', num_factors=8, num_neg=4, out=1, path='Data/', reg_layers='[0,0,0,0]', reg_mf=0, verbose=1)
Exiting process after ~1048577095 references.
```

b) after skipping 1T instructions. Even though the error messages point to Theano code, this program has been run several times both without DR and with DR (no drcachesim; -trace_after_instrs 1000M) and it ran successfully.

```
/home/daenerys/tools/DynamoRIO/DynamoRIO-x86_64-Linux-7.90.17998-0/bin64/drrun -t drcachesim -offline -trace_after_instrs 1000000M -exit_after_tracing 1000M -- /home/daenerys/tools/ncf/bin/python /home/daenerys/workloads/neural_collaborative_filtering/NeuMF.py --dataset ml-1m --epochs 20 --batch_size 256 --num_factors 8 --layers [64,32,16,8] --num_neg 4 --lr 0.001 --learner adam --verbose 1 --out 1 --mf_pretrain /home/daenerys/workloads/neural_collaborative_filtering/Pretrain/ml-1m_GMF_8_1501651698.h5 --mlp_pretrain /home/daenerys/workloads/neural_collaborative_filtering/Pretrain/ml-1mMLP[64,32,16,8]_1501652038.h5
WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
Using Theano backend.
NeuMF arguments: Namespace(batch_size=256, dataset='ml-1m', epochs=20, layers='[64,32,16,8]', learner='adam', lr=0.001, mf_pretrain='/home/daenerys/workloads/neural_collaborative_filtering/Pretrain/ml-1m_GMF_8_1501651698.h5', mlp_pretrain='/home/daenerys/workloads/neural_collaborative_filtering/Pretrain/ml-1mMLP[64,32,16,8]_1501652038.h5', num_factors=8, num_neg=4, out=1, path='Data/', reg_layers='[0,0,0,0]', reg_mf=0, verbose=1)
Load data done [64.3 s]. #user=6040, #item=3706, #train=994169, #test=6040
Load pretrained GMF (/home/daenerys/workloads/neural_collaborative_filtering/Pretrain/ml-1m_GMF_8_1501651698.h5) and MLP (/home/daenerys/workloads/neural_collaborative_filtering/Pretrain/ml-1mMLP[64,32,16,8]_1501652038.h5) models done.
```

```
You can find the C code in this temporary file: /tmp/theano_compilation_error_QEfVMn
ERROR (theano.gof.opt): Optimization failure due to: constant_folding
ERROR (theano.gof.opt): node: InplaceDimShuffle{x,x}(TensorConstant{0.5})
ERROR (theano.gof.opt): TRACEBACK:
ERROR (theano.gof.opt): Traceback (most recent call last):
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/opt.py", line 2034, in process_node
    replacements = lopt.transform(node)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/tensor/opt.py", line 6516, in constant_folding
    no_recycling=[], impl=impl)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/op.py", line 955, in make_thunk
    no_recycling)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/op.py", line 858, in make_c_thunk
    output_storage=node_output_storage)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/cc.py", line 1217, in make_thunk
    keep_lock=keep_lock)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/cc.py", line 1157, in __compile__
    keep_lock=keep_lock)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/cc.py", line 1624, in cthunk_factory
    key=key, lnk=self, keep_lock=keep_lock)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/cmodule.py", line 1189, in module_from_key
    module = lnk.compile_cmodule(location)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/cc.py", line 1527, in compile_cmodule
    preargs=preargs)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/cmodule.py", line 2396, in compile_str
    (status, compile_stderr.replace('\n', '. ')))
Exception: ('Compilation failed (return status=255): Fatal error: failed to write trace. ', '[InplaceDimShuffle{x,x}(TensorConstant{0.5})]')
```

```
^CTraceback (most recent call last):
  File "/home/daenerys/workloads/neural_collaborative_filtering/NeuMF.py", line 202, in <module>
    (hits, ndcgs) = evaluate_model(model, testRatings, testNegatives, topK, evaluation_threads)
  File "/home/daenerys/workloads/neural_collaborative_filtering/evaluate.py", line 48, in evaluate_model
    (hr,ndcg) = eval_one_rating(idx)
  File "/home/daenerys/workloads/neural_collaborative_filtering/evaluate.py", line 63, in eval_one_rating
    batch_size=100, verbose=0)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/keras/engine/training.py", line 1177, in predict
    self._make_predict_function()
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/keras/engine/training.py", line 735, in _make_predict_function
    **kwargs)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/keras/backend/theano_backend.py", line 667, in function
    return Function(inputs, outputs, updates=updates, **kwargs)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/keras/backend/theano_backend.py", line 653, in __init__
    **kwargs)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/compile/function.py", line 317, in function
    output_keys=output_keys)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/compile/pfunc.py", line 486, in pfunc
    output_keys=output_keys)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/compile/function_module.py", line 1839, in orig_function
    name=name)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/compile/function_module.py", line 1519, in __init__
    optimizer_profile = optimizer(fgraph)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/opt.py", line 108, in __call__
    return self.optimize(fgraph)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/opt.py", line 97, in optimize
    ret = self.apply(fgraph, *args, **kwargs)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/opt.py", line 251, in apply
    sub_prof = optimizer.optimize(fgraph)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/opt.py", line 97, in optimize
    ret = self.apply(fgraph, *args, **kwargs)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/opt.py", line 2540, in apply
    sub_prof = gopt.apply(fgraph)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/opt.py", line 2143, in apply
    nb += self.process_node(fgraph, node)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/opt.py", line 2034, in process_node
    replacements = lopt.transform(node)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/tensor/opt.py", line 6516, in constant_folding
    no_recycling=[], impl=impl)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/op.py", line 955, in make_thunk
    no_recycling)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/op.py", line 858, in make_c_thunk
    output_storage=node_output_storage)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/cc.py", line 1217, in make_thunk
    keep_lock=keep_lock)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/cc.py", line 1157, in __compile__
    keep_lock=keep_lock)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/cc.py", line 1624, in cthunk_factory
    key=key, lnk=self, keep_lock=keep_lock)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/cmodule.py", line 1189, in module_from_key
    module = lnk.compile_cmodule(location)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/cc.py", line 1527, in compile_cmodule
    preargs=preargs)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/cmodule.py", line 2351, in compile_str
    p_out = output_subprocess_Popen(cmd)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/misc/windows.py", line 80, in output_subprocess_Popen
    out = p.communicate()
  File "/usr/lib/python2.7/subprocess.py", line 800, in communicate
    return self._communicate(input)
  File "/usr/lib/python2.7/subprocess.py", line 1417, in _communicate
    stdout, stderr = self._communicate_with_poll(input)
  File "/usr/lib/python2.7/subprocess.py", line 1471, in _communicate_with_poll
    ready = poller.poll()
KeyboardInterrupt
```

derekbruening commented 5 years ago

FTR, skipping a trillion instructions works fine on SPEC CPU2006, so there does not seem to be any count-threshold problem:

```
$ DynamoRIO-Linux-7.1.0-1/bin64/drrun -t drcachesim -offline -trace_after_instrs 1000G -exit_after_tracing 1000M -- ./bzip2_base.gcc-64bit dryer.jpg 200
spec_init
Loading Input Data
Duplicating 278475 bytes
Duplicating 556950 bytes
Duplicating 1113900 bytes
Duplicating 2227800 bytes
Duplicating 4455600 bytes
Duplicating 8911200 bytes
Duplicating 17822400 bytes
Duplicating 35644800 bytes
Duplicating 71289600 bytes
Duplicating 67136000 bytes
Input data 209715200 bytes in length
Compressing Input Data, level 5
Compressed data 140421953 bytes in length
Uncompressing Data
Uncompressed data 209715200 bytes in length
Uncompressed data compared correctly
Compressing Input Data, level 7
Hit delay threshold: enabling tracing.
Exiting process after ~1048576702 references.
```

> While plain drrun without the cache-simulator tool and drrun -t drcachesim with -trace_after_instrs 1000M both run successfully

So plain DR works fine all the way to the app exit. I'm guessing the -trace_after_instrs 1000M run had an -exit_after_tracing so we don't know whether it would have traced to the end.
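
As a quick check (a sketch; the app command is abbreviated), dropping -exit_after_tracing would show whether tracing continues all the way to the app's natural exit:

```
drrun -t drcachesim -offline -trace_after_instrs 1000M -- <app and args>
```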

> I decided not to turn on debug mode since it might skew the instruction count.

The instruction count used for -trace_after_instrs? No, a debug build of DR will make no difference there: it counts application instructions, not DR instructions.

> It seems that with DR each iteration takes 75 s (44 s without DR).

I assume you mean that drcachesim counting instructions takes 75 s per iteration, and that plain DR is faster.

This is surprising since instruction counting is not very complex. Maybe there is some drreg bug where app state gets messed up in some very particular instruction sequence. I would suggest a binary search to find the precise instruction count where things go wrong, and then examining the app code being executed at that point.
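
A rough sketch of such a binary search as a shell loop (hypothetical: DRRUN, APP_CMD, and the success test — here, grepping for the normal "Exiting process" message — are placeholders to adapt):

```
#!/bin/bash
# Hypothetical sketch: binary search for the first failing
# -trace_after_instrs value, in units of millions of instructions.
DRRUN=/path/to/DynamoRIO/bin64/drrun    # placeholder
APP_CMD="/path/to/python NeuMF.py ..."  # placeholder
lo=65000   # 65G: known to succeed
hi=75000   # 75G: known to fail
while [ $((hi - lo)) -gt 1 ]; do
    mid=$(( (lo + hi) / 2 ))
    echo "Trying -trace_after_instrs ${mid}M"
    if "$DRRUN" -t drcachesim -offline -trace_after_instrs "${mid}M" \
           -exit_after_tracing 1000M -- $APP_CMD 2>&1 |
           grep -q "Exiting process"; then
        lo=$mid   # run succeeded; the failure point is above mid
    else
        hi=$mid   # run failed; the failure point is at or below mid
    fi
done
echo "First failing count: ${hi}M (last good: ${lo}M)"
```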

Carrotman42 commented 5 years ago

Just a peanut-gallery comment: Derek's attempt to skip 1 trillion instructions used "1000G" whereas the OP's used "1000000M". I don't know the parsing code, so maybe that makes no difference.

rkgithubs commented 5 years ago

It turns out I'm able to skip up to 65G instructions successfully and collect offline traces. DR crashes with the following errors when skipping > 70G instructions. In order to trace a meaningful region of the workload, we need to skip ~1000G instructions for NCF. I understand this threshold is different for each app, so skipping 1T instructions on SPEC CPU 2006 is not the same as for NCF. Could you please confirm that the NCF app runs successfully when skipping 1000G instrs? It is available here and is fairly straightforward to set up: https://github.com/hexiangnan/neural_collaborative_filtering

```
/home/daenerys/tools/DynamoRIO/DynamoRIO-x86_64-Linux-7.90.17998-0/bin64/drrun -t drcachesim -offline -trace_after_instrs 60G -exit_after_tracing 1000M -- /home/daenerys/tools/ncf/bin/python /home/daenerys/workloads/neural_collaborative_filtering/NeuMF.py --dataset ml-1m --epochs 20 --batch_size 256 --num_factors 8 --layers [64,32,16,8] --num_neg 4 --lr 0.001 --learner adam --verbose 1 --out 1 --mf_pretrain /home/daenerys/workloads/neural_collaborative_filtering/Pretrain/ml-1m_GMF_8_1501651698.h5 --mlp_pretrain /home/daenerys/workloads/neural_collaborative_filtering/Pretrain/ml-1m_MLP_[64,32,16,8]_1501652038.h5
WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
Using Theano backend.
NeuMF arguments: Namespace(batch_size=256, dataset='ml-1m', epochs=20, layers='[64,32,16,8]', learner='adam', lr=0.001, mf_pretrain='/home/daenerys/workloads/neural_collaborative_filtering/Pretrain/ml-1m_GMF_8_1501651698.h5', mlp_pretrain='/home/daenerys/workloads/neural_collaborative_filtering/Pretrain/ml-1m_MLP_[64,32,16,8]_1501652038.h5', num_factors=8, num_neg=4, out=1, path='Data/', reg_layers='[0,0,0,0]', reg_mf=0, verbose=1)
Hit delay threshold: enabling tracing.
Exiting process after ~1048576264 references.
```

While trying to skip 75G instructions, there is the following error. The tool doesn't crash but keeps repeating these errors.

```
/home/daenerys/tools/DynamoRIO/DynamoRIO-x86_64-Linux-7.90.17998-0/bin64/drrun -t drcachesim -offline -trace_after_instrs 75G -exit_after_tracing 1000M -- /home/daenerys/tools/ncf/bin/python /home/daenerys/workloads/neural_collaborative_filtering/NeuMF.py --dataset ml-1m --epochs 20 --batch_size 256 --num_factors 8 --layers [64,32,16,8] --num_neg 4 --lr 0.001 --learner adam --verbose 1 --out 1 --mf_pretrain /home/daenerys/workloads/neural_collaborative_filtering/Pretrain/ml-1m_GMF_8_1501651698.h5 --mlp_pretrain /home/daenerys/workloads/neural_collaborative_filtering/Pretrain/ml-1m_MLP_[64,32,16,8]_1501652038.h5
WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
Using Theano backend.
NeuMF arguments: Namespace(batch_size=256, dataset='ml-1m', epochs=20, layers='[64,32,16,8]', learner='adam', lr=0.001, mf_pretrain='/home/daenerys/workloads/neural_collaborative_filtering/Pretrain/ml-1m_GMF_8_1501651698.h5', mlp_pretrain='/home/daenerys/workloads/neural_collaborative_filtering/Pretrain/ml-1m_MLP_[64,32,16,8]_1501652038.h5', num_factors=8, num_neg=4, out=1, path='Data/', reg_layers='[0,0,0,0]', reg_mf=0, verbose=1)
Load data done [65.7 s]. #user=6040, #item=3706, #train=994169, #test=6040
Load pretrained GMF (/home/daenerys/workloads/neural_collaborative_filtering/Pretrain/ml-1m_GMF_8_1501651698.h5) and MLP (/home/daenerys/workloads/neural_collaborative_filtering/Pretrain/ml-1m_MLP_[64,32,16,8]_1501652038.h5) models done.

You can find the C code in this temporary file: /tmp/theano_compilation_error_kzl4_j
ERROR (theano.gof.opt): Optimization failure due to: constant_folding
ERROR (theano.gof.opt): node: InplaceDimShuffle{x,x}(TensorConstant{0.5})
ERROR (theano.gof.opt): TRACEBACK:
ERROR (theano.gof.opt): Traceback (most recent call last):
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/opt.py", line 2034, in process_node
    replacements = lopt.transform(node)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/tensor/opt.py", line 6516, in constant_folding
    no_recycling=[], impl=impl)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/op.py", line 955, in make_thunk
    no_recycling)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/op.py", line 858, in make_c_thunk
    output_storage=node_output_storage)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/cc.py", line 1217, in make_thunk
    keep_lock=keep_lock)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/cc.py", line 1157, in __compile__
    keep_lock=keep_lock)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/cc.py", line 1624, in cthunk_factory
    key=key, lnk=self, keep_lock=keep_lock)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/cmodule.py", line 1189, in module_from_key
    module = lnk.compile_cmodule(location)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/cc.py", line 1527, in compile_cmodule
    preargs=preargs)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/cmodule.py", line 2396, in compile_str
    (status, compile_stderr.replace('\n', '. ')))
Exception: ('Compilation failed (return status=255): Fatal error: failed to write trace. ', '[InplaceDimShuffle{x,x}(TensorConstant{0.5})]')

You can find the C code in this temporary file: /tmp/theano_compilation_error_VPJcOX
ERROR (theano.gof.opt): Optimization failure due to: constant_folding
ERROR (theano.gof.opt): node: InplaceDimShuffle{x,x}(TensorConstant{0.5})
ERROR (theano.gof.opt): TRACEBACK:
ERROR (theano.gof.opt): Traceback (most recent call last):
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/opt.py", line 2034, in process_node
    replacements = lopt.transform(node)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/tensor/opt.py", line 6516, in constant_folding
    no_recycling=[], impl=impl)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/op.py"
derekbruening commented 4 years ago

Please use triple backticks to quote verbatim output: it will make your posts easier to read.

The prior comment still stands:

> I would suggest a binary search to find the precise instruction count where things go wrong, and then examining the app code being executed at that point.

rkgithubs commented 4 years ago

-trace_after_instrs first of all doesn't seem precise; for instance, 67.995G is treated the same as 67G. NCF was failing somewhere between the 67G and 68G instruction counts (this is while running on 1 core; with 28 cores running, the instruction count is going to be significantly higher). So I decided to binary search in the [67G, 68G] interval. Surprisingly, -trace_after_instrs 67.9999G succeeds while 68G fails.

```
root@r2d2-hal5000:/home/daenerys/workloads/neural_collaborative_filtering# /home/daenerys/tools/DynamoRIO/DynamoRIO-x86_64-Linux-7.90.17998-0/bin64/drrun -t drcachesim -offline -cores 1 -trace_after_instrs 67.9999G -exit_after_tracing 1000M -- /home/daenerys/tools/ncf/bin/python /home/daenerys/workloads/neural_collaborative_filtering/NeuMF.py --dataset ml-1m --epochs 1 --batch_size 256 --num_factors 8 --layers [64,32,16,8] --num_neg 4 --lr 0.001 --learner adam --verbose 1 --out 1 --mf_pretrain /home/daenerys/workloads/neural_collaborative_filtering/Pretrain/ml-1m_GMF_8_1501651698.h5 --mlp_pretrain /home/daenerys/workloads/neural_collaborative_filtering/Pretrain/ml-1m_MLP_[64,32,16,8]_1501652038.h5
WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
Using Theano backend.
NeuMF arguments: Namespace(batch_size=256, dataset='ml-1m', epochs=1, layers='[64,32,16,8]', learner='adam', lr=0.001, mf_pretrain='/home/daenerys/workloads/neural_collaborative_filtering/Pretrain/ml-1m_GMF_8_1501651698.h5', mlp_pretrain='/home/daenerys/workloads/neural_collaborative_filtering/Pretrain/ml-1m_MLP_[64,32,16,8]_1501652038.h5', num_factors=8, num_neg=4, out=1, path='Data/', reg_layers='[0,0,0,0]', reg_mf=0, verbose=1)
Hit delay threshold: enabling tracing.
Load data done [70.3 s]. #user=6040, #item=3706, #train=994169, #test=6040
Exiting process after ~1048578069 references.
root@r2d2-hal5000:/home/daenerys/workloads/neural_collaborative_filtering# /home/daenerys/tools/DynamoRIO/DynamoRIO-x86_64-Linux-7.90.17998-0/bin64/drrun -t drcachesim -offline -cores 1 -trace_after_instrs 68G -exit_after_tracing 1000M -- /home/daenerys/tools/ncf/bin/python /home/daenerys/workloads/neural_collaborative_filtering/NeuMF.py --dataset ml-1m --epochs 1 --batch_size 256 --num_factors 8 --layers [64,32,16,8] --num_neg 4 --lr 0.001 --learner adam --verbose 1 --out 1 --mf_pretrain /home/daenerys/workloads/neural_collaborative_filtering/Pretrain/ml-1m_GMF_8_1501651698.h5 --mlp_pretrain /home/daenerys/workloads/neural_collaborative_filtering/Pretrain/ml-1m_MLP_[64,32,16,8]_1501652038.h5
WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
Using Theano backend.
NeuMF arguments: Namespace(batch_size=256, dataset='ml-1m', epochs=1, layers='[64,32,16,8]', learner='adam', lr=0.001, mf_pretrain='/home/daenerys/workloads/neural_collaborative_filtering/Pretrain/ml-1m_GMF_8_1501651698.h5', mlp_pretrain='/home/daenerys/workloads/neural_collaborative_filtering/Pretrain/ml-1m_MLP_[64,32,16,8]_1501652038.h5', num_factors=8, num_neg=4, out=1, path='Data/', reg_layers='[0,0,0,0]', reg_mf=0, verbose=1)
Hit delay threshold: enabling tracing.
Load data done [66.6 s]. #user=6040, #item=3706, #train=994169, #test=6040
Load pretrained GMF (/home/daenerys/workloads/neural_collaborative_filtering/Pretrain/ml-1m_GMF_8_1501651698.h5) and MLP (/home/daenerys/workloads/neural_collaborative_filtering/Pretrain/ml-1m_MLP_[64,32,16,8]_1501652038.h5) models done.
WARNING (theano.gof.compilelock): Overriding existing lock by dead process '27265' (I am process '28109')

You can find the C code in this temporary file: /tmp/theano_compilation_error_DeGExD
ERROR (theano.gof.opt): Optimization failure due to: constant_folding
ERROR (theano.gof.opt): node: InplaceDimShuffle{x,x}(TensorConstant{0.5})
ERROR (theano.gof.opt): TRACEBACK:
ERROR (theano.gof.opt): Traceback (most recent call last):
```

I looked into the /tmp/theano_compilation_error_DeGExD file. There are quite a few failure messages due to "Unexpected error in an Op's C code. No Python exception was set.":

```
__failure = 7;
if (!PyErr_Occurred()) {
    PyErr_SetString(PyExc_RuntimeError,
        "Unexpected error in an Op's C code. "
        "No Python exception was set.");
}
```

As far as the program goes, it loads the data and a couple of pre-trained .h5 files, then trains and evaluates the NCF model. It fails after loading the .h5 files.

```
root@r2d2-hal5000:/home/daenerys/workloads/neural_collaborative_filtering# /home/daenerys/tools/DynamoRIO/DynamoRIO-x86_64-Linux-7.90.17998-0/bin64/drrun -t drcachesim -offline -cores 1 -trace_after_instrs 68G -exit_after_tracing 1000M -- /home/daenerys/tools/ncf/bin/python /home/daenerys/workloads/neural_collaborative_filtering/NeuMF.py --dataset ml-1m --epochs 1 --batch_size 256 --num_factors 8 --layers [64,32,16,8] --num_neg 4 --lr 0.001 --learner adam --verbose 1 --out 1 --mf_pretrain /home/daenerys/workloads/neural_collaborative_filtering/Pretrain/ml-1m_GMF_8_1501651698.h5 --mlp_pretrain /home/daenerys/workloads/neural_collaborative_filtering/Pretrain/ml-1m_MLP_[64,32,16,8]_1501652038.h5
WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
Using Theano backend.
NeuMF arguments: Namespace(batch_size=256, dataset='ml-1m', epochs=1, layers='[64,32,16,8]', learner='adam', lr=0.001, mf_pretrain='/home/daenerys/workloads/neural_collaborative_filtering/Pretrain/ml-1m_GMF_8_1501651698.h5', mlp_pretrain='/home/daenerys/workloads/neural_collaborative_filtering/Pretrain/ml-1m_MLP_[64,32,16,8]_1501652038.h5', num_factors=8, num_neg=4, out=1, path='Data/', reg_layers='[0,0,0,0]', reg_mf=0, verbose=1)
Hit delay threshold: enabling tracing.
Load data done [66.6 s]. #user=6040, #item=3706, #train=994169, #test=6040
Load pretrained GMF (/home/daenerys/workloads/neural_collaborative_filtering/Pretrain/ml-1m_GMF_8_1501651698.h5) and MLP (/home/daenerys/workloads/neural_collaborative_filtering/Pretrain/ml-1m_MLP_[64,32,16,8]_1501652038.h5) models done.
WARNING (theano.gof.compilelock): Overriding existing lock by dead process '27265' (I am process '28109')
```

In NeuMF.py, the failure happens during the "Init performance" step:

```
    # Load pretrain model
    if mf_pretrain != '' and mlp_pretrain != '':
        gmf_model = GMF.get_model(num_users,num_items,mf_dim)
        gmf_model.load_weights(mf_pretrain)
        mlp_model = MLP.get_model(num_users,num_items, layers, reg_layers)
        mlp_model.load_weights(mlp_pretrain)
        model = load_pretrain_model(model, gmf_model, mlp_model, len(layers))
        print("Load pretrained GMF (%s) and MLP (%s) models done. " %(mf_pretrain, mlp_pretrain))

    # Init performance
    (hits, ndcgs) = evaluate_model(model, testRatings, testNegatives, topK, evaluation_threads)
    hr, ndcg = np.array(hits).mean(), np.array(ndcgs).mean()
    print('Init: HR = %.4f, NDCG = %.4f' % (hr, ndcg))
    best_hr, best_ndcg, best_iter = hr, ndcg, -1
    if args.out > 0:
        model.save_weights(model_out_file, overwrite=True)

    # Training model
```

I've attached the /tmp/theano_compilation_error_DeGExD file: theano_compilation_error_DeGExD.txt

Also worth pointing out, from that file:

```
Problem occurred during compilation with the command line below:
/usr/bin/g++ -shared -g -O3 -fno-math-errno -Wno-unused-label -Wno-unused-variable -Wno-write-strings -DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION -m64 -fPIC -I/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/numpy/core/include -I/usr/include/python2.7 -I/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/c_code -L/usr/lib -fvisibility=hidden -o /root/.theano/compiledir_Linux-4.4--generic-x86_64-with-Ubuntu-16.04-xenial-x86_64-2.7.12-64/tmpJGpoGN/367f127a65c0f4aaa34389bc7b82c4fa5c2ecceb949e57e09ec98fcbc9408937.so /root/.theano/compiledir_Linux-4.4--generic-x86_64-with-Ubuntu-16.04-xenial-x86_64-2.7.12-64/tmpJGpoGN/mod.cpp -lpython2.7
Fatal error: failed to write trace
```

rkgithubs commented 4 years ago

status: NeedInfo. Please let us know if you need any more information on this bug.

derekbruening commented 4 years ago

> -trace_after_instrs first of all doesn't seem precise; for instance, 67.995G is treated the same as 67G.

The option takes integers only: no floating-point. The option parsing may need tweaking to binary search down to a final number: https://github.com/DynamoRIO/dynamorio/issues/3605
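
In the meantime, finer granularity can be expressed with an integer count and a smaller unit suffix (a sketch; the value is illustrative and the app command abbreviated):

```
# 67.995G written as an integer number of millions:
drrun -t drcachesim -offline -trace_after_instrs 67995M -exit_after_tracing 1000M -- <app and args>
```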

What we want for "the app code being executed at that point" is the basic block that causes the bug. Then a tiny reproducer can be created and the problem more easily understood and fixed.