dmlc / gluon-nlp

NLP made easy
https://nlp.gluon.ai/

HybridBeamSearchSampler fails where regular BeamSearchSampler works fine #961

Open · zeeshansayyed opened this issue 5 years ago

zeeshansayyed commented 5 years ago

Description

I modified the existing BeamSearchTranslator class in this example to use HybridBeamSearchSampler instead of the regular BeamSearchSampler. The modified code is as follows:

import mxnet as mx
import numpy as np
from gluonnlp.model import BeamSearchScorer, HybridBeamSearchSampler


class HybridBeamSearchTranslator:
    def __init__(self, model, batch_size=16, beam_size=5, max_length=35):
        self._model = model
        self._scorer = BeamSearchScorer()
        self._scorer.hybridize()
        # Unlike the regular BeamSearchSampler, HybridBeamSearchSampler also
        # requires a fixed batch_size and vocab_size.
        self._sampler = HybridBeamSearchSampler(
            batch_size=batch_size,
            beam_size=beam_size,
            decoder=self._decode_logprob,
            eos_id=self._model.tgt_vocab.token_to_idx[self._model.tgt_vocab.sep_token],
            scorer=self._scorer,
            max_length=max_length,
            vocab_size=len(self._model.tgt_vocab)
        )
        self._sampler.hybridize()

    def _decode_logprob(self, step_input, states):
        # One decoding step: returns log-probabilities over the target
        # vocabulary and the updated decoder states.
        out, states, _ = self._model.decode_step(step_input, states)
        return mx.nd.log_softmax(out), states

    def translate(self, src_seq, src_valid_length):
        batch_size = src_seq.shape[0]
        encoder_outputs, _ = self._model.encode(src_seq, valid_length=src_valid_length)
        decoder_states = self._model.decoder.init_state_from_encoder(encoder_outputs,
                                                                     src_valid_length)
        # Seed the sampler with the CLS token for every sequence in the batch.
        inputs = mx.nd.full(shape=(batch_size,), ctx=src_seq.context, dtype=np.float32,
                            val=self._model.tgt_vocab.token_to_idx[self._model.tgt_vocab.cls_token])
        samples, scores, sample_valid_length = self._sampler(inputs, decoder_states)
        return samples, scores, sample_valid_length

While the original BeamSearchTranslator works as expected, the new HybridBeamSearchTranslator fails with the following error:

Error Message

Traceback (most recent call last):
  File "translation.py", line 135, in <module>
    samples, scores, sample_valid_length = translator.translate(src_seq, src_valid_length)
  File "translation.py", line 116, in translate
    samples, scores, sample_valid_length = self._sampler(inputs, decoder_states)
  File "/home/ec2-user/anaconda3/envs/gluon/lib/python3.7/site-packages/mxnet/gluon/block.py", line 548, in __call__
    out = self.forward(*args)
  File "/home/ec2-user/anaconda3/envs/gluon/lib/python3.7/site-packages/mxnet/gluon/block.py", line 915, in forward
    return self._call_cached_op(x, *args)
  File "/home/ec2-user/anaconda3/envs/gluon/lib/python3.7/site-packages/mxnet/gluon/block.py", line 805, in _call_cached_op
    self._build_cache(*args)
  File "/home/ec2-user/anaconda3/envs/gluon/lib/python3.7/site-packages/mxnet/gluon/block.py", line 757, in _build_cache
    data, out = self._get_graph(*args)
  File "/home/ec2-user/anaconda3/envs/gluon/lib/python3.7/site-packages/mxnet/gluon/block.py", line 749, in _get_graph
    out = self.hybrid_forward(symbol, *grouped_inputs, **params)  # pylint: disable=no-value-for-parameter
  File "/home/ec2-user/anaconda3/envs/gluon/lib/python3.7/site-packages/gluonnlp/model/sequence_sampler.py", line 687, in hybrid_forward
    ) + tuple(states)
  File "/home/ec2-user/anaconda3/envs/gluon/lib/python3.7/site-packages/mxnet/symbol/contrib.py", line 574, in while_loop
    _create_subgraph(loop_vars, _func_wrapper, name + "_func")
  File "/home/ec2-user/anaconda3/envs/gluon/lib/python3.7/site-packages/mxnet/symbol/contrib.py", line 493, in _create_subgraph
    outputs, final_state, out_fmt, var_fmt = graph_func(new_graph_vars)
  File "/home/ec2-user/anaconda3/envs/gluon/lib/python3.7/site-packages/mxnet/symbol/contrib.py", line 470, in _func_wrapper
    step_output, new_loop_vars = func(*loop_vars)
  File "/home/ec2-user/anaconda3/envs/gluon/lib/python3.7/site-packages/gluonnlp/model/sequence_sampler.py", line 658, in _loop_func
    step_input, _reconstruct_flattened_structure(state_structure, states))
  File "translation.py", line 106, in _decode_logprob
    out, states, _ = self._model.decode_step(step_input, states)
  File "/home/ec2-user/anaconda3/envs/gluon/lib/python3.7/site-packages/gluonnlp/model/translation.py", line 188, in decode_step
    self.decoder(self.tgt_embed(step_input), states)
  File "/home/ec2-user/anaconda3/envs/gluon/lib/python3.7/site-packages/gluonnlp/model/transformer.py", line 1080, in __call__
    return super(TransformerDecoder, self).__call__(step_input, states)
  File "/home/ec2-user/anaconda3/envs/gluon/lib/python3.7/site-packages/gluonnlp/model/seq2seq_encoder_decoder.py", line 218, in __call__
    return super(Seq2SeqDecoder, self).__call__(step_input, states)
  File "/home/ec2-user/anaconda3/envs/gluon/lib/python3.7/site-packages/mxnet/gluon/block.py", line 548, in __call__
    out = self.forward(*args)
  File "/home/ec2-user/anaconda3/envs/gluon/lib/python3.7/site-packages/gluonnlp/model/transformer.py", line 1083, in forward
    input_shape = step_input.shape
AttributeError: 'Symbol' object has no attribute 'shape'

To Reproduce


Steps to reproduce

The way I call the above class is as follows:

import mxnet as mx

# load_model and load_data are the reporter's own helper functions.
tokenizer, model = load_model('some_model_name', 2)
# translator = BeamSearchTranslator(model)
translator = HybridBeamSearchTranslator(model)
train_loader, dev_loader, test_loader = load_data(tokenizer, tokenizer, 20, 20, dev_batch_size=8)
ctx = mx.gpu(0)
# Take the batch at index 5 from the dev loader and move it to the GPU.
for _, (src_seq, src_valid_length, __, tgt_seq, tgt_valid_length, inst_ids) in enumerate(dev_loader):
    src_seq = src_seq.as_in_context(ctx)
    tgt_seq = tgt_seq.as_in_context(ctx)
    src_valid_length = src_valid_length.as_in_context(ctx)
    tgt_valid_length = tgt_valid_length.as_in_context(ctx)
    if _ == 5:
        break

samples, scores, sample_valid_length = translator.translate(src_seq, src_valid_length)
print(samples.shape)
print(scores)
print(sample_valid_length)

Environment

----------Python Info----------
Version      : 3.7.4
Compiler     : GCC 7.3.0
Build        : ('default', 'Aug 13 2019 20:35:49')
Arch         : ('64bit', '')
------------Pip Info-----------
Version      : 19.2.2
Directory    : /home/ec2-user/anaconda3/envs/gluon/lib/python3.7/site-packages/pip
----------MXNet Info-----------
Version      : 1.5.0
Directory    : /home/ec2-user/anaconda3/envs/gluon/lib/python3.7/site-packages/mxnet
Num GPUs     : 1
Commit Hash   : 75a9e187d00a8b7ebc71412a02ed0e3ae489d91f
----------System Info----------
Platform     : Linux-4.14.146-93.123.amzn1.x86_64-x86_64-with-glibc2.10
system       : Linux
node         : ip-172-31-18-232
release      : 4.14.146-93.123.amzn1.x86_64
version      : #1 SMP Tue Sep 24 00:45:23 UTC 2019
----------Hardware Info----------
machine      : x86_64
processor    : x86_64
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
Stepping:              1
CPU MHz:               2705.098
CPU max MHz:           3000.0000
CPU min MHz:           1200.0000
BogoMIPS:              4600.12
Hypervisor vendor:     Xen
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              46080K
NUMA node0 CPU(s):     0-7
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx xsaveopt
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0021 sec, LOAD: 0.8024 sec.
Timing for GluonNLP GitHub: https://github.com/dmlc/gluon-nlp, DNS: 0.0005 sec, LOAD: 0.5872 sec.
Timing for GluonNLP: http://gluon-nlp.mxnet.io, DNS: 0.1223 sec, LOAD: 0.2127 sec.
Timing for D2L: http://d2l.ai, DNS: 0.0146 sec, LOAD: 0.2927 sec.
Timing for D2L (zh-cn): http://zh.d2l.ai, DNS: 0.0184 sec, LOAD: 0.3771 sec.
Timing for FashionMNIST: https://repo.mxnet.io/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0817 sec, LOAD: 0.7324 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0146 sec, LOAD: 0.4340 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0143 sec, LOAD: 0.2452 sec.
eric-haibin-lin commented 5 years ago

@junrushao1994

szha commented 5 years ago

File "/home/ec2-user/anaconda3/envs/gluon/lib/python3.7/site-packages/gluonnlp/model/transformer.py", line 1080, in call return super(TransformerDecoder, self).call(step_input, states) File "/home/ec2-user/anaconda3/envs/gluon/lib/python3.7/site-packages/gluonnlp/model/seq2seq_encoder_decoder.py", line 218, in call return super(Seq2SeqDecoder, self).call(step_input, states) File "/home/ec2-user/anaconda3/envs/gluon/lib/python3.7/site-packages/mxnet/gluon/block.py", line 548, in call out = self.forward(*args) File "/home/ec2-user/anaconda3/envs/gluon/lib/python3.7/site-packages/gluonnlp/model/transformer.py", line 1083, in forward input_shape = step_input.shape AttributeError: 'Symbol' object has no attribute 'shape'

It looks like the Transformer is being used as the decoder. Since the Transformer decoder is still a fake HybridBlock, the access to step_input.shape raises this exception once hybridization passes in Symbols instead of NDArrays.
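
For context, this failure mode can be reproduced with a toy block (a minimal sketch, not the GluonNLP code): any block whose forward pass reads the concrete shape of its input works imperatively but breaks once hybridization traces it with Symbols.

import mxnet as mx
from mxnet.gluon import HybridBlock

class ShapeDependentBlock(HybridBlock):
    # Reads the concrete batch size from the input, the way
    # TransformerDecoder.forward reads step_input.shape.
    def hybrid_forward(self, F, x):
        batch_size = x.shape[0]  # NDArray has .shape; Symbol does not
        return F.reshape(x, shape=(batch_size, -1))

blk = ShapeDependentBlock()
blk(mx.nd.ones((2, 3)))   # imperative call: works
blk.hybridize()
blk(mx.nd.ones((2, 3)))   # traced with Symbols -> AttributeError: 'Symbol' object has no attribute 'shape'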

cc @szhengac @sxjscience. What do you recommend as the best way forward?

sxjscience commented 5 years ago

The best way is to stick to BeamSearchSampler now. Let's revise these APIs together with the integration of DeepNumpy.
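
As a stopgap along those lines, the translator from the report can be built around the regular sampler instead (a sketch following the GluonNLP 0.x BeamSearchSampler signature; batch_size and vocab_size are not needed, and nothing is hybridized):

import mxnet as mx
from gluonnlp.model import BeamSearchSampler, BeamSearchScorer

class NonHybridBeamSearchTranslator:
    # Same translator as in the report, but with the regular (non-hybrid) sampler;
    # the decoder then always receives NDArrays, so TransformerDecoder's shape
    # handling keeps working.
    def __init__(self, model, beam_size=5, max_length=35):
        self._model = model
        self._sampler = BeamSearchSampler(
            beam_size=beam_size,
            decoder=self._decode_logprob,
            eos_id=model.tgt_vocab.token_to_idx[model.tgt_vocab.sep_token],
            scorer=BeamSearchScorer(),
            max_length=max_length)

    def _decode_logprob(self, step_input, states):
        out, states, _ = self._model.decode_step(step_input, states)
        return mx.nd.log_softmax(out), states

    # translate() is identical to the HybridBeamSearchTranslator version above.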

szhengac commented 5 years ago

I agree with @sxjscience.

zeeshansayyed commented 5 years ago

Thank you. Should I keep the issue open?

sxjscience commented 5 years ago

@zeeshansayyed Yes, let's keep it open. We will refactor the GluonNLP code later, which may solve this problem.