facebookresearch / ParlAI

A framework for training and evaluating AI models on a variety of openly available dialogue datasets.
https://parl.ai
MIT License

dialog_babi task's performance on MemNNs is bad. #499

Closed: jojonki closed this issue 6 years ago

jojonki commented 6 years ago

I am trying to reproduce the results of the following paper on ParlAI, but the performance looks poor compared to the original results reported in Learning End-to-End Goal-Oriented Dialog, https://arxiv.org/abs/1605.07683

[screenshot: results table from the paper]

I tested the following command. According to the paper, task 1's accuracy should be almost 100%, but it comes out quite low.

python examples/train_model.py -t dialog_babi:task:1 -m memnn
...
[ time:262s parleys:26042 ] {'total': 194, 'accuracy': 0.02577, 'f1': 0.08801, 'hits@k': {1: 0.0258, 5: 0.253, 10: 0.459, 100: 1.0}}

I am afraid the current MemNN implementation only supports the bAbI-20 tasks. For example, the number of dialog_babi candidates is 4212, but I found the following code that truncates the candidate list. Even after commenting this part out, the performance did not improve. https://github.com/facebookresearch/ParlAI/blob/e0f16e9168839be12f72d3431b9819cf3d51fe10/parlai/agents/memnn/memnn.py#L148


alexholdenmiller commented 6 years ago

Hi @jojonki, I ran the following and immediately got solid training accuracy:

python examples/train_model.py -t dialog_babi:task:1 -m memnn --dict-file /tmp/db_task1k.dict -vtim 180
...
[ time:2s parleys:97 ] {'total': 97, 'accuracy': 0.7526, 'f1': 0.8021, 'hits@k': {1: 0.753, 5: 0.907, 10: 0.938, 100: 1.0}}
...
valid:{'total': 6015, 'accuracy': 0.2991, 'f1': 0.3913, 'hits@k': {1: 0.299, 5: 0.669, 10: 0.672, 100: 0.722}}
[ new best accuracy: 0.2991 ]
...

Also note that the truncation of the candidate set happens only during training; it is there to speed training up, since otherwise the model would only take a gradient step after ranking all of the candidates.
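To make that concrete, the branch @jojonki pointed at only affects the training path. Here is a minimal sketch of the idea; the cap of 100 candidates and the function name are assumptions for illustration, not ParlAI's actual code (the real logic lives in parlai/agents/memnn/memnn.py):

import random

def select_candidates(all_cands, label, training, max_cands=100):
    # Evaluation ranks the full candidate set (all 4212 for dialog_babi),
    # so reported metrics are computed over every candidate.
    if not training or len(all_cands) <= max_cands:
        return list(all_cands)
    # Training keeps the gold label and samples random negatives, so a
    # gradient step does not have to wait on scoring every candidate.
    negatives = [c for c in all_cands if c != label]
    return [label] + random.sample(negatives, max_cands - 1)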

Note that you may have been missing a dictionary (by omitting the --dict-file or --model-file args), so the model would have been ranking sentences that were just varying numbers of "UNK" tokens.
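To see why the missing dictionary matters: without a loaded vocabulary, every word maps to the unknown token, so candidates differ only in length. A minimal sketch of the effect (the token name and lookup are illustrative, not ParlAI's actual DictionaryAgent):

vocab = {}  # empty because no --dict-file was loaded

def encode(sentence, vocab, unk='__UNK__'):
    # Any word missing from the vocabulary becomes the unknown token, so
    # with an empty vocab every candidate encodes to a run of UNKs.
    return [vocab.get(w, unk) for w in sentence.split()]

print(encode('what price range are you looking for', vocab))
# ['__UNK__', '__UNK__', '__UNK__', '__UNK__', '__UNK__', '__UNK__', '__UNK__']
print(encode('rome', vocab))
# ['__UNK__']  (indistinguishable from any other one-word candidate)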