SivilTaram / Persona-Dialogue-Generation

The code of ACL 2020 paper "You Impress Me: Dialogue Generation via Mutual Persona Perception"
MIT License

ValueError: too many values to unpack (expected 2) #4

Closed mattc95 closed 4 years ago

mattc95 commented 4 years ago

```
[ training... ]
[ Saving tensorboard logs here: ./tmp/transmitter/tensorboard ]
Traceback (most recent call last):
  File "train_transmitter.py", line 145, in <module>
    TrainLoop(opt).train()
  File "/home/cmy/Persona-Dialogue-Generation/parlai/scripts/train_model.py", line 512, in train
    world.parley()
  File "/home/cmy/Persona-Dialogue-Generation/parlai/core/worlds.py", line 663, in parley
    batch_act = self.batch_act(agent_idx, batch_observations[agent_idx])
  File "/home/cmy/Persona-Dialogue-Generation/parlai/core/worlds.py", line 636, in batch_act
    batch_actions = a.batch_act(batch_observation)
  File "/home/cmy/Persona-Dialogue-Generation/agents/transmitter/transmitter.py", line 875, in batch_act
    cand_inds, sampling_cands, is_training)
  File "/home/cmy/Persona-Dialogue-Generation/agents/transmitter/transmitter.py", line 686, in predict
    valid_cands=valid_cands)
  File "/home/cmy/Persona-Dialogue-Generation/agents/transmitter/gpt/model.py", line 91, in forward
    lm_logits, hidden_states = self.transformer_module(input_seq, None, dis_seq)
ValueError: too many values to unpack (expected 2)
```


I have encountered the error above.

`self.transformer_module(input_seq, None, dis_seq)` returns 10 values, but the calling code unpacks only 2.
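For context, this failure mode can be reproduced in isolation (a minimal sketch with dummy return values, not the repo's actual model code): unpacking a longer tuple into two names raises exactly this `ValueError`.

```python
def forward_like_newer_api():
    # Newer Hugging Face models return a long tuple of outputs
    # (logits, past, hidden states, attentions, ...) -- 10 dummy items here.
    return tuple(range(10))

def forward_like_forked_api():
    # The forked pytorch-pretrained-bert returns exactly two items.
    return ("lm_logits", "hidden_states")

# Unpacking the 10-item result into two names reproduces the error:
try:
    lm_logits, hidden_states = forward_like_newer_api()
except ValueError as e:
    print(e)  # too many values to unpack (expected 2)

# The two-item API unpacks cleanly:
lm_logits, hidden_states = forward_like_forked_api()
```

This is why installing the wrong version of the library produces the crash: the call site assumes a two-element return value.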

Configuration: Python 3.7.3, torch 1.5.0, torchvision 0.6.0, pytorch-pretrained-bert v0.6.2

Everything else is left at the default settings.

SivilTaram commented 4 years ago

@mattc95 Hi, I guess you are using Hugging Face's transformers rather than our forked version. Please install pytorch-pretrained-bert again from our forked and modified version.

SivilTaram commented 4 years ago

@mattc95 Please follow the README here : https://github.com/SivilTaram/Persona-Dialogue-Generation#install-custom-dependencies

mattc95 commented 4 years ago

> @mattc95 Please follow the README here : https://github.com/SivilTaram/Persona-Dialogue-Generation#install-custom-dependencies

Thank you so much for your help.

I followed the instructions and fixed the previous issue.

Unfortunately, another error popped up that I could not solve.


```
[ Saving tensorboard logs here: ./tmp/transmitter/tensorboard ]
Traceback (most recent call last):
  File "train_transmitter.py", line 145, in <module>
    TrainLoop(opt).train()
  File "/home/cmy/code/Persona-Dialogue-Generation/parlai/scripts/train_model.py", line 512, in train
    world.parley()
  File "/home/cmy/code/Persona-Dialogue-Generation/parlai/core/worlds.py", line 671, in parley
    obs = self.batch_observe(other_index, batch_act, agent_idx)
  File "/home/cmy/code/Persona-Dialogue-Generation/parlai/core/worlds.py", line 623, in batch_observe
    observation = agents[index].observe(observation)
  File "/home/cmy/code/Persona-Dialogue-Generation/agents/transmitter/transmitter.py", line 654, in observe
    dict=self.dict)
  File "/home/cmy/code/Persona-Dialogue-Generation/agents/transmitter/utils.py", line 346, in maintain_dialog_history
    parse_vec = parse(persona_text, split_sentence)
  File "/home/cmy/code/Persona-Dialogue-Generation/agents/transmitter/utils.py", line 315, in parse
    vec = _dict.txt2vec(txt)
  File "/home/cmy/code/Persona-Dialogue-Generation/agents/common/gpt_dictionary.py", line 86, in txt2vec
    tokens = self.tokenizer.tokenize(text)
  File "/home/cmy/anaconda3/envs/persona/lib/python3.7/site-packages/pytorch_pretrained_bert-0.6.2-py3.7.egg/pytorch_pretrained_bert/tokenization_openai.py", line 235, in tokenize
  File "/home/cmy/anaconda3/envs/persona/lib/python3.7/site-packages/pytorch_pretrained_bert-0.6.2-py3.7.egg/pytorch_pretrained_bert/tokenization_openai.py", line 177, in bpe
TypeError: 'spacy.tokens.token.Token' object is not subscriptable
```


There is a "not subscriptable" error while processing the byte-pair encoding.
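For illustration, here is the failure in isolation (a minimal sketch, not the repo's code; `FakeSpacyToken` is a stand-in for a real spacy `Token`): the `bpe` method slices its argument as a string, which works for `str` but fails for any object that does not support indexing.

```python
class FakeSpacyToken:
    """Stand-in for spacy.tokens.token.Token: exposes .text but no indexing."""
    def __init__(self, text):
        self.text = text

def bpe_first_step(token):
    # First line of the OpenAI GPT bpe(): slice the token as a string and
    # mark the last character as a word ending.
    return tuple(token[:-1]) + (token[-1] + '</w>',)

# Works on a plain string:
print(bpe_first_step("lower"))  # ('l', 'o', 'w', 'e', 'r</w>')

# Fails on a Token-like object exactly as in the traceback:
try:
    bpe_first_step(FakeSpacyToken("lower"))
except TypeError as e:
    print(e)  # 'FakeSpacyToken' object is not subscriptable
```

So the crash happens whenever the spacy pre-tokenizer hands `Token` objects (rather than plain strings) to the BPE step.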

SivilTaram commented 4 years ago

@mattc95 I have not encountered such an error. It seems that spacy is not compatible with pytorch_pretrained_bert. My spacy version is 2.1.0; you could check yours :)

SivilTaram commented 4 years ago

@mattc95 Hi bro, have you solved the problem?

mattc95 commented 4 years ago

> @mattc95 Hi bro, have you solved the problem?

```
pytorch_pretrained_bert/tokenization_openai.py:235: in tokenize
    split_tokens.extend([t for t in self.bpe(token).split(' ')])

self = <pytorch_pretrained_bert.tokenization_openai.OpenAIGPTTokenizer object at 0x7fb599843b00>, token = lower

    def bpe(self, token):
        word = tuple(token[:-1]) + (token[-1] + '</w>',)
E       TypeError: 'spacy.tokens.token.Token' object is not subscriptable

pytorch_pretrained_bert/tokenization_openai.py:177: TypeError
```
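One hypothetical workaround at the call site (a sketch only, not the fix the maintainer later pushed; `toy_bpe`, `FakeToken`, and `tokenize_safely` are illustrative names) is to coerce whatever the pre-tokenizer yields to a plain string before BPE:

```python
def toy_bpe(word):
    # Toy stand-in for the real BPE: characters plus an end-of-word marker.
    return ' '.join(tuple(word[:-1]) + (word[-1] + '</w>',))

class FakeToken:
    """Minimal stand-in for a spacy Token (has .text, no __getitem__)."""
    def __init__(self, text):
        self.text = text

def tokenize_safely(nlp_tokens, bpe):
    # Coerce each pre-tokenized item to str before BPE, so slicing inside
    # bpe() cannot fail with "object is not subscriptable".
    split_tokens = []
    for tok in nlp_tokens:
        text = tok.text if hasattr(tok, "text") else str(tok)
        split_tokens.extend(bpe(text.lower()).split(' '))
    return split_tokens

print(tokenize_safely([FakeToken("Lower")], toy_bpe))
# ['l', 'o', 'w', 'e', 'r</w>']
```

This kind of coercion makes the tokenizer robust regardless of whether spacy is installed.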

I have tried to install https://github.com/SivilTaram/transformers on different servers and in different virtual environments, but it always fails two tests: test_default and test_full_tokenizer.

The steps I followed:

  1. `git clone https://github.com/SivilTaram/transformers`
  2. `cd transformers`
  3. `python setup.py install`
  4. `pip install spacy ftfy==4.4.3`
  5. `python -m spacy download en`
  6. `pip install pytest`
  7. `python -m pytest -sv tests/`

Also, I have tried Python 3.5.6 with torch 1.5, 1.1, and 0.4; Python 3.7.3 with torch 1.5, 1.4, and 1.1; and Python 3.6.0 with torch 1.5 and 1.1 — each with spacy 2.2.4 or 2.1.0.

I don't think it's a version or server issue, so perhaps I made a mistake somewhere?

I really appreciate your help!!!

SivilTaram commented 4 years ago

@mattc95 I will try to reproduce the issue on a clean server, please stay tuned~

mattc95 commented 4 years ago

@SivilTaram Cool, looking forward to it.

chchenhui commented 4 years ago

> @mattc95 Please follow the README here : https://github.com/SivilTaram/Persona-Dialogue-Generation#install-custom-dependencies
>
> Thank you so much for your help.
>
> I followed the instructions and fixed the previous issue.
>
> Unfortunately, another error popped up that I could not solve: [same traceback as in the earlier comment, ending in `TypeError: 'spacy.tokens.token.Token' object is not subscriptable`]
>
> There is a "not subscriptable" error while processing the byte-pair encoding.

Hi, I also encountered this problem when training the transmitter. After I uninstalled spacy, the error disappeared.

SivilTaram commented 4 years ago

@chchenhui Many thanks for your solution. @mattc95 I found the issue, and it relates to this line in pytorch-pretrained-bert. The original code of the lib also applies BPE to special tokens (e.g. `<p1>`), so I fixed it in tokenization_openai.py to tokenize special tokens properly. However, I had tested the modified logic only on the second tokenization path (the one used when ftfy or spacy is missing), and it fails on the first path. I will fix my forked version to make the tokenization path deterministic. Thanks again!
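The failure pattern described above can be sketched like this (an illustrative simplification, not the library's actual code; all names here are hypothetical): the tokenizer has two code paths, and a bug on the spacy path goes unnoticed when tests run in an environment without spacy.

```python
def tokenize(text, bpe, nlp=None, special_tokens=()):
    """Sketch of a tokenizer with two code paths.

    Which path runs depends on whether a spacy-like pipeline is installed,
    so behaviour (and test results) become environment-dependent.
    """
    if nlp is not None:
        # Path 1: spacy-style pre-tokenization yields Token objects;
        # coerce them to str before any string slicing.
        raw = [t.text for t in nlp(text)]
    else:
        # Path 2: plain whitespace split already yields str.
        raw = text.split()
    out = []
    for tok in raw:
        if tok in special_tokens:
            out.append(tok)  # never apply BPE to a special token like <p1>
        else:
            out.extend(bpe(tok.lower()).split(' '))
    return out

def toy_bpe(word):
    # Toy BPE: characters plus an end-of-word marker.
    return ' '.join(tuple(word[:-1]) + (word[-1] + '</w>',))

print(tokenize("hi <p1>", toy_bpe, special_tokens=("<p1>",)))
# ['h', 'i</w>', '<p1>']
```

Making both paths normalize their output to plain strings (and skip BPE for special tokens) is what removes the environment dependence.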

Update

Now @mattc95 you could reinstall my forked transformers and rerun the code. Enjoy the bot ☕

mattc95 commented 4 years ago

@chchenhui Thanks for your alternative solution.

@SivilTaram Yeahhh! It's working smoothly now. Thanks for your time and patience!