Closed · mattc95 closed this issue 4 years ago
@mattc95 Hi, I guess you are using Huggingface's transformers rather than our forked version. Please use our forked and modified version and install pytorch-pretrained-bert again.
@mattc95 Please follow the README here : https://github.com/SivilTaram/Persona-Dialogue-Generation#install-custom-dependencies
Thank you so much for your help.
I followed the instructions and fixed the previous issue.
Unfortunately, another error popped up that I could not solve.
[ Saving tensorboard logs here: ./tmp/transmitter/tensorboard ]
Traceback (most recent call last):
  File "train_transmitter.py", line 145, in <module>
    TrainLoop(opt).train()
  File "/home/cmy/code/Persona-Dialogue-Generation/parlai/scripts/train_model.py", line 512, in train
    world.parley()
  File "/home/cmy/code/Persona-Dialogue-Generation/parlai/core/worlds.py", line 671, in parley
    obs = self.batch_observe(other_index, batch_act, agent_idx)
  File "/home/cmy/code/Persona-Dialogue-Generation/parlai/core/worlds.py", line 623, in batch_observe
    observation = agents[index].observe(observation)
  File "/home/cmy/code/Persona-Dialogue-Generation/agents/transmitter/transmitter.py", line 654, in observe
    dict=self.dict)
  File "/home/cmy/code/Persona-Dialogue-Generation/agents/transmitter/utils.py", line 346, in maintain_dialog_history
    parse_vec = parse(persona_text, split_sentence)
  File "/home/cmy/code/Persona-Dialogue-Generation/agents/transmitter/utils.py", line 315, in parse
    vec = _dict.txt2vec(txt)
  File "/home/cmy/code/Persona-Dialogue-Generation/agents/common/gpt_dictionary.py", line 86, in txt2vec
    tokens = self.tokenizer.tokenize(text)
  File "/home/cmy/anaconda3/envs/persona/lib/python3.7/site-packages/pytorch_pretrained_bert-0.6.2-py3.7.egg/pytorch_pretrained_bert/tokenization_openai.py", line 235, in tokenize
  File "/home/cmy/anaconda3/envs/persona/lib/python3.7/site-packages/pytorch_pretrained_bert-0.6.2-py3.7.egg/pytorch_pretrained_bert/tokenization_openai.py", line 177, in bpe
TypeError: 'spacy.tokens.token.Token' object is not subscriptable

There is a "not subscriptable" error while processing the byte-pair encoding.
@mattc95 I have not encountered such an error. It seems that spacy is not compatible with pytorch_pretrained_bert. My spacy version is 2.1.0, you could check it :)
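For context, the OpenAI GPT tokenizer in pytorch-pretrained-bert picks between two code paths at construction time: a spacy + ftfy path when both import cleanly, and a fallback BasicTokenizer path otherwise, which is why the installed spacy version can change its behavior. A minimal check of which path an environment would take (a sketch based on the upstream library; the forked version may differ):

```python
# Rough check of which tokenization path pytorch_pretrained_bert's OpenAI GPT
# tokenizer would take in this environment (based on the upstream library's
# import fallback; the forked version may behave differently).
try:
    import ftfy   # noqa: F401
    import spacy  # noqa: F401
    print("ftfy + spacy importable: the spacy/ftfy tokenization path is used")
except ImportError:
    print("ftfy or spacy missing: the fallback BasicTokenizer path is used")
```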
@mattc95 Hi bro, have you solved the problem?
_ pytorch_pretrained_bert/tokenization_openai.py:235: in tokenize
    split_tokens.extend([t for t in self.bpe(token).split(' ')])

self = <pytorch_pretrained_bert.tokenization_openai.OpenAIGPTTokenizer object at 0x7fb599843b00>, token = lower

    def bpe(self, token):
        word = tuple(token[:-1]) + (token[-1] + '</w>',)
E       TypeError: 'spacy.tokens.token.Token' object is not subscriptable

pytorch_pretrained_bert/tokenization_openai.py:177: TypeError
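To make the failure above concrete: bpe() slices its argument with token[:-1], which works on a plain str but not on a spacy Token, so feeding spacy tokens straight into bpe() raises exactly this TypeError. A minimal, self-contained illustration (not code from the repo):

```python
# Minimal illustration of the TypeError above: slicing works on a str,
# but a spacy Token object does not support __getitem__.
import spacy

nlp = spacy.blank("en")        # blank pipeline, no model download required
doc = nlp("lower case text")
spacy_token = doc[0]           # spacy.tokens.Token, not a str

print("lower"[:-1])            # 'lowe'  -> fine on a plain string
print(spacy_token.text[:-1])   # 'lowe'  -> fine on the token's underlying text
try:
    spacy_token[:-1]           # roughly what bpe() attempts
except TypeError as err:
    print(err)                 # ... object is not subscriptable
```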
I have tried to install "https://github.com/SivilTaram/transformers" on different servers and in different virtual environments, but they all fail two tests: one is test_default, and the other is test_full_tokenizer.
The procedure I followed:
Also, I have tried Python 3.5.6 with torch 1.5, 1.1, or 0.4; Python 3.7.3 with torch 1.5, 1.4, or 1.1; and Python 3.6.0 with torch 1.5 or 1.1, in each case with spacy 2.2.4 or 2.1.0.
I don't think it's a version or server issue; perhaps I made a mistake somewhere?
I really appreciate your help!!!
@mattc95 I will try to reproduce the issue on a clean server, please stay tuned~
@SivilTaram Cool, looking forward to it.
Hi, I also encountered this problem when training the transmitter. After I uninstalled spacy, the error disappeared.
@chchenhui Many thanks for your solution. @mattc95 I found the issue, and it relates to this line in pytorch-pretrained-bert. The original code of the library also applies BPE to special tokens (e.g. <p1>), so I have fixed tokenization_openai.py to handle special-token tokenization properly. However, I had tested the modified logic only on the second tokenization path (the one used when ftfy or spacy is not installed), and it fails on the first path. I will fix my forked version to make the tokenization path deterministic. Thanks again!
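The fix described above can be sketched roughly as follows: emit special tokens as single units and run BPE only on ordinary words. This is just an illustration of the idea with hypothetical helper names (basic_tokenize, bpe, special_tokens), not the actual patch in the fork:

```python
# Sketch of the idea described above, not the actual patch in the fork:
# special tokens such as <p1> are kept intact instead of being split by BPE.
def tokenize_with_special_tokens(text, special_tokens, basic_tokenize, bpe):
    """
    basic_tokenize: callable mapping a string to a list of word strings
    bpe: callable mapping a word string to its space-separated BPE pieces
    special_tokens: set of tokens (e.g. {'<p1>', '<p2>'}) to leave untouched
    """
    split_tokens = []
    for token in basic_tokenize(text):
        if token in special_tokens:
            split_tokens.append(token)               # bypass BPE for <p1>, <p2>, ...
        else:
            split_tokens.extend(bpe(token).split(' '))
    return split_tokens
```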
Update
Now @mattc95 you can reinstall my forked transformers and rerun the code. Enjoy the bot ☕
@chchenhui Thanks for your alternative solution.
@SivilTaram Yeahhh! It's working smoothly now. Thanks for your time and patience!
[ training... ]
[ Saving tensorboard logs here: ./tmp/transmitter/tensorboard ]
Traceback (most recent call last):
  File "train_transmitter.py", line 145, in <module>
    TrainLoop(opt).train()
  File "/home/cmy/Persona-Dialogue-Generation/parlai/scripts/train_model.py", line 512, in train
    world.parley()
  File "/home/cmy/Persona-Dialogue-Generation/parlai/core/worlds.py", line 663, in parley
    batch_act = self.batch_act(agent_idx, batch_observations[agent_idx])
  File "/home/cmy/Persona-Dialogue-Generation/parlai/core/worlds.py", line 636, in batch_act
    batch_actions = a.batch_act(batch_observation)
  File "/home/cmy/Persona-Dialogue-Generation/agents/transmitter/transmitter.py", line 875, in batch_act
    cand_inds, sampling_cands, is_training)
  File "/home/cmy/Persona-Dialogue-Generation/agents/transmitter/transmitter.py", line 686, in predict
    valid_cands=valid_cands)
  File "/home/cmy/Persona-Dialogue-Generation/agents/transmitter/gpt/model.py", line 91, in forward
    lm_logits, hidden_states = self.transformer_module(input_seq, None, dis_seq)
ValueError: too many values to unpack (expected 2)
I have encountered the above error.
self.transformer_module(input_seq, None, dis_seq) returns 10 values (I don't know what they are) instead of the 2 being unpacked.
Configuration: Python 3.7.3, torch 1.5.0, torchvision 0.6.0, pytorch-pretrained-bert v0.6.2
Everything is set as default.
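One possibility (an assumption on my part, not a confirmed diagnosis) is that the stock PyPI pytorch-pretrained-bert 0.6.2 is being imported instead of the forked one, so the GPT module returns a different number of outputs than model.py unpacks. A quick check of what is actually on the path:

```python
# Quick diagnostic (an assumption, not a confirmed fix): check whether the
# pytorch_pretrained_bert being imported is the forked install or a stock
# PyPI/egg release, since the two may return different numbers of outputs.
import pytorch_pretrained_bert

print(getattr(pytorch_pretrained_bert, "__version__", "unknown"))  # reported version
print(pytorch_pretrained_bert.__file__)  # path should point at the forked install
```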