ai-forever / ner-bert

BERT-NER (nert-bert) with google bert https://github.com/google-research.
MIT License

Predict a sentence using BERTBiLSTMAttnNCRF without passing a dataloader #32

Open mhrihab opened 3 years ago

mhrihab commented 3 years ago

I had an issue while building a function that only predicts a sentence without passing a dataloader instance. These are the steps I followed:

    from pytorch_pretrained_bert import BertTokenizer

    sentence = 'put a sentence'
    bert_tokens = []
    tok_map = []
    tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")

    pad_idx = 0
    label2idx = {"[PAD]": pad_idx, '[CLS]': 1, '[SEP]': 2, "X": 3}
    # idx2label = ["[PAD]", '[CLS]', '[SEP]', "X"]

    orig_tokens = sentence.split()
    orig_tokens = ["[CLS]"] + orig_tokens + ["[SEP]"]
    for origin_token in orig_tokens:
        cur_tokens = tokenizer.tokenize(origin_token)
        bert_tokens.extend(cur_tokens)
        tok_map.append(len(bert_tokens))
    input_ids = tokenizer.convert_tokens_to_ids(bert_tokens)
    input_mask = [1] * len(input_ids)
    while len(input_ids) < 424:
        input_mask.append(0)
        tok_map.append(-1)
        input_ids.append(0)
    input_type_ids = [0] * len(input_ids)

The problem is that I couldn't figure out what `batch` should be in order to predict using `model.forward(batch)`. I tried this:

    batch = [[0], [0], [0]]
    batch[0] = input_ids
    batch[1] = input_type_ids
    batch[2] = input_mask
    learner.model.forward(batch)

and this is what I got:

    ~/ner-bert-master-last-version/ner-bert-master-last-version/modules/models/bert_models.py in forward(self, batch)
         46     def forward(self, batch):
         47         input, labels_mask, input_type_ids = batch[:3]
    ---> 48         input_embeddings = self.embeddings(batch)
         49         output, _ = self.lstm.forward(batch)
         50         output, _ = self.attn(output, output, output, None)

    ~/.local/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
        725             result = self._slow_forward(*input, **kwargs)
        726         else:
    --> 727             result = self.forward(*input, **kwargs)
        728         for hook in itertools.chain(
        729                 _global_forward_hooks.values(),

    ~/ner-bert-master-last-version/ner-bert-master-last-version/modules/layers/embedders.py in forward(self, batch)
         59             token_type_ids=batch[2],
         60             attention_mask=batch[1],
    ---> 61             output_all_encoded_layers=self.config["mode"] == "weighted")
         62         if self.config["mode"] == "weighted":
         63             encoded_layers = torch.stack([a * b for a, b in zip(encoded_layers, self.bert_weights)])

    ~/.local/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
        725             result = self._slow_forward(*input, **kwargs)
        726         else:
    --> 727             result = self.forward(*input, **kwargs)
        728         for hook in itertools.chain(
        729                 _global_forward_hooks.values(),

    ~/.local/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling.py in forward(self, input_ids, token_type_ids, attention_mask, output_all_encoded_layers)
        718         # this attention mask is more simple than the triangular masking of causal attention
        719         # used in OpenAI GPT, we just need to prepare the broadcast dimension here.
    --> 720         extended_attention_mask = attention_mask.unsqueeze(1).unsqueeze(2)
        721
        722         # Since attention_mask is 1.0 for positions we want to attend and 0.0 for

    AttributeError: 'list' object has no attribute 'unsqueeze'

Can you please help?
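
For reference, the traceback shows the failure happens because `input_mask` reaches `pytorch_pretrained_bert` as a plain Python list, which has no `.unsqueeze`. A minimal sketch of a tensor-based batch, assuming the ordering `[input_ids, input_mask, input_type_ids]` implied by the traceback above (`attention_mask=batch[1]`, `token_type_ids=batch[2]` in embedders.py) and reusing the `learner` and the lists built earlier in this thread, might look like the code below; this only addresses the AttributeError shown, and the NCRF decoding path may still expect additional batch elements.

    import torch

    # Sketch only: wrap the Python lists from above in LongTensors with a leading
    # batch dimension, ordered as [input_ids, input_mask, input_type_ids] to match
    # attention_mask=batch[1] and token_type_ids=batch[2] in embedders.py.
    device = next(learner.model.parameters()).device

    batch = [
        torch.tensor([input_ids], dtype=torch.long, device=device),
        torch.tensor([input_mask], dtype=torch.long, device=device),
        torch.tensor([input_type_ids], dtype=torch.long, device=device),
    ]

    learner.model.eval()
    with torch.no_grad():
        output = learner.model.forward(batch)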

LearrningLukeLondon commented 1 year ago

@mhrihab, were you able to resolve this issue?