Could you please try again with `AutoModel` instead of `AutoModelWithLMHead`?
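For reference, a minimal sketch of that change (the model name is the one from this thread; the sample `text` is just a placeholder I added):

```python
import torch
from transformers import AutoTokenizer, AutoModel

# AutoModel loads the bare BERT encoder, so outputs[0] is the last hidden state
# rather than LM-head logits.
tokenizer = AutoTokenizer.from_pretrained("cl-tohoku/bert-base-japanese-whole-word-masking")
model = AutoModel.from_pretrained("cl-tohoku/bert-base-japanese-whole-word-masking")

text = "吾輩は猫である。"  # placeholder sentence
input_ids = torch.tensor(tokenizer.encode(text, add_special_tokens=True)).unsqueeze(0)  # batch size 1

outputs = model(input_ids)
last_hidden_states = outputs[0]  # shape: (batch, seq_len, hidden_size)
```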
Thank you very much. It works properly. I am trying to use your vocabulary to fit my dataloader in PyTorch. Do you have any idea how to do that? I have looked at everything I could find on the internet, but nothing fits my case.
I'm afraid that I'm not sure what you would like to accomplish, but your issue seems to be related to how to use the Transformers framework rather than a Japanese-specific problem. You may want to refer to Transformers resources such as https://github.com/huggingface/transformers/tree/master/notebooks. (Unfortunately, I cannot afford to demonstrate all the details of the Transformers framework.)
I want to use the last output of bert-japanese as the input to my model. For this to work, my word embeddings must match the indices of the BERT vocabulary, because I am using an encoder-decoder architecture. E.g., if bert-char has 4000 tokens, my vocabulary must also have 4000 tokens with the same indices. Do I understand correctly? Here is roughly what I mean as a sketch (see below).
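The char-model name and the decoder-side embedding here are only assumptions for illustration:

```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer

# Assuming the char-level tokenizer; its vocabulary defines the shared index space.
tokenizer = AutoTokenizer.from_pretrained("cl-tohoku/bert-base-japanese-char-whole-word-masking")

# Hypothetical decoder-side embedding sized to the same vocabulary,
# so token ID i refers to the same token on both the BERT side and my side.
decoder_embedding = nn.Embedding(tokenizer.vocab_size, 768)

ids = tokenizer.encode("猫", add_special_tokens=False)
vectors = decoder_embedding(torch.tensor(ids))  # uses the same indices as BERT
```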
The tokenizer works well for Japanese, but I want to get the last output layer of the model above, since I am following the Hugging Face instructions:
```python
import torch
from transformers import AutoTokenizer, AutoModelWithLMHead

tokenizer = AutoTokenizer.from_pretrained("cl-tohoku/bert-base-japanese-whole-word-masking")
model = AutoModelWithLMHead.from_pretrained("cl-tohoku/bert-base-japanese-whole-word-masking")

# `text` is the input sentence
input_ids = torch.tensor(tokenizer.encode(text, add_special_tokens=True)).unsqueeze(0)  # Batch size 1
outputs = model(input_ids)
last_hidden_states = outputs[0]
```
Then I got len(outputs) = 1. The expected shape of last_hidden_states is (batch, seq_len, d_model), but I got (batch, seq_len, vocab_size).
How can I get the shape (batch, seq_len, d_model) from your model?
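As suggested at the top of the thread, switching from `AutoModelWithLMHead` to `AutoModel` gives the (batch, seq_len, d_model) tensor. A short sketch contrasting the two classes (the sample sentence and printed sizes are illustrative):

```python
import torch
from transformers import AutoTokenizer, AutoModel, AutoModelWithLMHead

name = "cl-tohoku/bert-base-japanese-whole-word-masking"
tokenizer = AutoTokenizer.from_pretrained(name)
input_ids = torch.tensor(tokenizer.encode("吾輩は猫である。", add_special_tokens=True)).unsqueeze(0)

# LM-head model: outputs[0] are logits over the vocabulary.
lm_model = AutoModelWithLMHead.from_pretrained(name)
print(lm_model(input_ids)[0].shape)  # (1, seq_len, vocab_size)

# Bare encoder: outputs[0] is the last hidden state, i.e. (batch, seq_len, d_model).
encoder = AutoModel.from_pretrained(name)
print(encoder(input_ids)[0].shape)   # (1, seq_len, 768)
```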