How can we use this model for other languages like German, French and many more?

pratikghanwat7 commented 4 years ago

I want to use this model for multiple languages. How can I do that in a single script?

dmmiller612 commented 4 years ago

Yep. Under the hood, this uses the Hugging Face transformers library, so you have access to all of the pretrained models there.

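from summarizer import Summarizer
from transformers import DistilBertModel, DistilBertTokenizer

# Load a multilingual checkpoint and its matching tokenizer
d_tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-multilingual-cased')
d_model = DistilBertModel.from_pretrained('distilbert-base-multilingual-cased')

# Pass both to the summarizer in place of its default model
model = Summarizer(custom_model=d_model, custom_tokenizer=d_tokenizer)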

bernardoleite commented 3 years ago

Hey there!

I am trying to use the pre-trained models at https://huggingface.co/neuralmind/bert-base-portuguese-cased from Hugging Face with this code:

from summarizer import Summarizer
from transformers import AutoModel, AutoTokenizer

# Read the text to summarize
with open("mytext.txt", "r") as f:
    full_text = f.read()

tokenizer_pt = AutoTokenizer.from_pretrained('neuralmind/bert-base-portuguese-cased')
model_pt = AutoModel.from_pretrained('neuralmind/bert-base-portuguese-cased')

model = Summarizer(custom_model=model_pt, custom_tokenizer=tokenizer_pt)
result = model(full_text)

I am getting the following error:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-40-b753d3c7cf8c> in <module>()
      4 model = Summarizer(custom_model=model_pt, custom_tokenizer=tokenizer_pt)
      5 
----> 6 result = model(full_text)

7 frames
/usr/local/lib/python3.6/dist-packages/summarizer/bert_parent.py in extract_embeddings(self, text, hidden, reduce_option, hidden_concat)
    112 
    113         elif type(hidden) == int:
--> 114             hidden_s = hidden_states[hidden]
    115             return self._pooled_handler(hidden_s, reduce_option)
    116

What am I doing wrong? Thanks in advance.

macfly1202 commented 3 years ago

Are you using transformers 2.2?
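
Another thing that may be worth checking, independent of the version: the failing line indexes into `hidden_states`, and transformers models only return per-layer hidden states when `output_hidden_states` is enabled in their config. The sketch below illustrates that behaviour on a tiny, randomly initialised `BertModel`, so nothing needs to be downloaded. The helper name `count_hidden_states` and the layer sizes are made up for illustration, and whether this is the actual cause of the IndexError above is an assumption.

```python
import torch
from transformers import BertConfig, BertModel


def count_hidden_states(output_hidden_states: bool) -> int:
    """Return how many per-layer hidden states the model hands back.

    Hypothetical helper for illustration; it is not part of summarizer
    or transformers.
    """
    # Tiny, randomly initialised BERT (arbitrary sizes, no download needed).
    config = BertConfig(
        vocab_size=100,
        hidden_size=32,
        num_hidden_layers=2,
        num_attention_heads=2,
        intermediate_size=64,
        output_hidden_states=output_hidden_states,
    )
    model = BertModel(config).eval()

    # A single dummy sequence of four token ids.
    input_ids = torch.tensor([[1, 2, 3, 4]])
    with torch.no_grad():
        outputs = model(input_ids, return_dict=True)

    # hidden_states is None unless the flag was enabled in the config.
    hidden_states = outputs.hidden_states
    return 0 if hidden_states is None else len(hidden_states)


if __name__ == "__main__":
    print("flag off:", count_hidden_states(False))
    print("flag on: ", count_hidden_states(True))
```

If this turns out to be the cause, the same flag can be set on a pretrained checkpoint by loading its config with `AutoConfig.from_pretrained(...)`, setting `output_hidden_states = True` on it, and passing it as `config=` to `AutoModel.from_pretrained(...)` before handing the model to `Summarizer`.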

bernardoleite commented 3 years ago

How can I check which version of transformers is installed? I am running this on Google Colab.
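
For reference, one way to check this from a notebook cell is sketched below. It uses only the standard library (`importlib.metadata`, which assumes Python 3.8 or newer), and the helper name `installed_version` is made up for illustration. On older runtimes, `import transformers; print(transformers.__version__)` or `!pip show transformers` in a Colab cell should give the same information.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed.

    Hypothetical helper for illustration only.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints a version string, or None if transformers is not installed.
    print("transformers:", installed_version("transformers"))
```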

dmmiller612 / bert-extractive-summarizer, issue #54