dmmiller612 / bert-extractive-summarizer

Easy to use extractive text summarization with BERT
MIT License
1.38k stars, 305 forks

How can we use this model for other languages like German, French and many more? #54

Closed pratikghanwat7 closed 4 years ago

pratikghanwat7 commented 4 years ago

I want to use this model for multiple languages. How can I do that in a single script?

dmmiller612 commented 4 years ago

Yep. Underneath, this uses the Hugging Face transformers library, so you have access to all of the pretrained models there.

from summarizer import Summarizer
from transformers import DistilBertModel, DistilBertTokenizer

# Multilingual DistilBERT checkpoint from the Hugging Face model hub
d_tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-multilingual-cased')
d_model = DistilBertModel.from_pretrained('distilbert-base-multilingual-cased')

model = Summarizer(custom_model=d_model, custom_tokenizer=d_tokenizer)
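To handle several languages from one script, the same pattern can be wrapped in a small lookup. This is only a sketch: `CHECKPOINTS`, `checkpoint_for`, and `load_summarizer` are hypothetical names, and the mapping holds just the checkpoints mentioned in this thread. It also assumes the summarizer indexes into the model's per-layer hidden states, so the config is loaded with `output_hidden_states=True`; whether that is required may depend on the installed versions.

```python
# Checkpoint names come from this thread; the mapping itself is illustrative.
CHECKPOINTS = {
    "pt": "neuralmind/bert-base-portuguese-cased",
}
DEFAULT_CHECKPOINT = "distilbert-base-multilingual-cased"


def checkpoint_for(language: str) -> str:
    """Return a language-specific checkpoint, else the multilingual default."""
    return CHECKPOINTS.get(language.lower(), DEFAULT_CHECKPOINT)


def load_summarizer(language: str):
    """Build a Summarizer around the checkpoint chosen for `language`."""
    # Imported lazily so the lookup above works without the heavy dependencies.
    from summarizer import Summarizer
    from transformers import AutoConfig, AutoModel, AutoTokenizer

    name = checkpoint_for(language)
    # Assumption: the summarizer reads intermediate hidden states,
    # so the model is asked to return them.
    config = AutoConfig.from_pretrained(name, output_hidden_states=True)
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModel.from_pretrained(name, config=config)
    return Summarizer(custom_model=model, custom_tokenizer=tokenizer)
```

Usage would then look like `summary = load_summarizer("pt")(full_text)`; any language without its own entry falls back to the multilingual checkpoint.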

bernardoleite commented 3 years ago

Hey there!

I am trying to use this pre-trained model from Hugging Face (https://huggingface.co/neuralmind/bert-base-portuguese-cased) with the following code:

from summarizer import Summarizer
from transformers import AutoModel, AutoTokenizer

# Read the whole document to summarize
with open("mytext.txt", "r") as f:
    full_text = f.read()

tokenizer_pt = AutoTokenizer.from_pretrained('neuralmind/bert-base-portuguese-cased')
model_pt = AutoModel.from_pretrained('neuralmind/bert-base-portuguese-cased')

model = Summarizer(custom_model=model_pt, custom_tokenizer=tokenizer_pt)
result = model(full_text)

I am getting the following error:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-40-b753d3c7cf8c> in <module>()
      4 model = Summarizer(custom_model=model_pt, custom_tokenizer=tokenizer_pt)
      5 
----> 6 result = model(full_text)

7 frames
/usr/local/lib/python3.6/dist-packages/summarizer/bert_parent.py in extract_embeddings(self, text, hidden, reduce_option, hidden_concat)
    112 
    113         elif type(hidden) == int:
--> 114             hidden_s = hidden_states[hidden]
    115             return self._pooled_handler(hidden_s, reduce_option)
    116 

What am I doing wrong? Thanks in advance.

macfly1202 commented 3 years ago

Are you using transformers 2.2?

bernardoleite commented 3 years ago

How can I check which version of transformers is installed? I am using it on Google Colab.
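One way to check from a notebook cell, sketched with the standard library's `importlib.metadata` (available from Python 3.8). `installed_version` is a hypothetical helper name, not part of transformers or Colab. On older interpreters, such as the Python 3.6 shown in the traceback, running `import transformers; print(transformers.__version__)` is the usual alternative.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report both packages involved in this thread.
for name in ("transformers", "bert-extractive-summarizer"):
    print(name, installed_version(name))
```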
