dmmiller612 / bert-extractive-summarizer

Easy to use extractive text summarization with BERT
MIT License

Error when using custom BERT models. #17

Closed MorenoLaQuatra closed 3 years ago

MorenoLaQuatra commented 4 years ago

When trying to use a custom model by specifying its path, the class throws an error at /summarizer/BertParent.py, line 41: `AttributeError: 'NoneType' object has no attribute 'from_pretrained'`.

I think that the line:

`base_model, base_tokenizer = self.MODELS.get(model, (None, None))`

should become something like:

`base_model, base_tokenizer = self.MODELS.get(model, (model, model))`

when model specifies the path to a custom BERT model.
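A minimal sketch of the fallback idea (stub classes stand in for the real transformers model/tokenizer classes here; in practice, generic loaders such as `AutoModel`/`AutoTokenizer` would be the natural defaults):

```python
# Sketch of a lookup-with-fallback for MODELS. StubModel/StubTokenizer are
# placeholders for the real transformers classes used in BertParent.py.
class StubModel:
    @classmethod
    def from_pretrained(cls, name, **kwargs):
        return f"model:{name}"

class StubTokenizer:
    @classmethod
    def from_pretrained(cls, name):
        return f"tokenizer:{name}"

MODELS = {"bert-large-uncased": (StubModel, StubTokenizer)}

def resolve(model_name):
    # Fall back to generic loader classes instead of (None, None), so an
    # unknown checkpoint name or path still goes through from_pretrained
    # instead of raising AttributeError on None.
    base_model, base_tokenizer = MODELS.get(model_name, (StubModel, StubTokenizer))
    model = base_model.from_pretrained(model_name, output_hidden_states=True)
    tokenizer = base_tokenizer.from_pretrained(model_name)
    return model, tokenizer

print(resolve("path/to/custom-bert"))  # no AttributeError for unknown names
```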

dmmiller612 commented 4 years ago

I'll take a look.

hdatteln commented 4 years ago

I'm experiencing the same thing.

dmmiller612 commented 4 years ago

Whoops, looks like I fixed one part, but need to fix the summarizer contract. I will get to that this weekend.

davidlenz commented 4 years ago

How do I use this with Docker? I'm trying the German BERT from here:

```
docker run --rm -it -p 5000:5000 summary-service:latest -model bert-base-german-cased
```

but get

```
root@docker2:~/bert-extractive-summarizer/summarizer# docker run --rm -it -p 5000:5000 summary-service:latest -model bert-base-german-cased
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
100%|##########| 40155833/40155833 [00:02<00:00, 19742925.37B/s]
Using Model: bert-base-german-cased
Traceback (most recent call last):
  File "./server.py", line 86, in <module>
    summarizer = Summarizer(args.model, int(args.hidden), args.reduce, float(args.greediness))
  File "/usr/local/lib/python3.6/dist-packages/summarizer/model_processors.py", line 73, in __init__
    super(Summarizer, self).__init__(model, hidden, reduce_option, greedyness)
  File "/usr/local/lib/python3.6/dist-packages/summarizer/model_processors.py", line 53, in __init__
    super(SingleModel, self).__init__(model, hidden, reduce_option, greedyness)
  File "/usr/local/lib/python3.6/dist-packages/summarizer/model_processors.py", line 15, in __init__
    self.model = BertParent(model)
  File "/usr/local/lib/python3.6/dist-packages/summarizer/BertParent.py", line 41, in __init__
    self.model = base_model.from_pretrained(model, output_hidden_states=True)
AttributeError: 'NoneType' object has no attribute 'from_pretrained'
root@docker2:~/bert-extractive-summarizer/summarizer#
```

hdatteln commented 4 years ago

@davidlenz, sorry, I wasn't using Docker when checking in the last changes for this issue, so I didn't look at that setup. Making this work would require some more code updates, I think: server.py and summarize.py would need to accept arguments for e.g. the path where your custom model is stored, plus some code to create a BertModel (and BertTokenizer, if required) from those paths, which can then be passed into the Summarizer(...) constructor.
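A hypothetical sketch of the extra server.py arguments described above; the flag names and defaults are assumptions mirroring the Summarizer keyword arguments, not the actual server.py code:

```python
import argparse

# Hypothetical additions: let the service accept paths for a custom
# model and tokenizer alongside the existing -model flag.
parser = argparse.ArgumentParser()
parser.add_argument('-model', default='bert-large-uncased')
parser.add_argument('-custom-model', dest='custom_model', default=None,
                    help='path to a saved custom model')
parser.add_argument('-custom-tokenizer', dest='custom_tokenizer', default=None,
                    help='path to a saved custom tokenizer')

# Example invocation (argument values are illustrative only).
args = parser.parse_args(['-model', 'bert-base-german-cased',
                          '-custom-model', '/models/german-bert'])
print(args.custom_model)  # /models/german-bert
```

The parsed paths would then be loaded into model/tokenizer objects before constructing `Summarizer(...)`.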

davidlenz commented 4 years ago

@hdatteln this is a good starting point, thanks! Got the following afterwards:

```
Traceback (most recent call last):
  File "./server.py", line 87, in <module>
    summarizer = Summarizer(args.model, args.custom_model, args.custom_tokenizer, int(args.hidden), args.reduce, float(args.greediness))
TypeError: __init__() takes from 1 to 5 positional arguments but 7 were given
root@docker2:~/bert-extractive-summarizer#
```

So from the requirements-service.txt here, it looks like bert-extractive-summarizer is installed via pip as version 0.2.0, which needs to be changed to reflect the latest changes in version 0.2.2.

I applied the changes locally and rebuilt the Docker container (docker build uses the local server.py and requirements-service.txt) but had no luck. I am actually uncertain how to correctly provide inputs for custom_model and custom_tokenizer.

After staring at the code for a while, I came to the conclusion that my model is not really a custom model in the sense meant here, but rather another pretrained model already in the transformers repo. Thus I concluded it would suffice to include bert-base-german-cased in the MODELS dict in BertParent.py. However, as I currently understand it, these changes also need to be published to PyPI to be usable with Docker.
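Concretely, the version bump described above would be a one-line change to the pin in requirements-service.txt before rebuilding the image (the exact pin shown is an assumption based on the versions mentioned):

```
bert-extractive-summarizer==0.2.2
```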

dmmiller612 commented 4 years ago

Sorry, I actually fixed this last night, and forgot to commit. I will update when I get home this evening.

davidlenz commented 4 years ago

Thanks for the feedback! Unfortunately it is still not working for me, and I am not sure how to proceed or correctly use the German BERT.

```
docker run --rm -it -p 5000:5000 summary-service:latest -model bert-large-uncased
```

works well, but

```
docker run --rm -it -p 5000:5000 summary-service:latest -model bert-base-german-cased
```

still throws `AttributeError: 'NoneType' object has no attribute 'from_pretrained'`:


```
root@docker:~/bert-extractive-summarizer# docker run --rm -it -p 5000:5000 summary-service:latest -model bert-base-german-cased
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
100%|##########| 40155833/40155833 [00:02<00:00, 17176330.37B/s]
Using Model: bert-base-german-cased
Traceback (most recent call last):
  File "./server.py", line 90, in <module>
    greedyness=float(args.greediness)
  File "/usr/local/lib/python3.6/dist-packages/summarizer/model_processors.py", line 106, in __init__
    super(Summarizer, self).__init__(model, custom_model, custom_tokenizer, hidden, reduce_option, greedyness, language, random_state)
  File "/usr/local/lib/python3.6/dist-packages/summarizer/model_processors.py", line 80, in __init__
    greedyness, language=language, random_state=random_state)
  File "/usr/local/lib/python3.6/dist-packages/summarizer/model_processors.py", line 25, in __init__
    self.model = BertParent(model, custom_model, custom_tokenizer)
  File "/usr/local/lib/python3.6/dist-packages/summarizer/BertParent.py", line 38, in __init__
    self.model = base_model.from_pretrained(model, output_hidden_states=True)
AttributeError: 'NoneType' object has no attribute 'from_pretrained'
root@docker:~/bert-extractive-summarizer#
```

houda96 commented 4 years ago

Is it also possible to add BERT's multilingual models as an option? (Or make it possible to indicate which BERT tokenizer, model, and pre-trained checkpoint it should use?)

Update: I found out it is already possible, but the documentation leaves some room for interpretation (namely, that the custom model needs to already be pre-trained). Maybe the following passage could be included so others can see how to use it? @dmmiller612


```python
import transformers
from summarizer import Summarizer

bert_model = "bert-base-multilingual-cased"
custom_model = transformers.BertModel.from_pretrained(bert_model, output_hidden_states=True)
custom_tokenizer = transformers.BertTokenizer.from_pretrained(bert_model)
model = Summarizer(model=bert_model, custom_model=custom_model, custom_tokenizer=custom_tokenizer)
```

dmmiller612 commented 4 years ago

Yep, I can update the documentation.

elmeligy commented 4 years ago

I am having the same issue:

```
$ docker run --rm -it -p 5000:5000 summary-service:latest -model bert-base-multilingual-cased
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
100%|##########| 40155833/40155833 [00:25<00:00, 1561989.72B/s]
Using Model: bert-base-multilingual-cased
Traceback (most recent call last):
  File "./server.py", line 90, in <module>
    greedyness=float(args.greediness)
  File "/usr/local/lib/python3.6/dist-packages/summarizer/model_processors.py", line 106, in __init__
    super(Summarizer, self).__init__(model, custom_model, custom_tokenizer, hidden, reduce_option, greedyness, language, random_state)
  File "/usr/local/lib/python3.6/dist-packages/summarizer/model_processors.py", line 80, in __init__
    greedyness, language=language, random_state=random_state)
  File "/usr/local/lib/python3.6/dist-packages/summarizer/model_processors.py", line 25, in __init__
    self.model = BertParent(model, custom_model, custom_tokenizer)
  File "/usr/local/lib/python3.6/dist-packages/summarizer/BertParent.py", line 38, in __init__
    self.model = base_model.from_pretrained(model, output_hidden_states=True)
AttributeError: 'NoneType' object has no attribute 'from_pretrained'
```

dmmiller612 commented 4 years ago

Yeah, right now the service doesn't have a good way to load a custom model (it can easily be done with the library). I'll add something to hopefully address the issue sometime this week.

igormis commented 4 years ago

I am having the same issue. I am trying to load a trained model using:

```python
ext_model = Summarizer(model="../models/CNN_DailyMail_Extractive/bertext_cnndm_transformer.pt")
```

I also tried:

```python
ext_model = Summarizer(custom_model="../models/CNN_DailyMail_Extractive/bertext_cnndm_transformer.pt")
```

However, I get the following error:

```
  File "/usr/local/lib/python3.6/dist-packages/summarizer/BertParent.py", line 38, in __init__
    self.model = base_model.from_pretrained(model, output_hidden_states=True)
AttributeError: 'NoneType' object has no attribute 'from_pretrained'
```
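One likely contributing factor here: `from_pretrained` expects a hub checkpoint name or a directory written by `save_pretrained` (containing config.json plus weight files), not a single `.pt` file. A small illustrative check (the helper name is hypothetical, not part of the library):

```python
from pathlib import Path

def looks_like_pretrained_dir(path):
    # A from_pretrained-compatible local path is a directory holding
    # config.json and weights, as produced by save_pretrained --
    # a bare .pt checkpoint file does not qualify.
    p = Path(path)
    return p.is_dir() and (p / "config.json").is_file()

print(looks_like_pretrained_dir("bertext_cnndm_transformer.pt"))  # False
```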

ghost commented 4 years ago

There's an ad-hoc solution if it's urgent.

Replace

```python
base_model, base_tokenizer = self.MODELS.get(model, (None, None))

if custom_model:
    self.model = custom_model
else:
    self.model = base_model.from_pretrained(model, output_hidden_states=True)

if custom_tokenizer:
    self.tokenizer = custom_tokenizer
else:
    self.tokenizer = base_tokenizer.from_pretrained(model)
```

with

```python
base_model, base_tokenizer = self.MODELS.get('bert-large-uncased', (None, None))

if custom_model:
    self.model = base_model.from_pretrained(custom_model, output_hidden_states=True)
else:
    self.model = base_model.from_pretrained(model, output_hidden_states=True)

if custom_tokenizer:
    self.tokenizer = base_tokenizer.from_pretrained(custom_tokenizer)
else:
    self.tokenizer = base_tokenizer.from_pretrained(model)
```

in BertParent.py to make it work. Use with caution, since it's not a permanent solution. You can now use `Summarizer(custom_model='path_or_model', custom_tokenizer='path_or_model')`.

nvenkatesh2409 commented 3 years ago

> Yeah, right now the service doesn't have a good way to load a custom model (It can easily been done with the library). I'll add something to hopefully address the issue sometime this week.

Hi, any update on loading the custom model?

dmmiller612 commented 3 years ago

You should be able to load a custom (Transformers-based) model using the library. Here is an example from the readme; let me know if you are still having issues.

```python
from transformers import *

# Load model, model config and tokenizer via Transformers
custom_config = AutoConfig.from_pretrained('allenai/scibert_scivocab_uncased')
custom_config.output_hidden_states = True
custom_tokenizer = AutoTokenizer.from_pretrained('allenai/scibert_scivocab_uncased')
custom_model = AutoModel.from_pretrained('allenai/scibert_scivocab_uncased', config=custom_config)

from summarizer import Summarizer

body = 'Text body that you want to summarize with BERT'
body2 = 'Something else you want to summarize with BERT'
model = Summarizer(custom_model=custom_model, custom_tokenizer=custom_tokenizer)
model(body)
model(body2)
```

dmmiller612 commented 3 years ago

Closing as stale. Let me know if any issues arise here.