Closed MorenoLaQuatra closed 3 years ago
I'll take a look.
Experiencing the same thing
Whoops, looks like I fixed one part, but need to fix the summarizer contract. I will get to that this weekend.
How do I use this with Docker? I'm trying the german-bert from here:

```
docker run --rm -it -p 5000:5000 summary-service:latest -model bert-base-german-cased
```
but get:

```
root@docker2:~/bert-extractive-summarizer/summarizer# docker run --rm -it -p 5000:5000 summary-service:latest -model bert-base-german-cased
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data] Unzipping tokenizers/punkt.zip.
100%|...| 40155833/40155833 [00:02<00:00, 19742925.37B/s]
Using Model: bert-base-german-cased
Traceback (most recent call last):
  File "./server.py", line 86, in <module>
    summarizer = Summarizer(args.model, int(args.hidden), args.reduce, float(args.greediness))
  File "/usr/local/lib/python3.6/dist-packages/summarizer/model_processors.py", line 73, in __init__
    super(Summarizer, self).__init__(model, hidden, reduce_option, greedyness)
  File "/usr/local/lib/python3.6/dist-packages/summarizer/model_processors.py", line 53, in __init__
    super(SingleModel, self).__init__(model, hidden, reduce_option, greedyness)
  File "/usr/local/lib/python3.6/dist-packages/summarizer/model_processors.py", line 15, in __init__
    self.model = BertParent(model)
  File "/usr/local/lib/python3.6/dist-packages/summarizer/BertParent.py", line 41, in __init__
    self.model = base_model.from_pretrained(model, output_hidden_states=True)
AttributeError: 'NoneType' object has no attribute 'from_pretrained'
```
@davidlenz Sorry, I wasn't using Docker when checking in the last changes for this issue, so I didn't look at that setup. Making this work would require some more code updates, I think: `server.py` and `summarize.py` would need to be updated to accept arguments for, e.g., the path where your custom model is stored, plus some code to create a `BertModel` (and a `BertTokenizer` if required) from those paths, which can then be passed into the `Summarizer(...)` constructor.
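A minimal sketch of what those argument additions could look like. The flag names (`-custom-model-path`, `-custom-tokenizer-path`) and the helper function are illustrative assumptions, not the project's actual CLI:

```python
# Sketch: extra CLI arguments for server.py so a custom model/tokenizer
# path can be threaded through to the Summarizer constructor.
import argparse

def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument('-model', default='bert-large-uncased')
    parser.add_argument('-custom-model-path', dest='custom_model_path', default=None)
    parser.add_argument('-custom-tokenizer-path', dest='custom_tokenizer_path', default=None)
    return parser

def make_summarizer(args):
    # Imports are local so the parser can be exercised without the
    # heavyweight dependencies installed.
    from transformers import BertModel, BertTokenizer
    from summarizer import Summarizer

    custom_model = custom_tokenizer = None
    if args.custom_model_path:
        custom_model = BertModel.from_pretrained(
            args.custom_model_path, output_hidden_states=True)
    if args.custom_tokenizer_path:
        custom_tokenizer = BertTokenizer.from_pretrained(args.custom_tokenizer_path)
    return Summarizer(model=args.model,
                      custom_model=custom_model,
                      custom_tokenizer=custom_tokenizer)
```

Usage would then be along the lines of `args = build_parser().parse_args(); summarizer = make_summarizer(args)` inside `server.py`.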
@hdatteln this is a good starting point, thanks! Got the following afterwards:
```
Traceback (most recent call last):
  File "./server.py", line 87, in <module>
    summarizer = Summarizer(args.model, args.custom_model, args.custom_tokenizer, int(args.hidden), args.reduce, float(args.greediness))
TypeError: __init__() takes from 1 to 5 positional arguments but 7 were given
```
So from the `requirements-service.txt` here, it looks like the `bert-extractive-summarizer` is installed via pip as version 0.2.0, which needs to be changed to reflect the latest changes in version 0.2.2. I applied the changes locally and rebuilt the Docker container (docker build uses the local `server.py` and `requirements-service.txt`), but had no luck. I am actually uncertain how to correctly provide inputs to `custom_model` and `custom_tokenizer`.
Staring at the code for a while, I came to the conclusion that my model is not really a custom model in the sense meant here, but rather another pretrained model already in the transformers repo. Thus I concluded it would suffice to include `bert-base-german-cased` in the MODELS dict in `BertParent.py`. However, as I currently understand it, these changes would need to be published to PyPI as well to be usable with Docker.
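To illustrate why the lookup fails, here is a minimal stand-in for that MODELS-dict pattern (the class below is a placeholder, not the real transformers class, and the dict contents are inferred from the traceback, not the library's verbatim source): an unknown checkpoint name falls back to `None`, and calling `.from_pretrained` on it produces exactly the `AttributeError` above, while registering the name fixes the lookup.

```python
# Stand-in for the real BERT model class from transformers.
class FakeBertModel:
    @classmethod
    def from_pretrained(cls, name, **kwargs):
        return cls()

# Simplified version of the registry that BertParent.py consults.
MODELS = {
    'bert-base-uncased': FakeBertModel,
    'bert-large-uncased': FakeBertModel,
}

# Unknown key -> None; None.from_pretrained(...) would raise the
# AttributeError seen in the tracebacks above.
missing = MODELS.get('bert-base-german-cased')

# The fix discussed here: register the missing checkpoint name.
MODELS['bert-base-german-cased'] = FakeBertModel
model = MODELS['bert-base-german-cased'].from_pretrained('bert-base-german-cased')
```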
Sorry, I actually fixed this last night, and forgot to commit. I will update when I get home this evening.
Thanks for the feedback! Unfortunately it is still not working for me, and I am not sure how to proceed or how to correctly use the german-bert.

```
docker run --rm -it -p 5000:5000 summary-service:latest -model bert-large-uncased
```

works well, but

```
docker run --rm -it -p 5000:5000 summary-service:latest -model bert-base-german-cased
```

still throws `AttributeError: 'NoneType' object has no attribute 'from_pretrained'`:
```
root@docker:~/bert-extractive-summarizer# docker run --rm -it -p 5000:5000 summary-service:latest -model bert-base-german-cased
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data] Unzipping tokenizers/punkt.zip.
100%|...| 40155833/40155833 [00:02<00:00, 17176330.37B/s]
Using Model: bert-base-german-cased
Traceback (most recent call last):
  File "./server.py", line 90, in <module>
    greedyness=float(args.greediness)
  File "/usr/local/lib/python3.6/dist-packages/summarizer/model_processors.py", line 106, in __init__
    super(Summarizer, self).__init__(model, custom_model, custom_tokenizer, hidden, reduce_option, greedyness, language, random_state)
  File "/usr/local/lib/python3.6/dist-packages/summarizer/model_processors.py", line 80, in __init__
    greedyness, language=language, random_state=random_state)
  File "/usr/local/lib/python3.6/dist-packages/summarizer/model_processors.py", line 25, in __init__
    self.model = BertParent(model, custom_model, custom_tokenizer)
  File "/usr/local/lib/python3.6/dist-packages/summarizer/BertParent.py", line 38, in __init__
    self.model = base_model.from_pretrained(model, output_hidden_states=True)
AttributeError: 'NoneType' object has no attribute 'from_pretrained'
```
Is it also possible to add the multilingual BERT variants as an option? (Or to let the user explicitly indicate which BERT tokenizer, model, and pre-trained weights it should use?)

Update: I found out it is already possible, but the documentation leaves some room for interpretation (namely, that the custom model needs to be already pre-trained). Maybe the following passage could be included so others can see how to use it? @dmmiller612
```python
import transformers
from summarizer import Summarizer

bert_model = "bert-base-multilingual-cased"
custom_model = transformers.BertModel.from_pretrained(bert_model, output_hidden_states=True)
custom_tokenizer = transformers.BertTokenizer.from_pretrained(bert_model)

model = Summarizer(model=bert_model, custom_model=custom_model, custom_tokenizer=custom_tokenizer)
```
Yep, I can update the documentation.
I am having the same issue
```
$ docker run --rm -it -p 5000:5000 summary-service:latest -model bert-base-multilingual-cased
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data] Unzipping tokenizers/punkt.zip.
100%|...| 40155833/40155833 [00:25<00:00, 1561989.72B/s]
Using Model: bert-base-multilingual-cased
Traceback (most recent call last):
  File "./server.py", line 90, in <module>
    greedyness=float(args.greediness)
  File "/usr/local/lib/python3.6/dist-packages/summarizer/model_processors.py", line 106, in __init__
    super(Summarizer, self).__init__(model, custom_model, custom_tokenizer, hidden, reduce_option, greedyness, language, random_state)
  File "/usr/local/lib/python3.6/dist-packages/summarizer/model_processors.py", line 80, in __init__
    greedyness, language=language, random_state=random_state)
  File "/usr/local/lib/python3.6/dist-packages/summarizer/model_processors.py", line 25, in __init__
    self.model = BertParent(model, custom_model, custom_tokenizer)
  File "/usr/local/lib/python3.6/dist-packages/summarizer/BertParent.py", line 38, in __init__
    self.model = base_model.from_pretrained(model, output_hidden_states=True)
AttributeError: 'NoneType' object has no attribute 'from_pretrained'
```
Yeah, right now the service doesn't have a good way to load a custom model (it can easily be done with the library). I'll add something to hopefully address the issue sometime this week.
I am having the same issue. I am trying to load a trained model using:

```python
ext_model = Summarizer(model="../models/CNN_DailyMail_Extractive/bertext_cnndm_transformer.pt")
```

I also tried to use

```python
ext_model = Summarizer(custom_model="../models/CNN_DailyMail_Extractive/bertext_cnndm_transformer.pt")
```

However, I get the following error:

```
File "/usr/local/lib/python3.6/dist-packages/summarizer/BertParent.py", line 38, in __init__
  self.model = base_model.from_pretrained(model, output_hidden_states=True)
AttributeError: 'NoneType' object has no attribute 'from_pretrained'
```
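One hedged workaround sketch, assuming the `.pt` file is a BERT-compatible state dict (which may not hold for this checkpoint): load the weights into a `transformers` `BertModel` first and pass the resulting objects in via `custom_model`/`custom_tokenizer`, since `Summarizer` expects model objects there, not file paths. The helper name below is illustrative.

```python
def load_checkpoint_summarizer(ckpt_path, base_name='bert-base-uncased'):
    # Imports kept local so the sketch can be imported without the
    # heavyweight dependencies present.
    import torch
    from transformers import BertModel, BertTokenizer
    from summarizer import Summarizer

    # Start from a standard BERT and overlay the checkpoint's weights.
    model = BertModel.from_pretrained(base_name, output_hidden_states=True)
    state = torch.load(ckpt_path, map_location='cpu')
    # strict=False because the checkpoint may carry extra or renamed keys;
    # if few keys actually match, the result will not be meaningful.
    model.load_state_dict(state, strict=False)

    tokenizer = BertTokenizer.from_pretrained(base_name)
    return Summarizer(custom_model=model, custom_tokenizer=tokenizer)
```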
There's an ad-hoc solution if it's urgent. Replace

```python
base_model, base_tokenizer = self.MODELS.get(model, (None, None))
if custom_model:
    self.model = custom_model
else:
    self.model = base_model.from_pretrained(model, output_hidden_states=True)
if custom_tokenizer:
    self.tokenizer = custom_tokenizer
else:
    self.tokenizer = base_tokenizer.from_pretrained(model)
```

with

```python
base_model, base_tokenizer = self.MODELS.get('bert-large-uncased', (None, None))
if custom_model:
    self.model = base_model.from_pretrained(custom_model, output_hidden_states=True)
else:
    self.model = base_model.from_pretrained(model, output_hidden_states=True)
if custom_tokenizer:
    self.tokenizer = base_tokenizer.from_pretrained(custom_tokenizer)
else:
    self.tokenizer = base_tokenizer.from_pretrained(model)
```

in `bert_parent.py` to make it work. Use with caution, since it's not a permanent solution. You can now use `Summarizer(custom_model='path_or_model', custom_tokenizer='path_or_model')`.
> Yeah, right now the service doesn't have a good way to load a custom model (it can easily be done with the library). I'll add something to hopefully address the issue sometime this week.

Hi, any update on loading the custom model?
You should be able to load a custom (Transformers-based) model using the library. Here is an example from the readme; let me know if you are still having issues.
```python
from transformers import *

# Load model, model config and tokenizer via Transformers
custom_config = AutoConfig.from_pretrained('allenai/scibert_scivocab_uncased')
custom_config.output_hidden_states = True
custom_tokenizer = AutoTokenizer.from_pretrained('allenai/scibert_scivocab_uncased')
custom_model = AutoModel.from_pretrained('allenai/scibert_scivocab_uncased', config=custom_config)

from summarizer import Summarizer

body = 'Text body that you want to summarize with BERT'
body2 = 'Something else you want to summarize with BERT'
model = Summarizer(custom_model=custom_model, custom_tokenizer=custom_tokenizer)
model(body)
model(body2)
```
Closing as stale. Let me know if any issues arise here.
When trying to use a custom model by specifying its path, the class throws an error at `summarizer/BertParent.py`, line 41. The error is `AttributeError: 'NoneType' object has no attribute 'from_pretrained'`. I think that the line:

```python
base_model, base_tokenizer = self.MODELS.get(model, (None, None))
```

should become something like:

```python
base_model, base_tokenizer = self.MODELS.get(model, (model, model))
```

when `model` specifies the path to the custom BERT model.
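Note that a plain string has no `from_pretrained`, so falling back to `(model, model)` would raise the same `AttributeError`: the fallback needs to resolve to *classes*. A minimal sketch of that idea, using a stand-in class rather than the real `AutoModel`/`AutoTokenizer` (which is what an actual fix would use):

```python
# Stand-in for an Auto-style transformers class; a real fallback would
# use AutoModel / AutoTokenizer here.
class AutoLike:
    @classmethod
    def from_pretrained(cls, name_or_path, **kwargs):
        return cls()

# Simplified registry; the real one lives in summarizer/BertParent.py.
MODELS = {'bert-base-uncased': (AutoLike, AutoLike)}

def resolve(model):
    # Unknown names or local paths fall back to classes that still
    # provide .from_pretrained, instead of (None, None).
    return MODELS.get(model, (AutoLike, AutoLike))

base_model, base_tokenizer = resolve('/path/to/custom-bert')
loaded = base_model.from_pretrained('/path/to/custom-bert', output_hidden_states=True)
```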