PrimerAI / blanc

Human-free quality estimation of document summaries
MIT License

AttributeError: 'NoneType' object has no attribute 'tokenize' #13

Closed: tanay2001 closed this issue 3 years ago

tanay2001 commented 3 years ago

I am unable to run BlancTune or BlancHelp because of the following error. To reproduce the error:

from blanc import BlancHelp, BlancTune
document = "Jack drove his minivan to the bazaar to purchase milk and honey for his large family."
summary = "Jack bought milk and honey."
blanc_tune = BlancTune(device='cuda', inference_batch_size=24, finetune_mask_evenly=False, finetune_batch_size=24)
blanc_tune.eval_once(document, summary)

Output

AttributeError                            Traceback (most recent call last)
<ipython-input-22-a8659d54e616> in <module>()
      1 document = "Jack drove his minivan to the bazaar to purchase milk and honey for his large family."
      2 summary = "Jack bought milk and honey."
----> 3 blanc_tune.eval_once(document, summary)

4 frames
/usr/local/lib/python3.7/dist-packages/blanc/blanc.py in eval_once(self, doc, summary)
    100             score (float): The BLANC score for the input
    101         """
--> 102         (doc_score,) = self.eval_summaries_for_docs([doc], [[summary]])
    103         (score,) = doc_score
    104         return score

/usr/local/lib/python3.7/dist-packages/blanc/blanc.py in eval_summaries_for_docs(self, docs, doc_summaries)
    644 
    645         doc_summaries_use = [[None for s in summs] for summs in doc_summaries]
--> 646         base_outputs, base_answers = self.mask_and_infer(self.base_model, docs, doc_summaries_use)
    647 
    648         finetuned_outputs, finetuned_answers = [], []

/usr/local/lib/python3.7/dist-packages/blanc/blanc.py in mask_and_infer(self, model, docs, doc_summaries, sep)
    171             doc_inputs, doc_answers = [], []
    172             for summary in summaries:
--> 173                 summary_inputs, summary_answers = self.get_inference_inputs(doc, summary, sep)
    174                 doc_inputs.append(summary_inputs)
    175                 doc_answers.append(summary_answers)

/usr/local/lib/python3.7/dist-packages/blanc/blanc.py in get_inference_inputs(self, doc, summary, sep)
    215         doc = clean_text(doc)
    216         doc_sents = sent_tokenize(doc)
--> 217         doc_sent_tokens = [self.model_tokenizer.tokenize(sent) for sent in doc_sents]
    218 
    219         summary_sent_tokens = None

/usr/local/lib/python3.7/dist-packages/blanc/blanc.py in <listcomp>(.0)
    215         doc = clean_text(doc)
    216         doc_sents = sent_tokenize(doc)
--> 217         doc_sent_tokens = [self.model_tokenizer.tokenize(sent) for sent in doc_sents]
    218 
    219         summary_sent_tokens = None

AttributeError: 'NoneType' object has no attribute 'tokenize'

Going through the source code, I think there is an error on line 90 in blanc.py:

        if self.model_name.lower().find('albert') >= 0:
            self.model_tokenizer = AlbertTokenizer.from_pretrained(model_name)
        else:
            self.model_tokenizer = AlbertTokenizer.from_pretrained(model_name)  ## possible fix: some other tokenizer class needs to be used here

The possible options for model_name are all BERT-based, while the tokenizer being used in both branches is the ALBERT one, hence from_pretrained returns None and self.model_tokenizer is never set to a usable tokenizer.
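
A minimal sketch of the kind of fix I have in mind, assuming the else branch is meant to load a BERT tokenizer. The helper names tokenizer_class_name and load_tokenizer are hypothetical and not part of blanc; only AlbertTokenizer, BertTokenizer, and from_pretrained come from the transformers library. The explicit None check is an extra guard I added so that a failed load is reported where it happens instead of later at .tokenize.

```python
def tokenizer_class_name(model_name: str) -> str:
    """Return the name of the tokenizer class that matches a checkpoint name."""
    # Same substring test as the snippet above; any non-ALBERT name is
    # assumed (my assumption) to be a BERT checkpoint.
    return "AlbertTokenizer" if "albert" in model_name.lower() else "BertTokenizer"


def load_tokenizer(model_name: str):
    """Load the matching tokenizer and fail early if nothing was loaded."""
    import transformers  # imported lazily so the helper above works without it

    tokenizer_cls = getattr(transformers, tokenizer_class_name(model_name))
    tokenizer = tokenizer_cls.from_pretrained(model_name)
    if tokenizer is None:
        # Hypothetical guard: surface the problem here rather than as an
        # AttributeError on .tokenize further down the call stack.
        raise ValueError(f"No tokenizer could be loaded for {model_name!r}")
    return tokenizer
```

With this, load_tokenizer("bert-base-uncased") would go through BertTokenizer, and a name containing "albert" would still go through AlbertTokenizer.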

Thanks for your help

OlegVasilyev4096 commented 3 years ago

Thank you!