Error running Pipeline with BasicReferenceRecognizer #60

Open xesaad opened 2 years ago

xesaad commented 2 years ago

Hi there! I am a new and frequent user of this great package, which also comes with a few inevitable GitHub issues 😅

When I initialize the pipeline as follows:

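# Imports assumed from context; BertTokenizer is presumed to come from transformers.
import aspect_based_sentiment_analysis as absa
from transformers import BertTokenizer

name = "absa/classifier-rest-0.2"
model = absa.BertABSClassifier.from_pretrained(name)
tokenizer = BertTokenizer.from_pretrained(name)
reference_recognizer = absa.aux_models.BasicReferenceRecognizer()
professor = absa.Professor(reference_recognizer)
nlp = absa.Pipeline(model=model, tokenizer=tokenizer, professor=professor)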

I receive the following error:

TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_514/ in <module>
      2 model = absa.BertABSClassifier.from_pretrained(name)
      3 tokenizer = BertTokenizer.from_pretrained(name)
----> 4 reference_recognizer = absa.aux_models.BasicReferenceRecognizer()
      5 professor = absa.Professor(reference_recognizer)
      6 nlp = absa.Pipeline(model=model, tokenizer=tokenizer, professor=professor)

TypeError: __init__() missing 1 required positional argument: 'weights'

I realise this is because the BasicReferenceRecognizer needs to be trained in order to select weights. This leads me to two questions/issues:

  1. The BasicReferenceRecognizer class has no train method. Is there another way in which to train it, or any ways to load a pretrained model from the package? From the unit tests for the BasicReferenceRecognizer I found there were two pre-trained models, 'absa/basic_reference_recognizer-rest-0.1' and 'absa/basic_reference_recognizer-lapt-0.1', but on trying to initialize with these I received an ImportError.
  2. I also tried directly initializing the BasicReferenceRecognizer with weights=(-0.025, 44) as is done in this line. However, upon making predictions I get an error in the Pipeline at the postprocess step:
    TypeError                                 Traceback (most recent call last)
    /tmp/ipykernel_514/ in <module>
      3 for row in df.itertuples():
      4     print(row)
    ----> 5     prediction = predict(row.Review, row.Aspect)
      6     sentiment = get_sentiment(prediction)
      7     certainty_score = get_certainty_score(prediction)

    /tmp/ipykernel_514/ in predict(text, aspect)
         16 output_batch = nlp.predict(input_batch)
         17 predictions =, output_batch)
    ---> 18 completed_task = nlp.postprocess(task, predictions)
         19 completed_subtask = completed_task.subtasks[aspect]
         20 return completed_subtask

    /pyenv/versions/3.8.5/envs/seo-advice-page/lib/python3.8/site-packages/aspect_based_sentiment_analysis/ in postprocess(task, batch_examples)
        301 aspect, = {e.aspect for e in examples}
        302 scores = np.max([e.scores for e in examples], axis=0)
    --> 303 scores /= np.linalg.norm(scores, ord=1)
        304 sentiment_id = np.argmax(scores).astype(int)
        305 aspect_document = CompletedSubTask(

    TypeError: ufunc 'true_divide' output (typecode 'd') could not be coerced to provided output parameter (typecode 'l') according to the casting rule ''same_kind''

I believe this error comes from a type mismatch between `int` and `float`. If I instead initialize with `weights = (1, 1)`, for example, I receive no error.
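For reference, the casting failure can be reproduced with NumPy alone, independent of the package. This is only a minimal sketch: it assumes the scores reach `postprocess` as an integer array, and the sample values and variable names are illustrative rather than taken from the library.

```python
import numpy as np

# Integer scores, as np.max would produce from lists of ints (assumed input).
int_scores = np.max([[0, 1, 2], [3, 0, 1]], axis=0)

raised = False
try:
    # In-place true division would have to write floats into an int array.
    int_scores /= np.linalg.norm(int_scores, ord=1)
except TypeError as err:
    raised = True
    print("TypeError:", err)

# Casting to float first makes the same in-place division valid.
float_scores = int_scores.astype(float)
float_scores /= np.linalg.norm(float_scores, ord=1)
print(float_scores)
```

The same statement succeeds once the array holds floats, which is consistent with the error depending on the dtype of the scores rather than on the normalisation itself.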

I wanted to flag these issues for your awareness. Thank you very much for any advice you can provide 😄

xesaad commented 2 years ago

Update: I believe the cause is as follows. If the BasicReferenceRecognizer does not detect an aspect, the professor component sets scores = [0, 0, 0], which is a list of integers. When scores is then normalised by dividing it in place by its norm, the error is raised because the float result cannot be written back into an integer array. (Of course, a division-by-zero problem may also be lurking here, since the norm of [0, 0, 0] is zero!)
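A possible workaround could look like the sketch below. Note the assumptions: `normalize_scores` is a hypothetical helper, not part of the package, and falling back to a uniform distribution for an all-zero vector is an arbitrary choice that the maintainers may want to handle differently.

```python
import numpy as np


def normalize_scores(scores):
    """Hypothetical helper: L1-normalise scores without in-place int division."""
    # Force a float array so that integer inputs such as [0, 0, 0] are safe.
    scores = np.asarray(scores, dtype=float)
    norm = np.linalg.norm(scores, ord=1)
    if norm == 0:
        # Assumption: with no signal at all, return a uniform distribution.
        return np.full(scores.shape, 1.0 / scores.size)
    return scores / norm


print(normalize_scores([0, 0, 0]))
print(normalize_scores([1, 3, 0]))
```

Returning a new array instead of dividing in place avoids the casting error, and the explicit zero check avoids dividing by a zero norm.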


I tried to open a PR implementing these fixes myself, but unfortunately I don't have permission to push to this repository. I hope these suggestions help with resolving the issue!

abaveja313 commented 1 year ago

Thank you for the suggestion! Fixed my problem @xesaad