ThilinaRajapakse / simpletransformers

Transformers for Information Retrieval, Text Classification, NER, QA, Language Modelling, Language Generation, T5, Multi-Modal, and Conversational AI
https://simpletransformers.ai/
Apache License 2.0

How to interpret raw_outputs from NER model #426

Closed: ColinFerguson closed this issue 4 years ago

ColinFerguson commented 4 years ago

Hello and thank you for your work on this repository. I am having trouble making sense of the raw model outputs returned by the NER model.

I have trained an NER model with three custom labels:

```python
model = NERModel(
    'bert',
    'bert-base-cased',
    labels=['O', 'B-clinical_trial', 'I-clinical_trial'],
    use_cuda=True,
)
```

Training works just fine and the predictions are fine as well. But I would like to use probabilities from the raw outputs (as I do in text classification). In particular, I would like a probability for each label for each input token.

Question 1:

```python
from scipy.special import softmax

sample_text = 'HER2CLIMB is the first randomized pivotal trial completed to enroll patients with metastatic HER2 positive breast cancer who have untreated or previously treated and progressing brain metastases.'
preds, outputs = model.predict([sample_text])
outputs = softmax(outputs, axis=1)
print(len(preds[0]))
print(outputs[0].shape)
```

The code above gives 27 predictions (`preds`) but 128 rows of outputs (shape `(128, 3)`). Can you explain how to interpret the 'extra' outputs?

Question 2: I was expecting that, after running the softmax, I could interpret the value in each column of `outputs` as the probability that the token has the given label, so that every row whose max is in column 0 would get one label, every row whose max is in column 1 would get another, and likewise for column 2. But that does not appear to be the case:

```
pred                                   argmax(row)
{'HER2CLIMB': 'B-clinical_trial'}      0
{'is': 'O'}                            1
{'the': 'O'}                           2
{'first': 'O'}                         2
{'randomized': 'O'}                    2
{'pivotal': 'O'}                       2
{'trial': 'O'}                         2
{'completed': 'O'}                     0
{'to': 'O'}                            0
{'enroll': 'O'}                        0
{'patients': 'O'}                      0
{'with': 'O'}                          0
{'metastatic': 'O'}                    0
{'HER2': 'O'}                          0
{'positive': 'O'}                      0
{'breast': 'O'}                        0
{'cancer': 'O'}                        0
{'who': 'O'}                           0
{'have': 'O'}                          0
{'untreated': 'O'}                     0
{'or': 'O'}                            0
{'previously': 'O'}                    0
{'treated': 'O'}                       0
{'and': 'O'}                           0
{'progressing': 'O'}                   0
{'brain': 'O'}                         0
{'metastases.': 'O'}                   0
```

Notice that a token can be assigned 'O' (correctly, I might add) while the max probability falls sometimes in column 0, sometimes in column 1, and sometimes in column 2.

Any help you can give me interpreting these values will be very much appreciated. Thank you for your work on this project.

ThilinaRajapakse commented 4 years ago

I think both of these things happen for the same reason. The number of raw output rows for a word depends on how many sub-tokens the word is split into (the 128 rows you see correspond to the padded sequence length, `max_seq_length`, not to words). The softmax should be computed over the mean of the model outputs for each word, i.e. the model output for an actual word is the mean of the outputs for each of its constituent sub-tokens.

This example might shed some light as well (note the `np.mean()` before the softmax on line 64).
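
A minimal sketch of that averaging, assuming `raw_outputs` contains, per sentence, one `{word: sub_token_logits}` dict per word (the structure described later in this thread for version 0.30.0; variable names are illustrative):

```python
import numpy as np
from scipy.special import softmax

# `model` and `sample_text` are as defined in the question above.
preds, raw_outputs = model.predict([sample_text])

for pred, out in zip(preds[0], raw_outputs[0]):
    word = list(pred.keys())[0]
    # Average the logit rows of the word's sub-tokens, then softmax to
    # get one probability per label:
    # ['O', 'B-clinical_trial', 'I-clinical_trial'].
    word_probs = softmax(np.mean(out[word], axis=0))
    print(word, pred[word], word_probs)
```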

ColinFerguson commented 4 years ago

Hi Thilina, thank you for the response. I worked with that example yesterday and I think there may be an error. Lines 62 and 63 are:

```python
key = list(pred.keys())[0]
new_out = out[key]
```

I believe this is an error because `key` is a string in this case, but `out` is an array, not a dict, so `out[key]` won't work. I tried to figure out the index of `out` that `key` might represent, but I couldn't come up with anything.

From your example, the first key is `Some`, but the corresponding `out` is `[ 3.7382812 -1.6835938 -1.6152344]`, so `out[key]` will give an error.

Thanks again for any help you can give me.

ThilinaRajapakse commented 4 years ago

Did you run the script? It runs without any errors. `out` is a dict that is generated by a nested for loop (`raw_outputs` -> `outs` -> `out`).
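
As an illustrative sketch of that nesting (variable names such as `predictions` are assumptions, not taken from the actual script):

```python
# raw_outputs holds one `outs` list per input sentence, and each `out`
# inside it is assumed to be a {word: sub_token_logits} dict, so the
# out[key] lookup in the example is valid.
for sentence_preds, outs in zip(predictions, raw_outputs):
    for pred, out in zip(sentence_preds, outs):
        key = list(pred.keys())[0]  # the word; pred is {word: label}
        new_out = out[key]          # logit rows for that word's sub-tokens
```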

ColinFerguson commented 4 years ago

Upgrading to 0.30.0 solved the problem. Thank you, ThilinaRajapakse.