The transformers pipeline runs each input sequentially with batch size 1, and in the post-processing step, when using `top_k`, it requires a 1D tensor to iterate over. This PR flattens the model's output when the batch size is one. I've tested this:
```python
from fastfit import FastFit
from transformers import AutoTokenizer, pipeline

model = FastFit.from_pretrained("../fast-fit")
tokenizer = AutoTokenizer.from_pretrained("roberta-large")
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

# This worked before and still works
print(classifier("I love this package!"))

# This failed before
print(classifier(["When do you think my card will arrive in Sweden?", "Give me back my money!"], top_k=10))

# This still works
x = tokenizer(["Hi", "Hello"], return_tensors="pt")
print(model(x["input_ids"], x["attention_mask"]))

# This will be different
x = tokenizer(["Hi"], return_tensors="pt")
print(model(x["input_ids"], x["attention_mask"]))  # This is a 1D tensor now
```
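The change itself is small. A minimal sketch of the flattening logic (the helper name `maybe_flatten` is mine, for illustration only, not code from this PR):

```python
import torch

def maybe_flatten(logits: torch.Tensor) -> torch.Tensor:
    # If the output is a (1, num_labels) batch of one, squeeze away the
    # batch dimension so the pipeline's postprocess step gets a 1D tensor
    # it can iterate over. Larger batches pass through unchanged.
    if logits.dim() == 2 and logits.size(0) == 1:
        return logits.squeeze(0)
    return logits

# Batch of one: (1, 3) -> (3,)
print(maybe_flatten(torch.tensor([[0.1, 0.7, 0.2]])).shape)  # torch.Size([3])

# Batch of two: left as-is
print(maybe_flatten(torch.tensor([[0.1, 0.7, 0.2], [0.3, 0.3, 0.4]])).shape)  # torch.Size([2, 3])
```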
I wasn't able to find another implementation of a pipeline-compatible model that I could piggyback off of. Do you see any potential problems with changing the shape of the output for batch-size-one inputs? Am I missing an obvious solution?
This is the problematic code:
```python
# From transformers/pipelines/text_classification.py in TextClassificationPipeline.postprocess
# score needs to be 1D
dict_scores = [
    {"label": self.model.config.id2label[i], "score": score.item()}
    for i, score in enumerate(scores)
]
```
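For context, a small standalone demonstration (independent of FastFit) of why that loop needs a 1D tensor: iterating a `(1, num_labels)` tensor yields a single multi-element row, and `.item()` raises on anything but a one-element tensor.

```python
import torch

scores_2d = torch.tensor([[0.1, 0.7, 0.2]])  # shape (1, 3): batch of one
scores_1d = scores_2d.squeeze(0)             # shape (3,): what postprocess expects

# Iterating the 1D tensor yields one-element scalars, so .item() works:
print([round(s.item(), 1) for s in scores_1d])  # [0.1, 0.7, 0.2]

# Iterating the 2D tensor yields one row of three elements,
# and .item() raises RuntimeError on a multi-element tensor:
try:
    [s.item() for s in scores_2d]
except RuntimeError as e:
    print("fails:", e)
```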
Closes #4