We were leaving the label-prediction tensors on the GPU,
which was forcing lots of expensive device-to-host
transfers to read the data. In particular, the profile revealed the following:
Scalene's GPU-time reporting is generally full of false attributions,
but in this case it was a hint in the right direction.
Based on my interpretation of the code, we were being forced
to access this tensor off the GPU three times for every input
word in the document.
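The fix amounts to paying for one bulk transfer instead of one per access. Here is a minimal sketch of the pattern, with hypothetical names (the real code differs); note that indexing a CUDA tensor element-by-element from Python forces a device-to-host sync on every read:

```python
import torch

def labels_for(words, label_preds):
    # Copy the prediction tensor to system memory ONCE, up front.
    # Indexing the CUDA tensor directly (label_preds[i].item()) would
    # instead trigger a GPU->CPU transfer for every single word.
    preds = label_preds.cpu().tolist()
    return [preds[i] for i in range(len(words))]

# Toy usage; the tensor stays on CPU here, but .cpu() is a no-op in
# that case, so the same code works either way.
words = ["the", "cat", "sat"]
label_preds = torch.tensor([2, 0, 1])
print(labels_for(words, label_preds))  # [2, 0, 1]
```

The key call is `.cpu().tolist()`: after it, the per-word loop touches only ordinary Python objects and never goes back to the device.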
After pulling the label preds into system memory:
I confirmed that the remaining high-GPU items reported by the
profiler have nothing to do with the GPU,
and the code is too scary to futz about with anyway.