allenai / mmda

multimodal document analysis
Apache License 2.0

dramatically improve citation mentions predictor perf #269

Closed cmwilhelm closed 11 months ago

cmwilhelm commented 11 months ago

We were leaving the label prediction tensors on the GPU, so every read of a predicted label triggered an expensive GPU-to-host transfer. In particular, tt profile revealed the following:

[Screenshot, 2023-07-19 11:41 AM: profiler output before the change]

Scalene's GPU time reporting is generally full of false attribution, but in this case it was a hint in the right direction. By my reading of the code, we were pulling values from this tensor off the GPU three times for every input word in the document.

After pulling the label preds into system memory:

[Screenshot, 2023-07-19 11:44 AM: profiler output after the change]
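
For anyone skimming, the shape of the fix is roughly the following. This is a minimal sketch, not the actual mmda code: `word_labels`, `logits`, and `word_ids` are hypothetical names, and it assumes a PyTorch model that emits per-token logits. The point is to copy the predictions to system memory once, rather than indexing a GPU tensor inside the per-word loop.

```python
import torch


def word_labels(logits: torch.Tensor, word_ids: list) -> list:
    """Hypothetical helper: look up one predicted label per word id."""
    # argmax runs on whatever device the logits live on (GPU if available).
    label_preds = logits.argmax(dim=-1)
    # One bulk copy into system memory, converted to a plain Python list.
    label_preds = label_preds.cpu().tolist()
    # The per-word loop now only touches host memory.
    return [label_preds[i] for i in word_ids]


# Falls back to CPU so the sketch runs on machines without a GPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
logits = torch.tensor([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]], device=device)
print(word_labels(logits, [0, 0, 1, 2]))
```

Without the `.cpu().tolist()` step, each `label_preds[i]` lookup that gets converted to a Python value would be a separate device-to-host read, which matches the per-word cost described above.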

I confirmed that the remaining high-GPU items reported by the profiler have nothing to do with the GPU, and the code is too scary to futz about with anyway.