Closed: aaalexlit closed this issue 1 year ago
Hi,
Thanks for the issue, and sorry for the slow response - I graduated in February and have been taking a bit of a break.
I skimmed the notebook and as far as I can tell the code looks reasonable. I haven't ever tried running inference on CPU. For GPU, when I invoke bash script/predict.sh scifact
to run inference on SciFact, the model runs inference on 300 claims, each with 10 abstracts - so, 3,000 total instances. This takes roughly 7 minutes, which means it's doing about 7 instances / second. It sounds like you're getting something like 0.5 instances / second on GPU?
That does seem weirdly slow. Maybe start by running inference exactly as I specify in the README on the SciFact dataset and see what kind of time you're getting? If your times match mine, then maybe there's something going on with your data. Otherwise, maybe it's just due to the speed of the GPU you're using?
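In case it helps, here's roughly the check I have in mind, as a minimal sketch: it assumes you run it from the repo root and that the SciFact run covers the 3,000 instances mentioned above.

```python
import subprocess
import time

# Time the SciFact inference run from the README (assumes the repo root as cwd).
start = time.perf_counter()
subprocess.run(["bash", "script/predict.sh", "scifact"], check=True)
elapsed = time.perf_counter() - start

# 300 claims x 10 abstracts each = 3,000 claim/abstract instances.
n_instances = 300 * 10
print(f"{elapsed:.0f} s total, {n_instances / elapsed:.1f} instances/s")
```

If the instances/second you get from this is in the same ballpark as mine, the slowdown is probably in your data or setup rather than the hardware.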
Hi David! Thanks a lot for answering, and no worries, as you can see I'm not fast at responding either... Congrats, that's a huge step!
I'll check the inference time on the original dataset, thanks for the hint! It might be the GPU; I'm using the GPU that Colab provides with the free tier.
Without batching it took 13 minutes to run prediction on SciFact, so almost twice as long as it takes you.
And using batches of size 10 it surprisingly took even more time: almost 16 minutes.
The GPU I got this time is a Tesla T4 with 15 GB of RAM; I tried bigger batches and they failed with out-of-memory errors.
The speeds I see are approximately 3-4 it/s, which is about half of what you get on your hardware, so it looks like it's a hardware limitation in the end.
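Just to sanity-check that comparison, here's the back-of-the-envelope math I did (assuming the same 3,000 instances as the full SciFact run):

```python
# Throughput from the wall-clock times above, assuming 3,000 instances per run.
n_instances = 300 * 10

your_run = n_instances / (7 * 60)        # ~7.1 it/s on your hardware
my_unbatched = n_instances / (13 * 60)   # ~3.8 it/s on the Colab T4, no batching
my_batch_10 = n_instances / (16 * 60)    # ~3.1 it/s with batch size 10

print(f"{your_run:.1f} vs {my_unbatched:.1f} / {my_batch_10:.1f} it/s")
```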
Unless you have any additional thoughts or suggestions, I'm ready to close the issue.
Thanks again!
Unfortunately I don't really have any other ideas; it does seem like it's probably hardware. Feel free to close this.
Hi David! First of all, thanks a lot for sharing all your code and data. I have a couple of questions. I'm trying to run inference on my own data using your models and instructions. I think it kind of works, but I still have some doubts, especially about the time it takes. Is it expected that inference on a very small dataset (3 claims with 10 abstracts each) takes about 6 minutes on CPU, or 80 seconds on GPU?
And that's on the second and subsequent runs of predict.py; on the first run it takes even longer because it's downloading something (tokenizer? data?). I guess I could debug it to see what's being downloaded (I haven't done that yet), but you'd probably know right away from the size (1.74G). My doubts are whether these downloads are needed for inference, and whether I'm using it correctly overall. Here's exactly what I'm doing, along with my dummy data: https://colab.research.google.com/drive/1dbY6ybcfezhqgVfN0CfUG_uJAkcU1IOV?usp=sharing
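In case it's useful, this is the kind of quick check I had in mind for seeing what gets downloaded (just a sketch; it assumes the files land in the standard Hugging Face / torch cache directories, which may not be where predict.py actually puts them):

```python
from pathlib import Path

# Likely cache locations for transformers/torch downloads (an assumption;
# predict.py may store its checkpoint somewhere else entirely).
cache_dirs = [
    Path.home() / ".cache" / "huggingface",
    Path.home() / ".cache" / "torch",
]

for cache in cache_dirs:
    if not cache.exists():
        continue
    for f in sorted(cache.rglob("*")):
        if f.is_file() and f.stat().st_size > 100 * 1024 * 1024:  # files > 100 MB
            print(f"{f.stat().st_size / 2**30:.2f} GB  {f}")
```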
Thanks!