NicoMandel / asdc_mwe

minimal working example of how to run code on asdc

Speed tests - 20/12/2023 #2

Open littlerob84 opened 10 months ago

littlerob84 commented 10 months ago

Speed Tests

Laptop 1 (HS Laptop) GTX1080 - Torch wasn't found, so I used 'pip install torch', then re-ran and the result was False :-( . From some research, this GPU isn't new enough and so will never run fast. I haven't run a test on this one yet.

Laptop 2 (NPWS Laptop) RTX 2070 - True. But this is the laptop I ran the test on! I just restarted the laptop and re-ran, and it took 1 hour, 39 mins 42 secs to do 528 images, so 11.33s per iteration. I ran these on JPEGs that were converted from the ARWs and were around 20-30MB in size.

Laptop 3 (My personal one) RTX 3080 Ti - True. Ran a test on 50 images and it took 5 mins 52 secs, so 7.04s per iteration. I ran these on JPEGs that were converted from the ARWs and were around 30-40MB in size (these were converted using a higher quality than the ones I ran on laptop 2). I did notice it had "CUDA:0" at the top though? Running the bounding box script over the 39 detections took 1:10, so 1.79s per iteration.

On all 3 I checked power mode was at max and battery mode was highest. Nothing else was running at the time.

Worst case, we can be running these on flights that have 2000 images per flight, over 12 flights, which will mean 46 hours per site! For 0.9cm work, we are generally doing 8-10 flights per day of around 1000 images per flight, call it 9000 images. That's 17.6 hours per site. This is on my "fast" laptop; on the NPWS one it equates to 28 hours per site. For the 70 sites this season, that will mean over 82 days of just processing time. (And these times don't include the conversion or bounding box time.)
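The projections above can be reproduced with a few lines of Python. This is a minimal sketch using only the figures quoted in this comment; the function names are illustrative and not part of the repository.

```python
def seconds_per_image(total_seconds, n_images):
    """Average processing time per image for one timed run."""
    return total_seconds / n_images


def processing_hours(n_images, sec_per_image):
    """Projected processing time in hours for a given image count."""
    return n_images * sec_per_image / 3600


# Timed runs quoted above.
npws = seconds_per_image(1 * 3600 + 39 * 60 + 42, 528)  # Laptop 2, RTX 2070
fast = seconds_per_image(5 * 60 + 52, 50)               # Laptop 3, RTX 3080 Ti

# Projections per site.
worst_case_hours = processing_hours(12 * 2000, fast)  # 12 flights of 2000 images
typical_fast_hours = processing_hours(9000, fast)     # 0.9cm work on Laptop 3
typical_npws_hours = processing_hours(9000, npws)     # 0.9cm work on Laptop 2

# Season total on the NPWS laptop, in days of continuous processing.
season_days = 70 * typical_npws_hours / 24

print(f"per image: NPWS {npws:.2f}s, fast {fast:.2f}s")
print(f"worst case: {worst_case_hours:.1f} h per site")
print(f"typical: {typical_fast_hours:.1f} h (fast), {typical_npws_hours:.1f} h (NPWS)")
print(f"season (70 sites, NPWS laptop): {season_days:.1f} days")
```

Note that the 82-day season figure corresponds to the NPWS laptop rate; conversion and bounding box time are not included here either.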

NicoMandel commented 10 months ago

Hi Rob, thanks for that input.

I've provided code for testing the time and showing which graphics card is being used in the last commit. The rest I've already replied to in the email. We definitely need to find ways to accelerate this.
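For readers without access to that commit, a check of this kind might look roughly like the following. This is a sketch only, not the code from the commit; `describe_device` and `time_per_item` are hypothetical helper names. It also covers the two cases reported above: torch not being installed, and `torch.cuda.is_available()` returning False (which can also happen when the installed torch build is CPU-only). A label such as "cuda:0" simply refers to the first CUDA device PyTorch sees.

```python
import time


def describe_device():
    """Return a short description of the device PyTorch would run on."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    if not torch.cuda.is_available():
        # Either no usable GPU/driver was found or this torch build is CPU-only.
        return f"torch {torch.__version__}: CUDA not available, running on CPU"
    idx = torch.cuda.current_device()
    name = torch.cuda.get_device_name(idx)
    return f"torch {torch.__version__}: cuda:{idx} ({name})"


def time_per_item(fn, items):
    """Call fn on every item and return the average wall-clock seconds per item."""
    start = time.perf_counter()
    for item in items:
        fn(item)
    elapsed = time.perf_counter() - start
    return elapsed / len(items)


if __name__ == "__main__":
    print(describe_device())
    # Stand-in workload; replace the lambda with the real per-image call.
    print(f"{time_per_item(lambda x: sum(range(x)), [10_000] * 5):.6f} s per item")
```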

Making SAHI run in batches, according to this issue here, appears to be the best way to accelerate this; however, it does require some intricate changes, as you can see. Batching works best the larger the GPU memory is, because it reduces transfer times between CPU and GPU, which are the bottleneck for batch_size=1 calculations at this stage (and also on mobile GPUs such as yours). So it may bring much more of an increase on my machine than on yours. My GPU uses about 2GB of memory constantly per pass, so tripling the batch size may already bring a large performance increase (about half the time).
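To illustrate the idea only: batching means grouping image slices so that each forward pass, and each CPU-to-GPU transfer, handles several slices at once. The sketch below is plain Python and does not use SAHI's API; `batched` and `forward_passes` are hypothetical helpers, and the slice list is a stand-in for real image tiles.

```python
from math import ceil


def batched(items, batch_size):
    """Yield consecutive chunks of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def forward_passes(n_slices, batch_size):
    """Number of model calls needed to process n_slices slices."""
    return ceil(n_slices / batch_size)


slices = [f"slice_{i}" for i in range(7)]  # placeholder for image tiles
for batch in batched(slices, 3):
    # In a real pipeline, the whole batch would be moved to the GPU
    # and passed through the model in a single call here.
    print(len(batch), batch)

print(forward_passes(len(slices), 1), "passes at batch_size=1")
print(forward_passes(len(slices), 3), "passes at batch_size=3")
```

Fewer passes does not translate one-to-one into less time; how much is gained depends on GPU memory and on how much of the runtime is transfer overhead.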

Can explore this. Please provide an indicator of what priority (1, 2 or 3) you would like this to be, 1 being most urgent.

Cheers

NicoMandel commented 10 months ago

Just adding here that my personal laptop with an RTX 3060 GPU took 6.57s per image on an ARW folder of 4 images.

NicoMandel commented 10 months ago

Full process for 4 files with ARW images on my personal laptop (RTX 3060) takes 20.80 seconds, so 5.2 seconds per image, including visualisation and storing to drive.

littlerob84 commented 9 months ago

9th Jan 2024 (Version f55bb5f0a6c811871c3fe12f7e76c979ce6be181) - Laptop 3 (My personal one) RTX 3080 Ti - 6.8-7.5 s/it.

NicoMandel commented 9 months ago

Are these speeds sufficient for you or do you want accelerations?

littlerob84 commented 9 months ago

Definitely need it to be faster. We can have up to 5 teams a day, and for some sites that can be almost 30,000 images per site. So it really needs to be as fast as possible to make it practical. Mine runs at just over 1 per second, and even that is only just workable during the season.

NicoMandel commented 9 months ago

Hi Rob, thanks for that. It should be possible to accelerate it by changing the dataloader to take larger batch sizes; however, I doubt it'll get down to 1 second. Let me see where I can get to.