cmosig / sentle

Sentinel-1 & Sentinel-2 data cubes at large scale (bigger-than-memory) on any machine with integrated cloud detection, snow masking, harmonization, merging, and temporal composites.

identify bottleneck in GPU processing and if it even is the bottleneck #32


cmosig commented 1 week ago

During processing, the GPU is not fully utilized, even with 100 workers sending jobs to the cloud detection service. The GPU draws only ~100 W out of 300 W, whereas in basic experiments I was able to fully utilize it with the same model and tensor size. Experiments from #11 have shown that batching does not increase processing speed because of the size of each tile.
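To narrow this down, it might help to time the host-to-device copy and the forward pass separately inside the service. A minimal sketch, assuming the cloud detection model is a plain PyTorch module and `batch` is a CPU tensor of the tile size used in production (the function and variable names are placeholders, not sentle's actual code):

```python
import torch

def profile_inference(model, batch, device="cuda", n_iters=50):
    """Time the H2D copy and the forward pass separately using CUDA events."""
    model = model.to(device).eval()
    transfer_ms, compute_ms = 0.0, 0.0
    start = torch.cuda.Event(enable_timing=True)
    mid = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    with torch.no_grad():
        for _ in range(n_iters):
            start.record()
            x = batch.to(device)   # host-to-device copy
            mid.record()
            _ = model(x)           # forward pass
            end.record()
            torch.cuda.synchronize()
            transfer_ms += start.elapsed_time(mid)
            compute_ms += mid.elapsed_time(end)
    print(f"avg transfer: {transfer_ms / n_iters:.2f} ms, "
          f"avg compute:  {compute_ms / n_iters:.2f} ms")
```

If transfer time dominates compute time, the GPU spends most of each request waiting for data rather than running the model, which would explain the low power draw.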

The questions are:

1. Is the GPU cloud detection actually a bottleneck, and could we gain performance here? Or is the service already pushing patches through as fast as possible?
2. If it is, what is the issue? Is the process waiting on tensor transfers between CPU and GPU?

If the transfers turn out to be the problem, one option would be to overlap the host-to-device copy of the next tile with the forward pass of the current one; see the sketch below.
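A rough sketch of that overlap, assuming the service processes tiles sequentially as PyTorch CPU tensors (the helper and its names are hypothetical, not part of sentle):

```python
import torch

def run_with_prefetch(model, cpu_batches, device="cuda"):
    """Overlap the H2D copy of batch i+1 with the forward pass of batch i.

    Hypothetical helper: uses pinned memory, non_blocking copies, and a
    dedicated copy stream so transfer and compute can run concurrently.
    """
    model = model.to(device).eval()
    copy_stream = torch.cuda.Stream()
    results = []

    def upload(batch):
        # pinned memory + non_blocking lets the copy run asynchronously
        with torch.cuda.stream(copy_stream):
            return batch.pin_memory().to(device, non_blocking=True)

    with torch.no_grad():
        it = iter(cpu_batches)
        current = upload(next(it))
        torch.cuda.current_stream().wait_stream(copy_stream)
        for next_cpu in it:
            nxt = upload(next_cpu)                # copy overlaps with compute below
            results.append(model(current).cpu())  # forward pass on the current tile
            torch.cuda.current_stream().wait_stream(copy_stream)
            current = nxt
        results.append(model(current).cpu())
    return results
```

Whether this helps depends on what the profiling above shows; if compute already dominates, overlapping the copies would not change GPU utilization much.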