cmosig / sentle

Sentinel-1 & Sentinel-2 data cubes at large scale (bigger-than-memory) on any machine with integrated cloud detection, snow masking, harmonization, merging, and temporal composites.

identify bottleneck in GPU processing and if it even is the bottleneck #32


cmosig commented 1 week ago

During processing, the GPU is not fully utilized, even with 100 workers sending jobs to the cloud detection service. The GPU draws only ~100 W out of 300 W, whereas in basic experiments I was able to fully utilize it with the same model and tensor size. Experiments from #11 have shown that batching does not increase processing speed because of the size of each tile.
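To narrow this down, it might help to time the host-to-device copy and the forward pass separately inside the service. A minimal sketch, assuming the cloud detection model is a plain PyTorch module and `batch` is a CPU tensor of the tile size used in production (the function and variable names are placeholders, not sentle's actual code):

```python
import torch

def profile_inference(model, batch, device="cuda", n_iters=50):
    """Time the H2D copy and the forward pass separately using CUDA events."""
    model = model.to(device).eval()
    transfer_ms, compute_ms = 0.0, 0.0
    start = torch.cuda.Event(enable_timing=True)
    mid = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    with torch.no_grad():
        for _ in range(n_iters):
            start.record()
            x = batch.to(device)   # host-to-device copy
            mid.record()
            _ = model(x)           # forward pass
            end.record()
            torch.cuda.synchronize()
            transfer_ms += start.elapsed_time(mid)
            compute_ms += mid.elapsed_time(end)
    print(f"avg transfer: {transfer_ms / n_iters:.2f} ms, "
          f"avg compute:  {compute_ms / n_iters:.2f} ms")
```

If transfer time dominates compute time, the GPU spends most of each request waiting for data rather than running the model, which would explain the low power draw.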

The questions are:

1. Is the GPU cloud detection actually a bottleneck, and could we gain performance here? Or is the service already pushing patches through as fast as possible?
2. If it is, what is the issue? Is the process waiting on tensor transfers between CPU and GPU?

If the transfers turn out to be the problem, one option would be to overlap the host-to-device copy of the next tile with the forward pass of the current one; see the sketch below.
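A rough sketch of that overlap, assuming the service processes tiles sequentially as PyTorch CPU tensors (the helper and its names are hypothetical, not part of sentle):

```python
import torch

def run_with_prefetch(model, cpu_batches, device="cuda"):
    """Overlap the H2D copy of batch i+1 with the forward pass of batch i.

    Hypothetical helper: uses pinned memory, non_blocking copies, and a
    dedicated copy stream so transfer and compute can run concurrently.
    """
    model = model.to(device).eval()
    copy_stream = torch.cuda.Stream()
    results = []

    def upload(batch):
        # pinned memory + non_blocking lets the copy run asynchronously
        with torch.cuda.stream(copy_stream):
            return batch.pin_memory().to(device, non_blocking=True)

    with torch.no_grad():
        it = iter(cpu_batches)
        current = upload(next(it))
        torch.cuda.current_stream().wait_stream(copy_stream)
        for next_cpu in it:
            nxt = upload(next_cpu)                # copy overlaps with compute below
            results.append(model(current).cpu())  # forward pass on the current tile
            torch.cuda.current_stream().wait_stream(copy_stream)
            current = nxt
        results.append(model(current).cpu())
    return results
```

Whether this helps depends on what the profiling above shows; if compute already dominates, overlapping the copies would not change GPU utilization much.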