brubsby / SolarPanelDataWrangler

GNU General Public License v3.0

Optimize run_inference preprocessing/inference pipeline #12

Open typicalTYLER opened 5 years ago

typicalTYLER commented 5 years ago

Currently this is the slowest part of the process, taking up the majority of the time per tile. However, CPU utilization bounces around and never reaches 100% on all cores, and GPU utilization fluctuates between 1% and 30%.

A couple of ideas for areas of improvement:

typicalTYLER commented 5 years ago

RE multiprocessing for image querying/stitching/resizing: I was under the impression that SQLite was completely non-concurrent, but apparently it does support concurrency to some extent. I think that's enough that we could create separate worker threads for querying and splitting, stitching, and resizing. We might just have to change the SQLite timeout setting and verify that this is safe to do with SQLAlchemy.

In terms of simplicity, the easiest worker to write would probably be the one for querying images and splitting, as it can run in advance without too much change to the code: this step starts with querying slippy_tiles that don't have images yet, and ends with saving the images to disk and updating slippy_tiles.
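A rough sketch of what such a prefetch worker could look like, assuming a background thread and a bounded queue. `fetch_image_bytes`, the `(x, y)` tile tuples, and the file naming are placeholders, not the project's real functions; the database update is left out.

```python
import os
import queue
import tempfile
import threading


def fetch_image_bytes(tile):
    # Placeholder for the real imagery query (hypothetical); returns fake bytes.
    return ("image-for-%d-%d" % tile).encode()


def prefetch_worker(pending_tiles, output_dir, ready_queue):
    # Runs ahead of inference: fetch each tile, save it to disk, announce it.
    for tile in pending_tiles:
        path = os.path.join(output_dir, "%d_%d.bin" % tile)
        with open(path, "wb") as handle:
            handle.write(fetch_image_bytes(tile))
        ready_queue.put((tile, path))
    ready_queue.put(None)  # sentinel: nothing left to fetch


def run_prefetch(pending_tiles):
    output_dir = tempfile.mkdtemp()
    # The bound keeps the worker from running arbitrarily far ahead.
    ready_queue = queue.Queue(maxsize=8)
    worker = threading.Thread(
        target=prefetch_worker,
        args=(pending_tiles, output_dir, ready_queue),
        daemon=True,
    )
    worker.start()
    finished = []
    while True:
        item = ready_queue.get()
        if item is None:
            break
        tile, path = item
        # The real pipeline would hand the saved file to the next step here.
        finished.append((tile, os.path.getsize(path)))
    worker.join()
    return finished
```

Because the files on disk act as the checkpoint, the consumer only needs the path, and an interrupted run can pick up from whatever was already saved.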

It might be a little harder to come up with a parallel solution for the next preprocessing steps, stitching and resizing, as they're done entirely in memory and don't have the disk as a checkpoint.
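One possible shape for the in-memory steps is a chain of stages connected by bounded queues, so the queues take over the hand-off role the disk plays in the earlier step. This is only a sketch: `stitch` and `resize` below are toy stand-ins operating on nested lists, not the real image code, and threads will only help if the real work releases the GIL (otherwise processes would be needed).

```python
import queue
import threading

SENTINEL = object()  # marks the end of the stream


def stage(work, inbox, outbox):
    # Generic pipeline stage: apply `work` to each item until the sentinel.
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)
            return
        outbox.put(work(item))


def stitch(tiles):
    # Toy stand-in: join equally sized tiles side by side, row by row.
    return [sum(rows, []) for rows in zip(*tiles)]


def resize(image):
    # Toy stand-in: downsample by keeping every second row and column.
    return [row[::2] for row in image[::2]]


def run_pipeline(batches):
    to_stitch = queue.Queue(maxsize=4)
    to_resize = queue.Queue(maxsize=4)
    done = queue.Queue()  # unbounded so feeding below cannot deadlock
    threads = [
        threading.Thread(target=stage, args=(stitch, to_stitch, to_resize)),
        threading.Thread(target=stage, args=(resize, to_resize, done)),
    ]
    for thread in threads:
        thread.start()
    for batch in batches:
        to_stitch.put(batch)
    to_stitch.put(SENTINEL)
    results = []
    while True:
        item = done.get()
        if item is SENTINEL:
            break
        results.append(item)
    for thread in threads:
        thread.join()
    return results
```

With one thread per stage the output order matches the input order, which keeps it easy to line results back up with their tiles.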