geodesign / django-raster

Django-raster allows you to create tiled map services (TMS) and raster map algebra end points for web maps. It is Python-based, and requires GeoDjango with a PostGIS backend.
BSD 3-Clause "New" or "Revised" License

Multi-task processing creates the same tmp file multiple times and processes the same raster multiple times #70

Open justRishi opened 2 years ago


Problem

If RASTER_USE_CELERY = True and RASTER_PARSE_SINGLE_TASK is False (or not set), a temp file is created multiple times in open_raster_file of RasterLayerParser in parser.py. In addition, when the raster is not already in the right projection, the reprojection is also done multiple times.
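A minimal sketch of why each task ends up with its own copy, assuming every parallel task builds its own parser and calls tempfile.mkdtemp on the same work directory. The loop only simulates four tasks; it is not django-raster code.

```python
import tempfile

# Stand-in for the configured raster work directory (hypothetical location).
base = tempfile.mkdtemp()

# Each simulated task asks for a temp directory under the same base.
# mkdtemp always creates a new, uniquely named directory, so four calls
# give four separate directories, each of which would hold its own copy
# of the raster file.
task_dirs = [tempfile.mkdtemp(dir=base) for _ in range(4)]

unique_count = len(set(task_dirs))
print(unique_count, "distinct directories for 4 tasks")
```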

Why this is a problem

In my case, big raster files are copied 4 times, processed by GDAL 4 times, and sometimes (when not in the right projection) reprojected 4 times.

How tested

By adding self.log calls that print out each tmp file creation. The resulting log output was attached as an image.

How to mitigate

Put RASTER_PARSE_SINGLE_TASK = True in settings. This avoids the duplicate work, but it also means the raster file is no longer processed concurrently.
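For reference, the mitigation amounts to the following fragment in the Django settings module. Only the two setting names come from this issue; the file name and the comments are assumptions.

```python
# settings.py (fragment)

# Hand raster parsing to Celery workers.
RASTER_USE_CELERY = True

# Parse each raster in one task, so the file is copied, opened and
# reprojected only once. The trade-off is that parsing of a single
# raster is no longer spread over several parallel tasks.
RASTER_PARSE_SINGLE_TASK = True
```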

Possible solution to process in parallel without duplicating work

  1. Ensure that only 1 tmp folder is created per raster file. This line in parser.py should change, because mkdtemp always returns a new, unique directory: self.tmpdir = tempfile.mkdtemp(dir=raster_workdir)
  2. self.dataset in parser.py (in class RasterLayerParser) should be shared by all parallel tasks that work on the same raster file, so that the file is opened and reprojected only once.
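One possible way to approach the first point is sketched below, assuming the tasks run on a shared filesystem. The helpers shared_workdir and claim_download, the directory naming scheme and the marker file are all hypothetical and are not part of django-raster. The idea is to derive the directory from the raster layer id rather than from mkdtemp, and to let exactly one task claim the copy step.

```python
import os
import tempfile


def shared_workdir(base_dir, rasterlayer_id):
    """Return one directory per raster layer (hypothetical helper)."""
    path = os.path.join(base_dir, "rasterlayer_{}".format(rasterlayer_id))
    # exist_ok=True makes this safe when several tasks race to create it.
    os.makedirs(path, exist_ok=True)
    return path


def claim_download(workdir, filename="source.tif"):
    """Return (target_path, is_owner).

    is_owner is True only for the single caller that manages to create
    the marker file; O_EXCL makes the creation atomic.
    """
    target = os.path.join(workdir, filename)
    marker = target + ".claimed"
    try:
        fd = os.open(marker, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.close(fd)
        return target, True
    except FileExistsError:
        return target, False


# Two simulated tasks working on the same (made-up) raster layer id.
base = tempfile.mkdtemp()
first = shared_workdir(base, 42)
second = shared_workdir(base, 42)
_, owner_a = claim_download(first)
_, owner_b = claim_download(second)
print(first == second, owner_a, owner_b)
```

In this sketch only the owning task would copy and reproject the file. The other tasks would still need a way to wait until that work has finished, for example a second marker file or a Celery chain, and that part is left out here. Sharing self.dataset itself across worker processes (the second point) is harder, since an open GDAL dataset lives inside one process; sharing the prepared file on disk is the assumption made here instead.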