colllin closed this issue 5 years ago
Potentially we could compute() the array before writing it to disk, in to_geotiff()? Does that address the root cause though, or is there still a chance that Rasterio writers in parallel threads could misbehave on rare occasions? This question and proposed solution on Stack Overflow might describe an issue similar to the one reported here.
@colllin Do the chips have to be geotiffs?
I'm pretty sure this has something to do with GDAL not really being thread-safe. The usual workaround is to spawn separate processes that each write one tile at a time.
I'm out until next Tuesday - let's sync up then.
No, I don't think they really need to be geotiffs. Maybe I should just read() them and write them myself? Thanks for the idea @drwelby. I'll look into it.
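One way to picture "read them and write them myself" is to keep only the downloads parallel and do the file writes one at a time. This is a minimal sketch under assumptions: download_then_write, fetch, and write are hypothetical names, not gbdxtools API, and whether serial writes fully avoid the problem has not been confirmed in this thread.

```python
from multiprocessing.pool import ThreadPool


def download_then_write(requests, fetch, write, processes=10):
    """Fetch pixel data in parallel threads, then write files serially.

    `fetch(request)` and `write(request, pixels)` are hypothetical callables
    supplied by the caller.
    """
    requests = list(requests)
    with ThreadPool(processes=processes) as pool:
        # Only the network-bound reads run concurrently; map keeps input order.
        results = pool.map(fetch, requests)
    # Writes happen in the calling thread, so no two writers overlap.
    return [write(req, pixels) for req, pixels in zip(requests, results)]
```

Here fetch could wrap a CatalogImage read() call and write could wrap any single-threaded file writer; both are left to the caller.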
The tifffile package has utilities for reading, writing, and plotting any number of bands:
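$ pip install tifffile
import gbdxtools
import numpy as np
import tifffile

im = gbdxtools.CatalogImage(...)
# Read only the selected bands; the result is (bands, rows, cols)
px = im[band_idxs,...].read()
# Move the band axis last: (rows, cols, bands)
px = np.moveaxis(px, 0, -1)
tifffile.imshow(px)
tifffile.imsave('path/to/my.tif', px)
back_again = tifffile.imread('path/to/my.tif')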
gbdxtools 0.16.6, python 3.7.2, ubuntu 16.04
Description
I'm attempting to parallelize the downloading of a large number of small images using multiprocessing.pool.ThreadPool (multiprocessing.Pool completely freezes when using CatalogImage().geotiff()). For example:
Expected behavior:
I expect to see images over the requested location from among a few overlapping catalog IDs.
Actual behavior:
Sometimes the resulting geotiff is empty (all zeros/black; see offset_300m.tif below), and sometimes it contains image contents but not from the requested bbox (see offset_600m.tif below). The behavior appears to be non-deterministic. In a quick ablation experiment, I found that:
- changing processes=10 to processes=1, which performs the requests in series
- calling read() on the image first, e.g. ci = CatalogImage(); ci.read(); ci.geotiff(...)
This is the output from one test run of the above code example: