DigitalGlobe / gbdxtools

(Deprecated) Python SDK for using GBDX
MIT License
74 stars 57 forks source link

Possible bug in download Geotiff with dtype not Float32 #379

Closed renaudallioux closed 6 years ago

renaudallioux commented 6 years ago

When accessing image through gbdxtool the download of rasters is always done in "float32" format, even when inputing key dtype exemple:

image.geotiff(path="toto6.tif",dtype='uint16', bands=[3,2,1], proj="EPSG:4326")

performs correctly, however

gdalinfo toto6.tif gives

Band 1 Block=6979x1 Type=Float32, ColorInterp=Gray
Band 2 Block=6979x1 Type=Float32, ColorInterp=Undefined
Band 3 Block=6979x1 Type=Float32, ColorInterp=Undefined

So not in uint16.

Am I doing something wrong or is it a #bug? Regards

drwelby commented 6 years ago

geotiff() currently writes out the geotiff in the dtype of the image array, so you have to convert it first:

In [3]: image = CatalogImage(catalog_id, band_type="MS", bbox=bbox)

In [4]: image = image.astype('uint16')

In [5]: image.geotiff(path="test16.tif", bands=[3,2,1], proj="EPSG:4326")
Out[5]: 'test16.tif'

This should result in:

$gdalinfo test16.tif

Band 1 Block=1008x1 Type=UInt16, ColorInterp=Gray
Band 2 Block=1008x1 Type=UInt16, ColorInterp=Undefined
Band 3 Block=1008x1 Type=UInt16, ColorInterp=Undefined
renaudallioux commented 6 years ago

thanks a lot!

joshwapiano commented 6 years ago

I've stumbled across this by chance after spending the day trying to figure out what I was getting wrong/coming up with a sub-optimal workaround.

I attempted the suggested code from @drwelby , as shown:

image = CatalogImage('104001000634B200', pansharpen=True, bbox=[113.789787,22.165092,113.809945,22.183253])
image = image.astype('uint8')
image.geotiff(path="fix_int8.tif", bands=[4,2,1])

And this produces an output in uint8 format, but if you view the image (I've tried in a standard image viewer and also QGIS Desktop) the output is rather strange: image

I need an image in .tif format for the research I'm doing, but the only reliable workaround I've found is as follows:

from PIL import Image
img = CatalogImage('104001000634B200', pansharpen=True, bbox=bbox)
img_np = img.rgb()
# Next to export as tiff
img_im = Image.fromarray(img_np)
img_im.save("fix2_int8.tif") 

Which loses all of the .tif tags, but returns an image matching what was observed in image.plot(). Here's the .tif created by the second method.

image

Hoping I've made a stupid mistake here?

drwelby commented 6 years ago

@joshwapiano

We just added a new method to make getting RGB geotiffs easier:

image = CatalogImage('104001000634B200', pansharpen=True, bbox=[113.789787,22.165092,113.809945,22.183253])
image.geotiff(path="fix_int8.tif", spec='rgb')

Adding spec=rgb will give you an 8-bit RGB geotiff that's been contrast stretched to the 2nd and 98th percentile values of the parent strip.

As for the first image, when you change the type down in precision it overflows so you get crazy rainbow colors. You would have to scale the values to the range of 0-255 first.

drwelby commented 6 years ago

For the second image, if you find yourself stuck in array world after running functions from other libraries, you should be able construct a world file from the transformation coefficients found in image.affine.

joshwapiano commented 6 years ago

@drwelby I was excited when I discovered the spec='rgb' method and had also tried that earlier too. Unfortunately, there is also an issue here that is slightly different to the other issue - the image appears to have overcompensated in the red band (below was retrieved using your code) -

image

Do you know why this red-shift would be occurring?

Thanks for the tip on image.affine - would you suggest I read the documentation uploaded today here to use that attribute?

drwelby commented 6 years ago

Yes, it's because plot() does the contrast stretching based on the pixels only within the plotted image.

However, geotiff() works slightly differently. It writes the raw image tiles directly to the geotiff file without loading them all into memory. To compute the contrast stretch without reading all the tiles we instead request metadata from the RDA service which gives us some statistics for the whole image strip. These values won't be the same, especially for an island in the ocean!

There is another way to take an array and add the geospatial information back to it so you can export a geotiff that takes some knowledge of the image class internals. I'll find you an example tomorrow.

The latest docs don't have any information about image.affine. The affine coefficients are (x-scale, y-skew, x-center, y-scale, x-skew, y-center), and a world file is:

x-scale
y-skew
x-skew
y-scale
x-center
y-center

So you would have to paste the values into a text file, rearrange a few of the terms, and save it as a .tfw file.

This is untested so I might have something reversed. I can see other people running into a similar situation so we're looking at adding something to make this easier.

Your other alternative is to use a GIS tool (gdalcopyproj if you have the gdal utilities) to extract or copy the georeferencing from the rainbow image to the pretty image.

joshwapiano commented 6 years ago

Thanks @drwelby , a few very good suggestions and I'll look in to copying the georeferencing from the rainbow image to the pretty image.

For information, my purpose for attempting to create high quality RGB imagery using GBDXtools is to feed them in to the baseline model provided by the DIUx-xView challenge (which recently finished) as part of a wider satellite data pipeline. My workaround approach with the pretty image is producing the best results at present when fed in to the baseline model. These images were sourced from Digital Globe, so perhaps you can share the pre-processing that was applied to their training imagery? If I can mirror this myself with new catalog imagery requests this could improve the effectiveness of the tool I'm building.

drwelby commented 6 years ago

Here's how to get the output array back to a geo-enabled image object:

# starting with
image = CatalogImage(...)
rgb = image.rgb()

# we'll also need to import
from gbdxtools.images.meta import GeoDaskImage
import dask.array as da
import numpy as np

# gbdxtools expects the bands in the first axis, like (3, 1000, 1200)
# matplotlib wants bands in the last axis: (1000, 1200, 3)
# rgb() outputs an array formatted for matplotlib, so we have to roll the band axis back
rolled = np.rollaxis(rgb,2,0)

# we also need to convert the array to a Dask array. A chunk size of 256 will work fine.
rgb_dask = da.from_array(rolled, 256)

# bootstrap a GeoDaskImage from the rgb Dask array and the original image's geo parameters
geo_rgb = GeoDaskImage(rgb_dask, __geo_interface__=image.__geo_interface__, __geo_transform__=image.__geo_transform__)

# export
geo_rgb.geotiff(path='rgb.tif')

We plan to add an easy wrapper method to do this in a future release.

drwelby commented 6 years ago

@joshwapiano I have asked about the xView imagery - if I find out anything I'll update here. There is another technique to generate uint8 RGB imagery that we use in other systems that might work for you that we can try if needed.

renaudallioux commented 6 years ago

@joshwapiano Add the same issue you can also use gdal to rewrite your geotiff: something like that.

from osgeo import gdal, osr, ogr

dataset = driver.Create(dst_filename, y_pixels,
                            x_pixels, 3, gdal.GDT_Byte, ['COMPRESS=LZW'])

resolx = image.metadata['georef']['scaleX']
resoly = image.metadata['georef']['scaleY']
xo = image.bounds[0]
yo = image.bounds[3]
srs = osr.SpatialReference()
srs.ImportFromEPSG(int(image.proj.split(":")[1]))
dataset.SetProjection(srs.ExportToWkt())
dataset.SetGeoTransform((xo, resolx, 0, yo, 0, resoly))
im = image.compute() * 255 / 2048
im = im.astype('uint8')

if band == "PAN":
    dataset.GetRasterBand(1).WriteArray(im[0, :, :])
else:
    if sat == "ge1":
        dataset.GetRasterBand(1).WriteArray(im[3, :, :])
        dataset.GetRasterBand(2).WriteArray(im[2, :, :])
        dataset.GetRasterBand(3).WriteArray(im[1, :, :])

    elif sat == "wv3" or sat == "wv2":
        dataset.GetRasterBand(1).WriteArray(im[3, :, :])
        dataset.GetRasterBand(2).WriteArray(im[2, :, :])
        dataset.GetRasterBand(3).WriteArray(im[1, :, :])

    elif sat == "wv4":
         dataset.GetRasterBand(1).WriteArray(im[1, :, :])
         dataset.GetRasterBand(2).WriteArray(im[2, :, :])
         dataset.GetRasterBand(3).WriteArray(im[3, :, :])

dataset.FlushCache()  # Write to disk.

but this take some time on large images.

Basically the fastest way I found is to write a float32 temporary image and use gdal_translate in bash (subprocess.call('gdal_transalte -ot Byte -scale temp.tif out.tif',shell=True)). This is dirty but pretty fast ;-)

If it is just for printing just use im = image.compute() * 255 / 2048 im = im.astype('uint8')

Beware, swir band of wv3 is 16bit while rgb is 11bit

drwelby commented 6 years ago

@renaudallioux Thanks for these examples.

There's another way to do scaling that keeps you in a geospatial image object by using the map_blocks function.

def rescale(n):
    return (n * 255 / 2048).as_type('uint8')

image = CatalogImage(...)
# slice to the RGB bands
image = image[[3,2,1], :, :] 
# add the rescale function to the Dask graph
im_scaled = image.map_blocks(rescale)
# export, which will run the rescale function on each chunk as the geotiff is built
im_scaled.geotiff()

Now the fun part:

image.display_stats includes scales and offsets that can be used to convert the raw pixel range in each band to a reasonable uint8 range. So you can use these scales and offsets in a mappable function. The only thing to note is that display_stats is computed by the RDA service on-demand and it's slow, so I recommend caching the values if you are going to create multiple subsets from the same catalog ID. This is similar to how we export the spec='rgb' geotiffs:

https://github.com/DigitalGlobe/gbdxtools/blob/40e1744de30e36a7e2e5bc8ecc136da2b64f9fdc/gbdxtools/rda/io.py#L49

renaudallioux commented 6 years ago

Very nice! I'll test this too

joshwapiano commented 6 years ago

Hi @drwelby , just following up here to check whether you heard anything back on the xView imagery?

Many thanks Josh

drwelby commented 6 years ago

@joshwapiano Sorry, I haven't heard anything but I asked on a Friday so...

Let me try again. Could you email me at marc.pfister @ digitalglobe?