azavea / raster-vision-examples

Examples of using Raster Vision on open datasets

Add example for using shapefiles #32

Open mlaradji opened 5 years ago

mlaradji commented 5 years ago

Problem Statement

The Esri shapefile format is a popular vector data format. As such, I believe it would be worthwhile to construct an example showing how shapefile labels can be used in raster-vision.

References

A good example, IMHO, would be importing the datasets used in WaterNet, which consist of TIFF base images and shapefile labels (sometimes multiple shapefiles correspond to the same base image). In WaterNet, the shapefiles are "burnt" into raster images consisting of 0s and 1s for Not Water and Water. This conversion, however, is not memory efficient.
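To make the "burning" step concrete, here is a minimal pure-Python sketch of rasterizing one polygon into a 0/1 mask. It is only an illustration of the idea, not WaterNet's actual implementation; the function names and the toy polygon are made up for this example, and a real pipeline would work in georeferenced coordinates with a proper rasterizer.

```python
def point_in_polygon(x, y, ring):
    """Even-odd rule test for a point against a ring of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Count the edges crossed by a horizontal ray from (x, y) towards +x.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def burn_polygon(ring, width, height):
    """Return a height x width grid of 0/1, testing each pixel centre."""
    return [
        [1 if point_in_polygon(col + 0.5, row + 0.5, ring) else 0
         for col in range(width)]
        for row in range(height)
    ]


# Hypothetical "water" polygon, given in pixel coordinates.
water = [(1, 1), (4, 1), (4, 3), (1, 3)]
mask = burn_polygon(water, width=6, height=4)

for row in mask:
    print(row)
```

The polygon here is four vertices no matter how large the image is, while the mask needs one value per pixel, which is why the burnt form gets expensive for large base images.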

Some Preliminary Work

Ideally, raster-vision should be able to read the shapefile directly, but I am unsure if this is possible or feasible yet. A workaround would be to convert shapefiles into a format recognized by raster-vision, such as GeoJSON or vector tiles.

For conversion into GeoJSON, something like the following code snippet can be used (which was designed for the dataset used in WaterNet):

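import os.path
from . import ogr2ogr  # Assumed to be a local copy of GDAL's ogr2ogr.py sample script.

def get_file_name(path):
    '''
    Assumed helper (not shown in the original snippet): returns the file name
    without directory or extension, which is the layer name of a shapefile.
    '''
    return os.path.splitext(os.path.basename(path))[0]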

def create_merge_vrt(sources, vrt_uri):
    '''
    Writes a vrt file for use with ogr2ogr to merge files together.
    '''

    with open(vrt_uri, "w") as vrtfile:

        # Create wrapper.
        vrtfile.write("""<OGRVRTDataSource>
    <OGRVRTUnionLayer name="unionLayer">
""")

        # Add sources.
        for source in sources:

            print('Adding source: {}'.format(source))

            vrtfile.write("""            <OGRVRTLayer name="{}">
                <SrcDataSource relativeToVRT="0">{}</SrcDataSource>
            </OGRVRTLayer>
""".format(get_file_name(source), source))

        # Conclude file.
        vrtfile.write("""    </OGRVRTUnionLayer>
</OGRVRTDataSource>
""")

    print('Created vrt file at {}.'.format(vrt_uri))

def convert_to_geojson(sources, save_uri):
    '''
    Convert a list of files to the GeoJSON format.
    '''

    geojson_uri = save_uri + '.geojson'
    print('Converting sources to GeoJSON at {}...'.format(geojson_uri))

    # Create a VRT file that lists the sources to merge.
    vrt_uri = save_uri + '.vrt'
    create_merge_vrt(sources, vrt_uri)

    # Convert the merged sources to GeoJSON. The leading empty string appears
    # to fill the program-name slot (argv[0]) that ogr2ogr.main skips.
    ogr2ogr.main(["", "-f", "GeoJSON", geojson_uri, vrt_uri])
    print('Created GeoJSON file ({}).'.format(geojson_uri))

Though this code is hacky, it is quite memory efficient (thanks to ogr2ogr).
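A slightly less hacky variant, as a sketch: build the VRT with the standard library's xml.etree.ElementTree so that paths containing characters like & are escaped, and call the ogr2ogr command-line tool instead of a bundled ogr2ogr.py. The function names and file paths below are made up for illustration, and this assumes each source is a shapefile whose layer name matches its file name.

```python
import os.path
import xml.etree.ElementTree as ET


def build_merge_vrt(sources, layer_name="unionLayer"):
    """Return OGR VRT XML (as a string) that unions the given shapefiles."""
    root = ET.Element("OGRVRTDataSource")
    union = ET.SubElement(root, "OGRVRTUnionLayer", name=layer_name)
    for source in sources:
        # A shapefile's layer name is its base name without the extension.
        name = os.path.splitext(os.path.basename(source))[0]
        layer = ET.SubElement(union, "OGRVRTLayer", name=name)
        src = ET.SubElement(layer, "SrcDataSource", relativeToVRT="0")
        src.text = source
    return ET.tostring(root, encoding="unicode")


def ogr2ogr_command(vrt_uri, geojson_uri):
    """Argument list for the ogr2ogr CLI: destination first, then source."""
    return ["ogr2ogr", "-f", "GeoJSON", geojson_uri, vrt_uri]


# Hypothetical inputs; the first path checks that '&' gets escaped.
vrt_xml = build_merge_vrt(["data/rivers & lakes.shp", "data/ponds.shp"])
command = ogr2ogr_command("labels.vrt", "labels.geojson")
```

After writing vrt_xml to labels.vrt, the conversion could be run with subprocess.run(command, check=True), provided the ogr2ogr binary from GDAL is on the PATH.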