VIDA-NYU / tile2net

Automated mapping of pedestrian networks from aerial imagery tiles
BSD 3-Clause "New" or "Revised" License
146 stars 22 forks source link

Upgrade `raster/source.py` #55

Closed dhodcz2 closed 3 months ago

dhodcz2 commented 4 months ago

Another user added another Source. However when they ran the check, they created a Raster for the entire state of Virginia at zoom=20! I tried this and ran out of memory. Now, a basic check for Source doesn't involve Raster:

    assert Source['New Brunswick, New Jersey'] == NewJersey
    assert Source['New York City'] == NewYorkCity
    assert Source['New York'] in (NewYorkCity, NewYork)
    assert Source['Massachusetts'] == Massachusetts
    assert Source['King County, Washington'] == KingCountyWashington

Now any pull request made will also run a test that the Source is actually returned when geocoded:

  1. Cache Source metadata locally We cache the metadata for all the Sources to disk so we needn't query the servers every time we test our code.

  2. Improve Source reliability First we georeference whatever the user passed. If there's no matches, we georeference a new address and then use the polygon from that. If still no matches, no Source found. If there's discrepancies, we exclude the Sources that don't have a matching keyword. If there's still discrepancies, we choose the Source whose geometry has the greatest IOU with the georeferenced bounding box.

  3. Fix Virginia coverage When I was testing the new Source added by a user, it was returning correctly when queried by name but not when queried by bounds. The metadata for the Virginia tiles seems to be incorrect. We retrieve the coverage of the dataset with the bounds specified in "fullExtent". If you parse this, you will have the following metadata:

    bounds = {
    "xmin": -2.0037507842788246E7,
    "ymin": -3.024097145838615E7,
    "xmax": 2.0037507842788246E7,
    "ymax": 3.024097145838615E7
    }

    Once you convert this to 4326 it is a full view of the globe. I have hard-coded the coverage for now.

  4. Implement more in-depth automatic testing for Source classes Another user added another Source. However when they ran the check, they created a Raster for the entire state of Virginia at zoom=20! I tried this and ran out of memory. Now, a basic check for Source doesn't involve Raster:

    assert Source['New Brunswick, New Jersey'] == NewJersey
    assert Source['New York City'] == NewYorkCity
    assert Source['New York'] in (NewYorkCity, NewYork)
    assert Source['Massachusetts'] == Massachusetts
    assert Source['King County, Washington'] == KingCountyWashington

    This also highlights the fact that our testing for the Source class is deficient. Now I have automatic testing for any Source class defined:

    for key in dir(tile2net.raster.source):
        cls = getattr(tile2net.raster.source, key)
        if (
                not isinstance(cls, type)
                or not issubclass(cls, Source)
                or abc.ABC in cls.__bases__
        ):
            continue
        # assert querying by the polygon returns the same source
        assert Source[cls.coverage.unary_union] == cls
        # assert querying by the keyword returns the same source
        if isinstance(cls.keyword, str):
            assert Source[cls.keyword] == cls
        else:
            assert any(
                Source[keyword] == cls
                for keyword in cls.keyword
            )
        # assert querying by the name returns the same source
        assert Source[cls.name] == cls

    We also have bespoke tests to assure we're getting what we expect when we pass other things:

    assert Source['40.72663613847755, -73.99494276578649'] == source.NewYorkCity
    assert Source['38.90277706745021, -77.03617656372798'] == source.WashingtonDC
    assert Source['43.05052202494481, -76.19505424681927'] == source.NewYork
    item = '33.97576931943177, -118.19841961122856, 34.116579445776445, -117.97154942950205'
    assert Source[item] == source.LosAngeles
    item = '40.496044, -74.443672, 40.561051, -74.332089'
    assert Source[item] == source.NewJersey
    
    assert Source[40.72663613847755, -73.99494276578649] == source.NewYorkCity
    assert Source[38.90277706745021, -77.03617656372798] == source.WashingtonDC
    assert Source[43.05052202494481, -76.19505424681927] == source.NewYork
    item = 33.97576931943177, -118.19841961122856, 34.116579445776445, -117.97154942950205
    assert Source[item] == source.LosAngeles
    item = 40.496044, -74.443672, 40.561051, -74.332089
    assert Source[item] == source.NewJersey
    
    assert Source['nyc'] == source.NewYorkCity
    assert Source['ny'] == source.NewYork
    assert Source['nj'] == source.NewJersey
    assert Source['new jersey'] == source.NewJersey
    assert Source['la'] == source.LosAngeles
    # just Spring Hill returns Spring Hill, Virgnia
    assert Source['Spring Hill, Tennessee'] == source.SpringHillTN
    assert Source['va'] == source.Virginia

    All of the tests passed.

dhodcz2 commented 4 months ago

We encountered some problems after updating the dependencies, so as of this pull request we will be using specific versions from pip freeze until a future version of tile2net allows us to maintain it with less effort. Specifically, the newer version argh removes argh.expects_obj, and there is an issue with the tensorboard in use from runx as well as well.