DOI-USGS / gems-tools-pro

GeMS Tools for ArcGIS Pro

Build Metadata tool issue (2.12.9): Bounding coordinates written to xml are way off... #92

Closed · alwunder closed this issue 10 months ago

alwunder commented 10 months ago

I am reproducing metadata for re-submission of GeMS packages using the new Build Metadata tool. With the old metadata tool, I would export a shapefile of the DataSourcePolys class and import the spatial information from it to the xml using Metadata Wizard. The bounding coordinates were always really close and required only a small adjustment to get them set to the exact quadrangle bounding lat/lon. See screenshot:

[screenshot]

When I run the Pro Build Metadata tool on the same geodatabase, I get coordinates that are way off. I have looked at the extents of the individual feature classes within the database and they are all within reason, and the zoom to full extent tool in ArcGIS Pro zooms to the layers normally.

See the screenshot for an example of the extent I am getting for the same geodatabase as above: [screenshot]

I am still trying to track down the issue, but I thought I'd see if anyone else was having the same problem. Thoughts?

ethoms-usgs commented 10 months ago

That's interesting. The bounding box built by the tool uses Metadata Wizard code, although MW only shows the extent of one feature class at a time, whereas the tool finds the maximum extent of all features in the gdb.

From MetadataWizard\fort-pymdwizard\pymdwizard\gui\spatial_tab.py:

def populate_from_fname(self, fname):
    msg = ""
    try:
        spdom = spatial_utils.get_bounding(fname)
        self.spdom.from_xml(spdom)
    except:
        msg = "Problem encountered extracting bounding coordinates"
        self.spdom.clear_widget()

where fname is the open file. I simply call spatial_utils.get_bounding() for each feature class and then find the max and min:

from osgeo import ogr
import spatial_utils as su

def max_bounding(db_path):
    north = []
    south = []
    west = []
    east = []
    for layer in ogr.Open(db_path):
        # geometry type 100 (ogr.wkbNone) means a non-spatial table; skip those
        if layer.GetGeomType() != 100:
            # if "MapUnitPolys" in layer.GetName() or "ContactsAndFaults" in layer.GetName():
            bounding = su.get_bounding(str(db_path), layer.GetName())
            north.append(bounding.find("northbc").text)
            south.append(bounding.find("southbc").text)
            west.append(bounding.find("westbc").text)
            east.append(bounding.find("eastbc").text)

    # re-use the last <bounding> element, overwriting its values with the
    # outermost coordinates found across all layers
    bounding.find("northbc").text = max(north)
    bounding.find("southbc").text = min(south)
    bounding.find("westbc").text = max(west)
    bounding.find("eastbc").text = min(east)

    return bounding

What is going wrong there?

alwunder commented 10 months ago

Interesting. I can use the arcpy Describe methods to return the same numbers that I see in the ArcGIS Pro layer properties "Extent" dialog, and they all seem valid. I also used my nifty little Describe Database script to check the extents of all the layers and feature datasets, and they are all correct, displaying the projected coordinates.
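For reference, here is a minimal sketch (not my actual Describe Database script; the gdb path is hypothetical) of dumping the stored extents with arcpy:

import arcpy

arcpy.env.workspace = r"C:\path\to\your.gdb"  # hypothetical path

def report_extent(fc):
    d = arcpy.da.Describe(fc)
    ext, sr = d["extent"], d["spatialReference"]
    print(f"{fc}: ({ext.XMin}, {ext.YMin}) - ({ext.XMax}, {ext.YMax}) "
          f"[{sr.name}, units: {sr.linearUnitName}]")

# stand-alone feature classes, then those inside feature datasets
for fc in arcpy.ListFeatureClasses():
    report_extent(fc)
for fds in arcpy.ListDatasets(feature_type="Feature"):
    for fc in arcpy.ListFeatureClasses(feature_dataset=fds):
        report_extent(fc)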

There doesn't seem to be a difference between "bounding" and "extent"? I see that within spatial_utils.py there is a function get_extent, which is called by get_geographic_extent, which is in turn called by get_bounding... so it seems this should be working correctly?
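As I understand it, a chain like that boils down to something like this simplified sketch (my own paraphrase, not pymdwizard's actual code): read the stored layer extent, then transform its corners using the layer's declared spatial reference. So there are two ways it can go wrong: a stale stored extent, or a declared SR with the wrong units.

from osgeo import ogr, osr

def geographic_bounds(gdb_path, layer_name):
    ds = ogr.Open(gdb_path)
    layer = ds.GetLayerByName(layer_name)
    xmin, xmax, ymin, ymax = layer.GetExtent()  # stored extent; may be stale
    src = layer.GetSpatialRef()                 # declared SR, trusted as-is
    dst = osr.SpatialReference()
    dst.ImportFromEPSG(4326)                    # WGS84 lat/lon
    # force x=lon, y=lat ordering regardless of GDAL 3 axis conventions
    src.SetAxisMappingStrategy(osr.OAMS_TRADITIONAL_GIS_ORDER)
    dst.SetAxisMappingStrategy(osr.OAMS_TRADITIONAL_GIS_ORDER)
    tx = osr.CoordinateTransformation(src, dst)
    west, south, _ = tx.TransformPoint(xmin, ymin)
    east, north, _ = tx.TransformPoint(xmax, ymax)
    return north, south, east, west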

So, I tried using the code you supplied, first defining db_path and then modifying it to print in the Python window in ArcGIS Pro. Here are the results:

from osgeo import ogr
import spatial_utils as su
north = []
south = []
west = []
east = []
for layer in ogr.Open(db_path):
    # if layer.GetGeomType() != 100:
    if "MapUnitPolys" in layer.GetName() or "ContactsAndFaults" in layer.GetName():
        bounding = su.get_bounding(str(db_path), layer.GetName())
        north.append(bounding.find("northbc").text)
        south.append(bounding.find("southbc").text)
        west.append(bounding.find("westbc").text)
        east.append(bounding.find("eastbc").text)

bounding.find("northbc").text = max(north)
bounding.find("southbc").text = min(south)
bounding.find("westbc").text = max(west)
bounding.find("eastbc").text = min(east)
print('north: '+str(north))
print('south: '+str(south))
print('west: '+str(west))
print('east: '+str(east))

Results:

north: ['39.8789', '39.8798', '36.0031']
south: ['39.4434', '39.4277', '35.8727']
west: ['-81.7220', '-81.7491', '-89.3790']
east: ['-81.2495', '-81.2495', '-89.2472']
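One aside on those lists: the values are strings, so max() and min() compare lexicographically. For CONUS coordinates (positive latitudes with the same digit count, negative longitudes) that coincidentally picks the right corners:

north = ['39.8789', '39.8798', '36.0031']
west = ['-81.7220', '-81.7491', '-89.3790']
print(max(north))  # '39.8798' -- the northernmost, as intended
print(max(west))   # '-89.3790' -- lexicographic max of negative strings is
                   # the most negative number, i.e., the westernmost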

Where is it reading the three sets of coordinates from if I am only passing two layer names?

The strange thing is, I ran the Build Metadata tool on another database with similarly erroneous results in the output xml: [screenshot]

but when I ran the code in the Python window, I got good results:

db_path = r"C:\GIS\PROJECTS\FY2022 STATEMAP Project\BelvidereTN_GeologicMap\BelvidereTN_GeologicMap.gdb"
# ...followed by the same code as above

Results:

north: ['35.2503', '35.2522', '35.2503']
south: ['35.1249', '35.1249', '35.1254']
west: ['-86.2504', '-86.2504', '-86.2506']
east: ['-86.1248', '-86.1193', '-86.1256']

A real mystery!!

alwunder commented 10 months ago

Evan, forgive me, but I really went down the rabbit hole on this one...

I decided to go over the Describe Database report again to see if there was something I missed on the Bonicord map (the first example above). I put the report in Excel and reformatted it a bit so it would play nice with filtering, then looked again at the extent numbers. They all jibe perfectly with the GeologicMap feature dataset, describing the maximum extent (in projected units of feet) of the features on the map!

[screenshot]

Then I looked at the other items in the report, and this caught my attention: [screenshot]

The feature datasets (CartographicElements and GeologicMap) have projections defined, and they are the same. But look at the classes contained within them: some show a different projection definition! They are actually the same projection, but in units of meters instead of feet. So I passed those north and east coordinates (which are in feet) to the NGS coordinate converter as METERS: [screenshot] Guess what happens? 39.8628N 81.2495W...
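That effect is easy to reproduce (a sketch of my own; EPSG:2274 is NAD83 / Tennessee in US survey feet, EPSG:32136 is the metric variant, and the coordinate pair is hypothetical):

from osgeo import osr

def to_latlon(epsg, x, y):
    src = osr.SpatialReference()
    src.ImportFromEPSG(epsg)
    dst = osr.SpatialReference()
    dst.ImportFromEPSG(4326)
    src.SetAxisMappingStrategy(osr.OAMS_TRADITIONAL_GIS_ORDER)
    dst.SetAxisMappingStrategy(osr.OAMS_TRADITIONAL_GIS_ORDER)
    lon, lat, _ = osr.CoordinateTransformation(src, dst).TransformPoint(x, y)
    return round(lat, 4), round(lon, 4)

x, y = 1980000.0, 620000.0     # hypothetical easting/northing in ftUS
print(to_latlon(2274, x, y))   # interpreted correctly, in feet
print(to_latlon(32136, x, y))  # same numbers misread as meters: far off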

See further testing shown in my later comment. It helps explain these coordinate numbers.

[I removed this section that was "red herring" garbage...]

ethoms-usgs commented 10 months ago

Ok, my head is spinning a bit, but the first thing that jumps out is that I thought feature classes HAD to have the same SR as the feature dataset they were in. Not fair! I wonder how that happens.

Still, that point is moot given that spatial_utils.get_bounding() only considers OGR layers. I do need to add some code to exclude feature classes in datasets like CartographicElements, and any others that have 'Unknown' spatial references, but for now we can ignore the spatial reference of the feature dataset itself.

I investigated a bit with a Jupyter Notebook. I took a TN database I have (Bellwood) and looked at the extent of ContactsAndFaults. Here's what a box drawn from the stored extent looks like next to the actual features:

[screenshot]

The extent box (gray) is obviously off. It turns out the feature class extent is a property that is not always updated when features are edited. You need to run the Recalculate Feature Class Extent tool. After doing that, the extent box looks perfect:

[screenshot]

I don't know if this is what's causing your crazy extents or not. Exporting a feature class to a shapefile must result in a new extent, so that could explain how MW shows a good extent in that case. I'll have to check out a few more examples.

alwunder commented 10 months ago

I guess what I still don't get is that the arcpy.da.Describe extents are correct in projected units, and when I pass those to the NGS coordinate converter I get the correct corresponding lat/lon. So what is the OGR layer extent returning? The maximum extent the feature class ever had? (This seems to be true; see below.) If that's the case, I can tell you why it's happening in the Bellwood example you have above: I ran topology on a six-quad band that Bellwood was the westernmost of. That rectangle you got the first time is the extent of the entire set of six quads. Each quad then had its ContactsAndFaults and MapUnitPolys sent to their respective databases, but the "extent" of the whole thing must have gotten transferred somehow.

I looked at the OGR layer .GetExtent documentation and it doesn't really shed any light on the logic of what it is reading; however, the fact that you can "force" it to recalculate the extent "expensively" tells me it is understood that the extent returned may not match the actual extent of the features currently in the layer. I found this discussion, which appears very similar to this situation: https://trac.osgeo.org/gdal/ticket/6858. I also found where they added the ability to force an update of the extent: https://trac.osgeo.org/gdal/ticket/4027.
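The distinction shows up directly in the OGR API; a small sketch (hypothetical path):

from osgeo import ogr

ds = ogr.Open(r"C:\path\to\your.gdb")  # hypothetical path
layer = ds.GetLayerByName("ContactsAndFaults")
print(layer.GetExtent(force=0))  # stored extent: fast, but possibly stale
print(layer.GetExtent(force=1))  # forces a fresh scan of the geometries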

I seem to recall back in the ArcGIS 9.x days that I had a little VBA macro button that I used to update the extents of feature classes and shapefiles that didn't behave when "zooming to layer" in ArcMap. For some reason, I thought this was not a problem in file geodatabases, but I guess it still is.

I will try two things. First I will use my "copy to new database" script to append all the classes in the offending gdb into a fresh copy of the GeMS schema that I know is defined correctly. The append should force the extent to be recalculated? We'll see. Second, I will make a little "update extent" script and see if that has any effect on the offending Bonicord database I was using as an example above. That one may still be problematic if that weird projection issue causes the MW extent tool to pass the coordinates to be converted in the wrong units.

Thanks for taking a look at this, Evan. I'll report back when I have done more testing.

alwunder commented 10 months ago

My research on this weird extent problem yielded two results. The first is that even after updating the extents of the feature classes in the original database with the strange projection definitions, Build Metadata provided incorrect results for the extent (as predicted), because the MW script passes the projected coordinates in the wrong units for the north and east extents. These match the results I got in the previous comment, where I passed the projected coordinates in METERS to the coordinate converter: [screenshot]

The second test was to append the datasets from the messed-up db into a fresh, empty copy of a TNGeMS db with a known good projection definition. It turns out that running the Append tool DOES NOT update/recalculate the extent of the feature class, which is strange to me, but it is proven by the extent the MW tool found: [screenshot]

So, I put together a little script to update the extents of all the feature classes in a database (see Discussion #93) and ran it on the known good projected db, and voilà: [screenshot]

These are the correct/desired results! As expected, the numbers are just a tiny bit off due to the coordinate conversion, but the bounding box is correct.

Moral of the story: I will now be running the "Recalculate FC Extents" tool as part of my workflow to clean up my GeMS dbs before running the Build Metadata tool.
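A minimal sketch of that cleanup step (my actual script is in Discussion #93; this version just runs the standard Data Management tool on every feature class, with a hypothetical path):

import arcpy

arcpy.env.workspace = r"C:\path\to\your.gdb"  # hypothetical path

def recalc_all(feature_dataset=None):
    for fc in arcpy.ListFeatureClasses(feature_dataset=feature_dataset):
        arcpy.management.RecalculateFeatureClassExtent(fc)
        print(f"recalculated extent: {fc}")

recalc_all()  # stand-alone feature classes
for fds in arcpy.ListDatasets(feature_type="Feature"):
    recalc_all(fds)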

ethoms-usgs commented 10 months ago

In your last gdb, with "known good projection definitions", are the units feet or meters? I am finding that when I calculate a bounding box using the spatial_utils code, it handles feet as expected.

I am pretty sure that for a given feature class or dataset, both arcpy and ogr pull the extent information that is stored in the 'Definition' field of the system table 'GDB_Items'. As we have learned, those values are only updated occasionally or explicitly.
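If you want to see that stored value, GDAL's OpenFileGDB driver can expose the system tables (a sketch, assuming a recent GDAL build that supports the LIST_ALL_TABLES open option, with a hypothetical path):

from osgeo import gdal

ds = gdal.OpenEx(r"C:\path\to\your.gdb", open_options=["LIST_ALL_TABLES=YES"])
items = ds.GetLayerByName("GDB_Items")
for feat in items:
    name = feat.GetField("Name")
    if name and name.endswith("ContactsAndFaults"):
        print(feat.GetField("Definition"))  # XML; the persisted extent lives here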

I think in the case of the gdb with mismatched SRs, all of the calculations are being done correctly on the feature classes themselves, but GeologicMap, although defined in feet, has its extent coordinates in meters because that's what the child feature classes use.

Note that when you ran max_bounding() with

if "MapUnitPolys" in layer.GetName() or "ContactsAndFaults" in layer.GetName():

as opposed to looking at all feature classes, you still got three sets of coordinates because of 'GMMapUnitPolysAnno'. Could that have been the cause of some of the large extent rectangles? That the extent of that fc wasn't recalculated until you ran your script over the entire gdb?

Are we good at this point, as long as we know that the SR of the feature dataset matches the children and the extents have been recalculated?

alwunder commented 10 months ago

The known good projected gdb shown last is in METERS, and its bounding box translated to lat/lon just fine.

The original feature classes were simply appended into a new gdb, and even though Append doesn't update the extent, it does convert the units of that extent. The Append tool doesn't care what the spatial ref of the child items is--it looks at the feature dataset for that info. So, although the child items in the bad gdb had meters specified as their unit, the numbers were actually correct in feet. The Append tool takes each item (class) and projects it on the fly to match the destination when the spatial references do not match. It chooses the geographic transformation based on the datums and locations of the input features, and converts the projected units if necessary (in this case from feet to meters; you can also force a particular transformation in the geoprocessing environment variables). The Append tool was able to transform the coordinates correctly because it ignored the incorrect projection definitions of the child items. Then, when I finally updated the extents of the classes in the gdb targeted by the append (which had the correct spatial reference for ALL classes), the MW script worked just fine.
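For reference, that append step boils down to something like this sketch (paths hypothetical; the commented line shows where a transformation could be pinned):

import arcpy

src = r"C:\old\BadProjection.gdb\GeologicMap\MapUnitPolys"  # hypothetical
dst = r"C:\new\TNGeMS.gdb\GeologicMap\MapUnitPolys"         # hypothetical

# Append projects inputs on the fly when spatial references differ;
# uncomment to pin a specific datum transformation if one is needed:
# arcpy.env.geographicTransformations = "NAD_1927_To_NAD_1983_NADCON"
arcpy.management.Append(inputs=src, target=dst, schema_type="NO_TEST")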

I see now the "MapUnitPolys" in layer.GetName()... Duh! Of course that's why I got three sets of coordinates, because of 'GMMapUnitPolysAnno'. I just missed that part of the logic.

Anyway, it's no problem at all now that I'm updating the fc extents before running the Build Metadata tool.

Thanks for taking the deep dive with me on this one. There's still the mystery of the bad gdb, but I'm going to pretend it didn't happen... The good news is the Build Metadata tool works exactly as expected, and it showed me that I needed to be doing more to clean up my gdbs before finalizing them.

ethoms-usgs commented 10 months ago

Ok, great. I will probably add recalculate extent to the metadata tool as well.