azgs-geomapmaker / geomapmaker

A new toolbar for making geologic maps in the ESRI ArcPro environment.
MIT License
5 stars 2 forks source link

Investigate use of postgres for Gems GDB hosting #1

Closed NoisyFlowers closed 3 years ago

NoisyFlowers commented 4 years ago

Postgres is not the standard choice for hosting ArcSDE Enterprise Geodatases. There are known problems with capitalization. There may be others. We need to do an initial evaluation for potential showstoppers.

Here's the methodology. Using the tools at https://github.com/usgs/gems-tools-pro:

1) Install the Gems tools per instructions. These manifest as tools in the Catalog>Favorites tab of Arc Pro. 2) Create a File GDB that is legal Gems (using the Create New Database tool) 3) Import this into an Enterprise Geodatabase hosted in postgres. 4) Add some features. Add a feature class. 5) Export this to a File Geodatabase 6) Validate this File Geodatabase for Gems using the Validate Database tool.

This should help us get an idea of the efficacy of postgres for out needs.

NoisyFlowers commented 4 years ago

Right off the bat, I can see that capitalization of the table names and field names is lowercase in the Enterprise GDB. Table name capitalization is restored to mixed case when exported to a File GDB. However, capitalization of field names is not corrected.

I see that the alias column of each field still contains the mixed case version of the field name. We can restore capitalization by copying these aliases to the field name column. I've developed a python tool for this, which I will add to this repository.

The next thing I noticed is that all columns that were Single in the File GDB are changed to Double in the Enterprise GDB. They remain Double when exported to the File GDB. This breaks validation.

I spent much of this week trying different ways of fixing this. I tried creating an XML Geodatabase intermediary and importing that into the Enterprise GDB. Same thing happened.

I developed an import script that would find all the problem fields first, do the import, then iterate the list of problem fields and alter their type in the Enterprise GDB. Everything appeared to work (no errors) except that the type remained Double.

Eventually, I realized that the resulting Enterprise type is dependent on not just the Data Type of the Field, but also its Precision and Scale settings. These are not accessible to the AlterField tool I was using. So, I bailed on that script (I'll probably upload it here for posterity anyhow), and decided to just manually edit the XML Geodatabase to contain the values we need in those settings for each field. I took a quick stab at doing this in that script using XPath, but it turns out that the default XML parsing library in Python is not fully XPath compliant. I didn't feel like figuring out how to pull an external library into arcpy, so instead decided to just edit the XML manually. This seems reasonable, since we only really need to do this once in order to have our Gems GDB template.

Using this edited XML template, the fields in question stay Float (Single) when imported to Enterprise. I'll need help deciding what values we really want in those Precision and Scale columns. For now, I'm using 6 and 1 respectively. Details on these values are at https://desktop.arcgis.com/en/arcmap/latest/manage-data/geodatabases/arcgis-field-data-types.htm.

After importing this XML template, then exporting using my script that fixes up capitalization, I was able to a obtain a File GDB that passes Level 2 Gems compliance and almost passes Level 3. The only thing preventing Level 3 compliance is an error that also exists in the original File GDB generated by the Gems tool itself, so I presume we can fix that up, though I have no idea what it means:

HierarchyKey errors, DescriptionOfMapUnits
    No HierarchyKey values?!

So, to sum up, I think we're still good with postgres for this. The process will be:

1) Create an XML Geodatabase template that is Gems compliant and has whatever else we need in it. 2) Anytime we need to setup a shared Enterprise GDB, import this template using the builtin Import XML Workspace Document in ArcGIS Pro 3) When it's time to export the Enterprise GDB to a File GDB for archival in azlibrary, use my ExportGems script, which exports and then patches up the case of the Field names.

aazaff commented 4 years ago

This is wonderful work.

  1. Laura already usually creates DBs using CML import anyway so this should work well for her and she knows how to edit them if necessary
  2. Yes the hierarchy key also looks like a nonproblem

Great news!

NoisyFlowers commented 4 years ago

I've been fiddling with importing/exporting and the Gems tools and I've discovered an irregularity. It is vitally important to remove all the SDE feature classes and tables from the map before running my export script. Otherwise, validation will fail on the exported gdb file. I have no idea why.

All I can say is that in the validation script there is a call to arcpy.ListFields() on each table to get a list of the field names. If the gdb file was exported from an SDE gdb that was not attached to the map, this call returns field names with mixed case, and the validation passes. If the gdb file was exported from an SDE gdb that was attached to the map, this call returns field names in lower case, causing the validation to fail.

The really weird thing about this is that when I open tables from this failing file gdb in Arc, the field names are mixed case. So I dunno wtf.

aazaff commented 4 years ago

That sounds like something that we could potentially ask them to change in the validation script.

NoisyFlowers commented 4 years ago

Possibly, but I worry it's an Esri thing. That arcpy.ListFields() is a Geoprocessing Tool from Esri. It would be nice to ask them how it can return a list of lower case names, when the same gdb accessed through Arc Pro desktop shows the field names in mixed case.

NoisyFlowers commented 4 years ago

Another question: is this an artifact local to the machine on which we run the export? What if someone has the SDE gdb open in a map on another machine when we run the export?

NoisyFlowers commented 4 years ago

Ok, this keeps getting more nutty. It looks like the problem is with arcpy.ListFields itself.

I ran the Compare Features tool to compare a file gdb exported while the SDE gdb was on the map to a file gdb exported while the SDE gdb was not on the map. According to this tool, they are identical.

This means that the problem is not happening during the export script.

Next, I ran the Gems validation script on the file gdb exported while the SDE gdb was not on the map. The file gdb passes, as expected. Then I added the SDE gdb to the map, and ran the validation tool on the same file gdb. It fails due to the missing fields!

Here are excerpts from the output of some instrumentation I added to the validation script. The first is from the run when the file gdb passes, the SDE gdb is not on the map:

scanning DataSources
DataSources is in GeMS_Definitions
checking fields in  DataSources
before ListFields, current workspace = C:\Users\Douglas\Documents\ArcGIS\tmp\2020-08-05\notOnMap.gdb
existing field objectid
existing field Source
existing field Notes
existing field URL
existing field DataSources_ID
existing field created_user
existing field created_date
existing field last_edited_user
existing field last_edited_date

Note that the workspace is the file gdb. The "existing field" lines are from iterating the return from arcpy.ListFields. Note that Source, Notes, URL, and DataSources_ID are mixed case.

Next is from the run when the file gdb fails, the SDE gdb is on the map:

scanning DataSources
DataSources is in GeMS_Definitions
checking fields in  DataSources
before ListFields, current workspace = C:\Users\Douglas\Documents\ArcGIS\tmp\2020-08-05\notOnMap.gdb
existing field objectid
existing field source
existing field notes
existing field url
existing field datasources_id
existing field created_user
existing field created_date
existing field last_edited_user
existing field last_edited_date

Note that the workspace is the same file gdb. But the existing fields are all lower case!

Clearly, arcpy.ListFields is looking at something other than the actual content of the file gdb in this case.