Geographic Resources Data Migration Logic Update - Incorrect Logic (DDS-1071)

annikaLiving commented 3 years ago

The data transformation logic for Geographic Resources needs to be more specific about which resources should receive the values below. Currently, Geographic Resources that are not from the BCGW have values populated that only relate to the BCGW.

This may be related to logic for 357 not populating Preview Info values.

Logic (this won't quite catch them all, but we can build an exceptions list):

IF not already populated then:

(Object Name like 'WHSE%' or Object Name like 'REG%') and
Resource Storage Location = 'BC Geographic Warehouse'

Fields Populated:

Resource Storage Format = Oracle/SDE
Spatial Datatype = SDO_GEOMETRY
Projection Name = EPSG_3005 - NAD83 BC Albers
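The rule above could be sketched roughly as follows. This is only an illustration, not the actual migration code: the dictionary keys (`object_name`, `resource_storage_location`, and so on), the `id` key, and the `exceptions` parameter are hypothetical names chosen for the example, and the real schema may differ.

```python
# Hypothetical field names; the real migration schema may differ.
BCGW_DEFAULTS = {
    "resource_storage_format": "Oracle/SDE",
    "spatial_datatype": "SDO_GEOMETRY",
    # EPSG:3005 is NAD83 / BC Albers.
    "projection_name": "EPSG_3005 - NAD83 BC Albers",
}


def is_bcgw_object(resource):
    """Return True when the resource looks like a BCGW-hosted object."""
    name = resource.get("object_name") or ""
    in_bcgw = resource.get("resource_storage_location") == "BC Geographic Warehouse"
    # SQL LIKE 'WHSE%' / 'REG%' is a prefix match.
    return in_bcgw and name.startswith(("WHSE", "REG"))


def apply_bcgw_defaults(resource, exceptions=frozenset()):
    """Return a copy with BCGW defaults filled in only where still empty."""
    result = dict(resource)
    # Resources on the (hypothetical) exceptions list are left untouched.
    if result.get("id") in exceptions or not is_bcgw_object(result):
        return result
    for field, value in BCGW_DEFAULTS.items():
        if not result.get(field):  # "IF not already populated"
            result[field] = value
    return result
```

Because the storage-location check is combined with the name prefix, a resource hosted in ArcGIS Online or on an FTP site would be skipped even if its object name happens to start with WHSE.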

Examples of where these should not be populated:

This is in ArcGIS Online ...dataset/quesnel-natural-resource-district-2019-planting-blocks/resource/064b701e-a58f-48b0-a792-061fa7858d81
This is from an Operational Database ... dataset/e1857418-3f63-43eb-911e-5a2f298c42f7/resource/8282e516-16cd-4485-a1f5-09ad385c0494
These are a GeoPackage and FGDB on an FTP site ...dataset/30aeb5c1-4285-46c8-b60b-15b1a6f4258b

For BCGW WMS and KML, the logic on how these are populated needs a bit more work.

These do come from the BCGW and thus could have all that info, but technically they are not in Albers.
More thought is needed on which fields to keep for WebServices, as right now we are showing far more fields than were in Cat Classic. Partially related to 312.

annikaLiving commented 3 years ago

@TerryLanktree what would you like to see for the WMS and KML values?

TerryLanktree commented 3 years ago

Leave the empty fields empty. Like for like, then we get clean-up by Custodian, if required. If the field is now mandatory, could we populate with Not Provided, or Unknown? Does it need to be mandatory if there is no real info? Are we able to make mandatory for certain resource types and not others? @yuisotozaki @BrandonSharratt What are our options?

yuisotozaki commented 3 years ago

The field in question must have a purpose for the consumers, even if it's to prime their mind into expecting certain information to follow. I.e., if we show the word "Albers" on the screen, the user will expect to see the resource in that format. It would be great if we could show "Not provided" on the UI and, when the custodian goes to edit, require them to provide a value from the list (i.e. "Not Provided" is not a valid option at that point).
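A minimal sketch of that idea, assuming the empty value is stored as blank and the placeholder exists only at display time. The function names and the `allowed_values` parameter are hypothetical and do not come from the catalogue code.

```python
NOT_PROVIDED = "Not provided"


def display_value(stored):
    """Show a placeholder on the UI when nothing is stored for the field."""
    return stored if stored else NOT_PROVIDED


def validate_on_edit(submitted, allowed_values):
    """Accept only real list values when the custodian edits the field."""
    # The placeholder is never in the allowed list, so it is rejected here.
    if submitted not in allowed_values:
        raise ValueError(f"{submitted!r} is not a valid option")
    return submitted
```

Keeping the placeholder out of the stored data means a migrated record stays empty until a custodian picks a real value.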

annikaLiving commented 3 years ago

There are many fields that are not mandatory, as they do not pertain to the dataset the provider is giving. These are examples of where those fields should not be visible. Another example is Temporal Extent: not all data includes a time series, so showing the field at all misleads the consumer.

The wording "Not Provided" would be false in such cases, as it would imply that the provider just left this out, not that the field is Not Applicable to this data.

There are many fields for resource types that do not belong, as I wrote up in my resource type mock-up pages 6 months ago. This included fields now added to Applications and Webservices that did not exist in Cat Classic, and now fields are being populated with incorrect information.

The data here shows that values were populated that did not exist in Cat Classic, so the values chosen are incorrect.

E.g., WMS and ArcGIS online are not in Albers.
The Operational Database may not be in SDO Geometry, could be in STE Geometry.

TerryLanktree commented 3 years ago

Being new to the Catalogue team, I have to ask some perhaps obvious questions: What is the intent of the field? In the case of projection, if it's storage, we do store in Albers, even the items we spool out to WMS and ArcGIS online. It may be reprojected by the consuming application, and that knowledge may be good to have. Alternatively, it may be the projection the data was captured in (which would be very relevant, but I do not believe this is what is being captured). I always want to ask the questions: Why are we showing these fields? and, How useful is it to the consumer? If the database could be SDO or STE Geometry, does this have any implication to the consumer of the data? Can it affect analysis? Will it affect anything with the data download (which will not be in the storage format)? As for Temporal Extent, how many datasets do we have that need this? We are not a temporal warehouse, as far as I know, so is this simply because it's part of the standard?

annikaLiving commented 3 years ago

For the examples I provided, the values in Cat Classic do not include the values that I have listed (other than maybe projection, if the user selected 3005). Therefore the data migration has something incorrect in the mapping and is now showing incorrect data.

The Geometry Type was added as a requirement of IITD when they were going to use the catalogue. That project went away, but it is probably coming back. We are to engage with them on this.

The Resource Storage Format and Spatial Datatype have no impact on download and have zero relevance to general consumers. They are more for database managers like me and those at IITD, and even then I would log into the DB to find out the answer.

Projection is valid for Geographic datasets and is useful. We would have to see if any dataset was set to another projection.

annikaLiving commented 3 years ago

Cleanup to be done after MVP

bcgov / ckan-ui

Geographic Resources Data Migration Logic Update - Incorrect Logic (DDS-1071) #367