Closed thatbudakguy closed 1 month ago
dc_type_s | gbl-1_to_aardvark | gbl2aardvark.js | V1AardvarkMigrator |
---|---|---|---|
Collection | Collection | Collections | Collections |
Dataset | Dataset | Datasets | Datasets |
Image | Image | Imagery | Imagery |
Interactive Resource | Interactive Resource | Websites | Websites |
Physical Object | Physical Object | Other | Other |
Service | Service | EDIT ME -- this record had dc_type_s = Service | Web services |
Still Image | Still Image | EDIT ME -- this record had dc_type_s = Still Image | Imagery |
other value | other value | EDIT ME -- this record had dc_type_s = other value | Other |

layer_geom_type_s | gbl-1_to_aardvark | gbl2aardvark.js | V1AardvarkMigrator |
---|---|---|---|
Point | Point | Point data | Point data |
Line | Line | Line data | Line data |
Polygon | Polygon | Polygon data | Polygon data |
Image | Image | EDIT ME -- this record had layer_geom_type_s = Image | no value |
Raster | Raster | Raster data | Raster data |
Mixed | Mixed | EDIT ME -- this record had layer_geom_type_s = Mixed | no value |
Table | Table | Table data | Table data |
other value | other value | EDIT ME -- this record had layer_geom_type_s = other value | no value |
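
As a rough illustration, the V1AardvarkMigrator column of the two tables above could be expressed as a pair of lookups with fallbacks. This is only a sketch in Python, not code from any of the three implementations: the names `RESOURCE_CLASS`, `RESOURCE_TYPE`, `resource_class`, and `resource_type` are hypothetical, and returning lists assumes the target fields are multivalued, as the `_sm` suffix suggests.

```python
# Sketch of the V1AardvarkMigrator column from the tables above.
# All names here are hypothetical; none come from the libraries discussed.
RESOURCE_CLASS = {
    "Collection": "Collections",
    "Dataset": "Datasets",
    "Image": "Imagery",
    "Interactive Resource": "Websites",
    "Physical Object": "Other",
    "Service": "Web services",
    "Still Image": "Imagery",
}

RESOURCE_TYPE = {
    "Point": "Point data",
    "Line": "Line data",
    "Polygon": "Polygon data",
    "Raster": "Raster data",
    "Table": "Table data",
}


def resource_class(dc_type):
    """Return a gbl_resourceClass_sm list for a v1 dc_type_s value."""
    # Any value outside the vocabulary falls back to "Other".
    return [RESOURCE_CLASS.get(dc_type, "Other")]


def resource_type(geom_type):
    """Return a gbl_resourceType_sm list for a v1 layer_geom_type_s value."""
    # "Image", "Mixed", and unknown values produce no Resource Type at all.
    value = RESOURCE_TYPE.get(geom_type)
    return [value] if value else []
```

With these definitions, `resource_class("Still Image")` gives `["Imagery"]` and `resource_type("Mixed")` gives an empty list, matching the last column of each table.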
For Resource Class, the DCMI definition for "Still Image" mentions maps, so I don't think it will be possible to automatically convert "Still Image" to either "Imagery" or "Maps" (at least without trying to glean information from other fields like the title or description).
For Resource Type, I'm hesitant to convert "Image" to "Raster data", since the image is most likely an aerial image or some form of map, and the recommended Resource Type terms include specific terms like "Aerial photographs", "Bathymetric maps", "Cadastral maps", "Fire insurance maps", "Nautical charts", "Topographic maps", and many other types of maps. To me, "Raster data" is numeric data like elevation, precipitation, or temperature -- not imagery that could be further processed to extract data.
These are both good points. re: resource type, I think you're right that something like a (scanned) Sanborn map is an Image, but isn't (raster) Data (unless you do some more processing to it).
re: resource class, the definition for "Still Image" says:
> Instances of the type Still Image must also be describable as instances of the broader type Image.
so, if we convert "Image" to Imagery, it seems like "Still Image" must necessarily be Imagery (but isn't necessarily Maps). does that make sense?
I'm not finding any records with "dc_type_s":"Still Image" in any of the OGM repos (using the github search within the OGM organization), so maybe we don't need to worry about that value. Not finding any "Physical Object" records either.
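
As an alternative to the GitHub search, the values could also be tallied from a local checkout of the repos. The following is a minimal sketch, assuming the records are stored as `.json` files holding either a single record or a list of records; the function name `tally_field` and the `<missing>` placeholder are made up for illustration.

```python
import json
from collections import Counter
from pathlib import Path


def tally_field(root, field="dc_type_s"):
    """Count the values of `field` across all JSON records under `root`."""
    counts = Counter()
    for path in Path(root).rglob("*.json"):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, UnicodeDecodeError):
            continue  # skip files that are not valid JSON
        # A file may hold one record or a list of records.
        records = data if isinstance(data, list) else [data]
        for record in records:
            if not isinstance(record, dict):
                continue
            value = record.get(field, "<missing>")
            # Treat a multivalued field as several separate values.
            values = value if isinstance(value, list) else [value]
            for item in values:
                counts[str(item)] += 1
    return counts
```

Calling `tally_field("path/to/checkout").most_common()` would then list every distinct `dc_type_s` value with its count, including any that fall outside the controlled vocabulary.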
I agree that in practice it probably will rarely happen, but I was thinking this issue could be a place to decide the "official" strategy for any value that might occur, so any valid dc_type_s as well as other random values (which definitely show up in our data).
The assumption underlying that is that other folks looking to migrate from v1 to Aardvark will be in a situation where they don't have time to manually correct or postprocess all the records after conversion, and just want a converted version that is "the least wrong". So, converting everything in one go and knowing exactly how the fields will be mapped (regardless of which converter/implementation you use) would be useful. But maybe other folks aren't in this situation or don't have the same constraints?
Yes, we should probably just pick a behavior and document it. It would be good if the process could also output a list of warnings (for things such as this), that folks could choose to follow up on (or not).
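
One way the warnings idea might look, sketched in Python: the converter applies the documented fallback and records a message whenever it does so, leaving it to the caller to review the messages or ignore them. The function `convert_field`, the record identifiers, and the tiny mapping below are all hypothetical and not taken from GeoCombine or the other converters.

```python
def convert_field(record_id, value, mapping, default, warnings):
    """Map one v1 value, appending a warning when the fallback is used."""
    if value in mapping:
        return mapping[value]
    warnings.append(f"{record_id}: unmapped value {value!r}, used {default!r}")
    return default


# Usage with a deliberately tiny, illustrative mapping.
warnings = []
mapping = {"Dataset": "Datasets", "Image": "Imagery"}
records = [("rec-1", "Dataset"), ("rec-2", "Scanned map")]
converted = [
    convert_field(rec_id, value, mapping, "Other", warnings)
    for rec_id, value in records
]
for message in warnings:
    print(message)
```

Here the conversion always succeeds in one pass, and only the second record ends up in the warning list for optional follow-up.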
Updated the behavior table above to reflect the changes in https://github.com/OpenGeoMetadata/GeoCombine/pull/143/commits/f30e6d19a08720bbc24f8b7b70df6bb0742d4fd9.
@thatbudakguy does this need any more work or can we close the issue?
I think this can be considered implemented in recent versions of GeoCombine (and in other libraries mentioned above), so closing.
Opening this issue to discuss the possibility of a minimal or "standard" conversion between the controlled vocabularies for

- `dc_type_s` to `gbl_resourceClass_sm` (Resource Class)
- `layer_geom_type_s` to `gbl_resourceType_sm` (Resource Type)

...when migrating from v1 to Aardvark.

Known implementations for this behavior are:

- `setResourceClass` and `setResourceType` in @kgjenkins's `gbl2aardvark.js` (javascript)
- `convert_non_crosswalked_fields` in my draft PR to GeoCombine's `V1AardvarkMigrator` (ruby)
- `1.0-to-aardvark.py` in @karenmajewicz's `gbl-1_to_aardvark` (python)

Some questions we could answer that I think would help unify the (currently diverging) implementations:

- [...] `dc_type_s` or `layer_geom_type_s`? Lots of records "in the wild" seem to not completely obey these vocabularies.
- [...] `dc_type_s` is definitely the same as "Datasets" in `gbl_resourceClass_sm`.
- [...] `layer_geom_type_s` be "Raster data" in `gbl_resourceType_sm`?