DataONEorg / object-formats

DataONE Object Formats controlled vocabulary
Apache License 2.0

GeoJSON #16

Closed twhiteaker closed 3 years ago

twhiteaker commented 4 years ago

Format Metadata

Provide the standard metadata for the proposed format, ensuring that the id and name are unique and appropriate to the version of the format being proposed.

2008 version:

current version:

Another extension is .geojson.

Format description

Describe why a new format is needed, including items such as where the format type has been encountered, what software produces it, and what software can read it.

This is an open format for storing vector geospatial data. It can be read by various geographic information system (GIS) software, including ArcGIS and QGIS. There is a 2008 version and a more restrictive 2016 version (RFC 7946). Does RFC 7946 supersede the 2008 version? As far as I can tell, the 2008 version is still exported by some software (e.g., ArcGIS for Desktop).

This is one of the formats recommended by an EDI/LTER working group developing best practices for archiving spatial data.

Specification / Namespace documentation

Provide the location(s) of the documentation of the format specification or the namespace for the format or vocabulary.

2008 Specification: https://geojson.org/geojson-spec.html
Current Specification: https://tools.ietf.org/html/rfc7946
Version differences discussed: https://github.com/Esri/arcgis-to-geojson-utils/issues/21
Media type from: https://op.europa.eu/en/web/eu-vocabularies/at-concept/-/resource/authority/file-type/GEOJSON/?target=Browse

Checklist

Considerations

I got the media type from the EU vocab. Note that Library of Congress has instead application/vnd.geo+json.

Did I use the plus sign and the dash correctly in the format ID?

I don't know which extension is more commonly used for GeoJSON files, .json or .geojson.
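Since a .json extension alone cannot tell GeoJSON apart from any other JSON file, a tagging tool would probably have to look inside the file. The following is a minimal sketch of such a check, assuming it is enough to test the top-level "type" member against the nine object types that both specs define. The function name looks_like_geojson is hypothetical and is not part of any DataONE tooling.

```python
import json

# The nine GeoJSON object types shared by the 2008 spec and RFC 7946.
GEOJSON_TYPES = {
    "Point", "MultiPoint", "LineString", "MultiLineString",
    "Polygon", "MultiPolygon", "GeometryCollection",
    "Feature", "FeatureCollection",
}


def looks_like_geojson(text):
    """Return True if text parses as JSON and its top-level "type" is a GeoJSON type.

    Hypothetical helper for illustration; it does not validate geometry
    or coordinates, so it is a sniff test rather than a validator.
    """
    try:
        doc = json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    if not isinstance(doc, dict):
        return False
    type_value = doc.get("type")
    return isinstance(type_value, str) and type_value in GEOJSON_TYPES
```

A check like this says nothing about which of the two specs a file follows; it only separates GeoJSON from other JSON.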

mbjones commented 3 years ago

Can anyone with more knowledge than me of GeoJSON evaluate this format proposal? It seems reasonable and has been here quite some time.

amoeba commented 3 years ago

That I didn't know there were two specs probably disqualifies me from an opinion, but I'd put in a vote for having just a single GeoJSON format type. Parsing and serialization of GeoJSON does differ between the two specs but (1) it looks like the differences are minor and (2) RFC7946 has failed to become the default serialization in popular GIS tooling (ArcGIS, QGIS). I think a datateam member or scientist would have a hard time picking between the two.

@twhiteaker do folks in your community tend to make a distinction between the two formats?

twhiteaker commented 3 years ago

@twhiteaker do folks in your community tend to make a distinction between the two formats?

No, and before I created this issue and did the background research, I didn't know there were two specs either.

twhiteaker commented 3 years ago

In the appendix of the newer spec, they list changes between the old and new specs. It looks to me like files made under the newer spec would be valid under the old one. In my opinion, the difference that would most likely come up in an actual dataset is that the old spec allows other coordinate systems besides WGS 84, which would be indicated in the crs member of that dataset. For a machine parsing a GeoJSON file, either you can handle crs or you can't, so knowing which GeoJSON spec is used doesn't matter. Where I could see it mattering is if the actor was searching for GeoJSON files and only wanted one spec or the other, most likely excluding the older spec if the parser can't handle other coordinate systems. But, just because you have the crs member doesn't necessarily mean you're using something besides WGS 84, so I think you'd want to download and try to parse such files anyway.

That's my assessment of the impact of the difference in specs. From a practical standpoint, I suspect having a single GeoJSON format type is fine.
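The crs reasoning above can be sketched in code. This is only an illustration under stated assumptions: the function name crs_status and its return labels are hypothetical, the input is assumed to be a JSON object, and the set of names treated as WGS 84 is an assumed, non-exhaustive list. It relies on the 2008 spec's named-CRS form ({"type": "name", "properties": {"name": ...}}) and on its rule that a null crs means no coordinate system can be assumed.

```python
import json

# Names treated as WGS 84 longitude/latitude (assumed, not exhaustive).
WGS84_NAMES = {
    "urn:ogc:def:crs:OGC:1.3:CRS84",
    "urn:ogc:def:crs:OGC::CRS84",
}


def crs_status(geojson_text):
    """Classify the top-level "crs" member of a GeoJSON document.

    Hypothetical helper; returns one of:
      "default"     - no crs member, so WGS 84 is implied
      "unspecified" - crs is null (2008 spec: no CRS can be assumed)
      "wgs84"       - a named CRS that appears in WGS84_NAMES
      "other"       - anything else, including linked CRS objects
    """
    doc = json.loads(geojson_text)
    if "crs" not in doc:
        return "default"
    crs = doc["crs"]
    if crs is None:
        return "unspecified"
    if isinstance(crs, dict) and crs.get("type") == "name":
        name = (crs.get("properties") or {}).get("name")
        if name in WGS84_NAMES:
            return "wgs84"
    return "other"
```

A file that comes back as "default" or "wgs84" should be usable by a parser that only understands WGS 84, whichever spec it was written against, which is why the presence of crs alone is not a reason to skip a file.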

amoeba commented 3 years ago

Thanks for doing that legwork, @twhiteaker.

mbjones commented 3 years ago

Thanks, @twhiteaker. So, given your summary, if we were to register the formatId for RFC7946 as application/geo+json (which is simpler), and not register the 2008 version at all, few people or tools would likely care, right? That simplifies things for people, and wouldn't prevent us from registering the 2008 version at a later date if needed. So, I guess I am proposing we only register one for now:

Please weigh in if you or anyone thinks we should register both versions instead, or give your approval or other feedback here. Thanks!

twhiteaker commented 3 years ago

The part I worry about is the version RFC 7946 bit in the formatName. I don't think most scientists will know what that is, and they may incorrectly tag their products as this when their files actually follow the 2008 spec. (Maybe I should be asking how datasets get tagged. If a machine is figuring it out, maybe it will get it right?)

I wonder if it would be better to leave the version out, and put the burden on the end user who downloads a file to deal with whatever format it's in.

mbjones commented 3 years ago

Ah, good point. I meant to remove that. I removed it from the formatId, but missed it in the name. So, a revision:

twhiteaker commented 3 years ago

I'm ok with that.

mbjones commented 3 years ago

Added PR #33 to implement this change.

mbjones commented 3 years ago

Closed, awaiting release in PR #35.