Closed mbjones closed 3 years ago
@jeanetteclark @laijasmine @datadavev @srearl @twhiteaker @taojing2002 @csjx and others.... please comment on how this proposed format identifier for zipped shapefiles looks to you. There is an associated PR #4 with the exact XML, which boils down to adding:
<objectFormat>
<formatId>application/x-shapefile-zipped</formatId>
<formatName>ESRI Shapefile (zipped)</formatName>
<formatType>DATA</formatType>
<mediaType name="application/zip"/>
<extension>zip</extension>
</objectFormat>
Format suggestion looks OK. Agree on the lack of mechanism for media type to indicate file types contained within a zip. https://tools.ietf.org/html/rfc6839#section-3.6 provides a suggestion, but not for multiple file types within a zip.
Refs:
I agree with the format name indicating shapefile while the mediaType is a generic zip.
ESRI goes by Esri these days, so would it make sense to use that casing?
Before this PR, the main reference I had for this type was the EU ref that @datadavev cited, in which they call it x-shapefile. The shapefile addition in the current PR uses x-shapefile-zipped, and is the only example in the XML of "-zipped" appearing in the formatId. This is fine with me, if "-zipped" will be used by convention for future additions when a format is composed of more than one file type collected in a zip archive. If that's the case, perhaps that convention should be documented somewhere in this repo so that future contributors are aware of it.
Should the format indicate that only one shapefile should be in the zip file?
I like the format. Agree also with @twhiteaker's suggestion that, if possible, the format should specify that the zip contains a single shapefile (or would that be a best practice?). This may be rare to the point of being a non-issue but thinking of possible scenarios, I guess a separate format would be needed for shapefiles coalesced into a different format (e.g., gz)?
We could use application/x-shapefile to match the format described at https://inspire.ec.europa.eu/media-types/application/x-shapefile, which seems to be congruent. I had added -zipped to indicate that it is not the raw shapefile per se, but am happy to drop that part of the name if people like application/x-shapefile better. Their statement that it is superseded by application/vnd.shp is not correct, because that media type refers to only the .shp file, and not the others like .shx, and not in a zipped container.
I like the suggestion in the first link from @datadavev to use media type suffixes for compound types. If we did that, we could make the media type be application/x-shapefile+zip, or it could even be application/vnd.shp+zip, which would be accurate (although it ignores the presence of other files in there like .shx files). IANA specifically recommends NOT to use the x- experimental types anywhere, and so using application/x-shapefile as the formatId and application/vnd.shp+zip as the media type could be a good compromise.
And yes, let's change the capitalization of Esri.
Note that the inspire registry indicates x-shapefile is superseded by vnd.shp, so application/vnd.shp+zip is perhaps appropriate to indicate a zip file contains components of a shape file.
I'd be comfortable with either approach, as long as the approach utilizes a pattern that we can reuse for similar cases. For example, once we have shapefile sorted out, I'll have a hankering for adding geodatabase (a zipped folder of GIS files comprising a file based database) to the list.
OK, sounds good. I updated the metadata in the issue description above to reflect the use of application/vnd.shp+zip for both the formatId field and the mediaType field. I also updated PR #4 to reflect this change, and merged it into the develop branch. So, last call for any comments or changes -- feel free to speak up if something doesn't seem quite right to you (we live with these decisions for quite some time....). Thanks.
Looks good to me.
For the record, the final decision on the format is:
<objectFormat>
<formatId>application/vnd.shp+zip</formatId>
<formatName>Esri Shapefile (zipped)</formatName>
<formatType>DATA</formatType>
<mediaType name="application/vnd.shp+zip"/>
<extension>zip</extension>
</objectFormat>
This will go into the next merge of the formats vocabulary.
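As a rough illustration of how the final entry could be read programmatically, the sketch below parses the fragment exactly as quoted above using Python's standard library. The constant OBJECT_FORMAT_XML and the function parse_object_format are hypothetical names; the real formats vocabulary file may wrap entries in a namespaced root element that this sketch does not handle.

```python
import xml.etree.ElementTree as ET

# The fragment as quoted in this thread (assumed to have no XML namespace).
OBJECT_FORMAT_XML = """
<objectFormat>
  <formatId>application/vnd.shp+zip</formatId>
  <formatName>Esri Shapefile (zipped)</formatName>
  <formatType>DATA</formatType>
  <mediaType name="application/vnd.shp+zip"/>
  <extension>zip</extension>
</objectFormat>
"""


def parse_object_format(xml_text: str) -> dict:
    """Return the fields of a single objectFormat element as a dict."""
    node = ET.fromstring(xml_text.strip())
    media = node.find("mediaType")
    return {
        "formatId": node.findtext("formatId"),
        "formatName": node.findtext("formatName"),
        "formatType": node.findtext("formatType"),
        # mediaType carries its value in the 'name' attribute, not as text.
        "mediaType": media.get("name") if media is not None else None,
        "extension": node.findtext("extension"),
    }


if __name__ == "__main__":
    for key, value in parse_object_format(OBJECT_FORMAT_XML).items():
        print(f"{key}: {value}")
```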
Format Metadata
Provide the standard metadata for the proposed format, ensuring that the id and name are unique and appropriate to the version of the format being proposed.
formatId: application/vnd.shp+zip
formatName: Esri Shapefile (zipped)
formatType: DATA
mediaType: application/vnd.shp+zip
extension: .zip
Format description
Describe why a new format is needed, including items such as where the format type has been encountered, what software produces it, and what software can read it.
This is for a zipped shapefile directory following the specification for the ESRI Shapefile (http://en.wikipedia.org/wiki/Shapefile) format, which is a common format used for representing vector geospatial data and is defined in https://www.esri.com/library/whitepapers/pdfs/shapefile.pdf. Shapefiles are unusual because the format specification requires the use of three mandatory files (.shp, .shx, and .dbf) as well as several other optional files, all of which share the same basename and must be in the same parent directory, and which collectively constitute the "shapefile" dataset. So, the individual file that has a .shp extension is incomplete without the collection of other files in a directory that together make up a shapefile dataset. Typically, this directory is zipped up for exchange (so the zipped directory often has the .zip extension). In DataONE, many of these zipped up shapefiles are present and typed as zip files, and so are unrecognizable as the more specialized shapefile variant.
In this proposal, I suggest that we create a format for zipped shapefiles that allows this specialized variant of zip files to be recognized and registered as such. This identifier would only be used for objects that represent a zipped directory containing the files that constitute a dataset in ESRI Shapefile format, and would not be used for the individual file components of such a dataset (which each would have different types, and could be the subject of another proposal). The individual subcomponents of a Shapefile have the following assigned Media types:
application/vnd.shp: https://www.iana.org/assignments/media-types/application/vnd.shp
application/vnd.shx: https://www.iana.org/assignments/media-types/application/vnd.shx
application/vnd.dbf: https://www.iana.org/assignments/media-types/application/vnd.dbf
The Media type of a zipped shapefile is unclear from the specification. My conclusion is that it is best to give it the media type application/zip, and rely on the more specific formatId to differentiate these from other arbitrary zip files.
This format was first requested in Redmine Issue 6883 in 2015, and has been needed for a while.
Specification / Namespace documentation
Provide the location(s) of the documentation of the format specification or the namespace for the format or vocabulary.
Checklist
image/png is specific to one format, whereas text/xml is not specific to one format)
DATA, METADATA, or RESOURCE
Considerations