frictionlessdata / datapackage

Data Package is a standard consisting of a set of simple yet extensible specifications to describe datasets, data files and tabular data. It is a data definition language (DDL) and data API that facilitates findability, accessibility, interoperability, and reusability (FAIR) of data.
https://datapackage.org
The Unlicense

Spatial Data Package specification proposal #499

Closed henrykironde closed 7 years ago

henrykironde commented 7 years ago

The Spatial Data Package specification:

This proposal provides specifications for the Spatial Data Package. The proposed specifications extend the Data Package specification created by Frictionless Data. The Data Package specification currently covers tabular data (Tabular Data Package). The Tabular Data Package provides a platform to standardize and organize data, making sharing among tools and people much easier.

Relationship between a Tabular Data Package and a Spatial Data Package

Unlike spatial data, tabular data is simply text separated by delimiters (comma, tab, etc.) in a text file. Spatial data comes in a variety of complex data structures, often identified by the file extension.

Spatial Data Categories

Spatial data falls into two categories: raster data and vector data. In the vector data model, geographical elements are represented using points, lines, and polygons. Vector data captures and represents discrete objects with boundaries (lakes, rivers, roads, etc.).
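
As a rough illustration of the vector model, the three geometry kinds can be written as plain Python dictionaries that follow the GeoJSON geometry layout (`type` plus `coordinates`). The coordinates and the helper `vertex_count` are invented for this sketch and are not part of the proposal.

```python
# Hypothetical geometries in GeoJSON-style layout; all coordinates are invented.
point = {"type": "Point", "coordinates": [-111.8, 41.7]}
line = {
    "type": "LineString",
    "coordinates": [[-111.8, 41.7], [-111.9, 41.8], [-112.0, 41.8]],
}
polygon = {
    "type": "Polygon",
    # One exterior ring; the first and last positions are identical.
    "coordinates": [[[0.0, 0.0], [4.0, 0.0], [4.0, 3.0], [0.0, 3.0], [0.0, 0.0]]],
}


def vertex_count(geometry):
    """Count the coordinate positions stored in a geometry."""
    coords = geometry["coordinates"]
    if geometry["type"] == "Point":
        return 1
    if geometry["type"] == "LineString":
        return len(coords)
    if geometry["type"] == "Polygon":
        return sum(len(ring) for ring in coords)
    raise ValueError(f"unsupported geometry type: {geometry['type']}")
```

A river would typically map to the `LineString` case and a lake to the `Polygon` case.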

The raster data model stores data elements using pixels or cells. The value of each cell captures the type of object or entity observed. A good example is a digital photograph: each pixel in the photo stores a color that corresponds to the real-world object at that point. Rasters can store discrete data, for example thematic land cover information, and continuous data, for example chemical concentrations (carbon dioxide, nitrates).
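
To make the discrete versus continuous distinction concrete, here is a minimal sketch that treats a raster as a grid of cells in plain Python. The class codes, the concentration values, and the `-9999` NoData marker are all assumptions made up for the example.

```python
NODATA = -9999  # assumed marker for cells with no observation

# Discrete raster: each cell holds a land cover class code (codes are invented).
land_cover = [
    [1, 1, 2],
    [1, 3, 2],
    [NODATA, 3, 3],
]

# Continuous raster: each cell holds a measured concentration (values are invented).
nitrate = [
    [0.5, 0.7, 1.1],
    [0.4, NODATA, 1.3],
]


def valid_cells(grid, nodata=NODATA):
    """Return all cell values except those equal to the NoData marker."""
    return [value for row in grid for value in row if value != nodata]


def class_counts(grid, nodata=NODATA):
    """Count how many cells fall in each class of a discrete raster."""
    counts = {}
    for value in valid_cells(grid, nodata):
        counts[value] = counts.get(value, 0) + 1
    return counts


def mean_value(grid, nodata=NODATA):
    """Average a continuous raster over the cells that hold data."""
    values = valid_cells(grid, nodata)
    return sum(values) / len(values)
```

Counting classes only makes sense for the discrete grid, while averaging only makes sense for the continuous one, which is one reason a descriptor may want to record which kind of raster a band holds.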

Vector Data Specifications

The specifications inherit the Data Package specifications, such as:

Recommended Properties

Optional Properties

{
 #required
  "name": "name of the data",
  "title": "human readable label or title for the dataset",
  "gis_class": "Raster data or vector data",
  "file_type": "extension of format of the dataset",
  "description": "A good description for the dataset",
  "license": "A license",
  "keywords": ["rivers", "North America"],  # keywords separated by comma
  "citation": "citation for the dataset",
  "spatial_ref": "Coordinate Reference System",
  "[path or url]": "path to the file",
  "resources": [
      #For each layer, give a name and the properties 
      #layer one
      { 
        "name": "Name for the layer eg.river",
        "Geometry_type": "point, linestring,....", "geometry_notation": 
        "NoDataValue": "what represents missing values",
        # define attribute data and type for each vector feature
        "schema": { 
          "fields": [
            {
              "name": "data name",
              "type": "data type"
            },
            {
              "name": "data name",
              "type": "data type"
            },
            {...}
          ]
        }
      },
      #layer two
      {....},
      #layer three
      {..}
  ]
}
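
The template above is not valid JSON as written, because of the `#` comments and the elided layers. A minimal sketch of how a filled-in vector descriptor could be built and serialized from Python follows. The keys come from the proposal, but every value is a made-up placeholder, the `path` key stands in for `[path or url]`, `geometry_notation` is left out because the proposal gives no value for it, and the key set checked by `missing_keys` is an assumption rather than an agreed list of required properties.

```python
import json

# Keys follow the proposal above; every value is a made-up placeholder.
descriptor = {
    "name": "example-rivers",
    "title": "Example river network",
    "gis_class": "vector",
    "file_type": "shp",
    "description": "Hypothetical river centerlines used only for illustration",
    "license": "CC0-1.0",
    "keywords": ["rivers", "North America"],
    "citation": "Example citation",
    "spatial_ref": "EPSG:4326",
    "path": "data/rivers.shp",  # stands in for "[path or url]"
    "resources": [
        {
            "name": "river",
            "Geometry_type": "linestring",
            "NoDataValue": "-9999",
            "schema": {
                "fields": [
                    {"name": "river_name", "type": "string"},
                    {"name": "length_km", "type": "number"},
                ]
            },
        }
    ],
}


def missing_keys(desc, expected=("name", "title", "gis_class", "resources")):
    """List expected top-level keys that are absent (the key set is assumed)."""
    return [key for key in expected if key not in desc]


text = json.dumps(descriptor, indent=2)
assert json.loads(text) == descriptor  # the descriptor survives a JSON round trip
print(missing_keys(descriptor))
```

Writing `text` to a `datapackage.json` file next to the data would give tools a single place to read the layer names and attribute types.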

Rasters

Like the vector data specifications, the raster data specifications inherit the core components of the Data Package specifications. Rasters can have multiple nested datasets within a file; however, the JSON schema has a structure similar to the vector data schema.

The data package

JSON schema example

{
    #required
    "name": "name of the data",
    "title": "human readable label or title for the dataset",
    "format": "extension of format of the dataset or  driver required",
    "file_size": "size of file on disk",
    "group_count": "Number of groups in the dataset if applicable",
    "dataset_count": "The number of individual datasets",
    "description": "A good description for the dataset",
    "license": "A license",
    "keywords": ["carbon map", "North America"],  # keywords separated by comma
    "citation": "citation for the dataset",
    "version": "The version of the dataset",
    "homepage": "The home page of the data",
    "datum": "Coordinate Reference System",
  "[url or path]": "link to where the data is stored",
  #each band is defined
  "resources": [
    {
      "Group": "Name for the group if applicable",
      "name": "Name for the band",
      "relative_path": "Location relative to root path/url above",
      "resolution": "The resolution",
      "resolution_units": "The units of resolution",
      "dimensions": "dimensions",
      "NoDataValue": "pixels where data is missing or no data collected",
      "geoTransform": "The transformation of the dataset",
      "parameter": "The parameter or feature",
      "extent": ["the extent values of the band"]
    },
    { ...}
  ]
}
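
Several of the band properties above are related: given a geotransform and the band dimensions, the extent can be derived. The sketch below assumes that `geoTransform` uses the six-element GDAL ordering (origin x, pixel width, row rotation, origin y, column rotation, pixel height) and that `dimensions` is `[columns, rows]`. The proposal specifies neither, and the band values are invented.

```python
def extent_from_geotransform(geotransform, columns, rows):
    """Return [xmin, ymin, xmax, ymax] for a raster without rotation terms."""
    origin_x, pixel_width, row_rotation, origin_y, column_rotation, pixel_height = geotransform
    if row_rotation != 0 or column_rotation != 0:
        raise ValueError("rotated rasters are not handled in this sketch")
    # Opposite corner of the grid, measured from the origin corner.
    x_edge = origin_x + columns * pixel_width
    y_edge = origin_y + rows * pixel_height
    return [min(origin_x, x_edge), min(origin_y, y_edge),
            max(origin_x, x_edge), max(origin_y, y_edge)]


# Invented band: 4 columns x 3 rows of 30-unit cells, top-left corner at (500000, 4600000).
band = {
    "name": "band_1",
    "resolution": 30,
    "resolution_units": "metre",
    "dimensions": [4, 3],  # assumed order: [columns, rows]
    "NoDataValue": -9999,
    "geoTransform": [500000.0, 30.0, 0.0, 4600000.0, 0.0, -30.0],
}
band["extent"] = extent_from_geotransform(band["geoTransform"], *band["dimensions"])
```

If the extent can always be derived this way, the specification could treat `extent` as optional and keep `geoTransform` and `dimensions` as the source of truth, though that is a design choice for the discussion rather than something the proposal states.
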
Stephen-Gates commented 7 years ago

Thanks for this Henry.

I think a worked example using real data would help to clearly separate what's needed in a :

Thanks for starting the conversation.

rufuspollock commented 7 years ago

@henrykironde this is great - and we have an existing issue here for "Geo Data Packages" #86 - I think these two are similar so we can connect them.

henrykironde commented 7 years ago

Closing because of replication.