Open rgaiacs opened 2 years ago
@rgaiacs - am I correct to think that the Frictionless Data Package would only contain information about the tabular data files?
Some points we may need to consider prior to adding this button include:
am I correct to think that the Frictionless Data Package would only contain information about the tabular data files?
You are incorrect. From https://specs.frictionlessdata.io/:
A [Frictionless] Data Package is a simple container format used to describe and package a collection of data (a dataset).
A Data Package can contain any kind of data. At the same time, Data Packages can be specialized and enriched for specific types of data so there are, for example, Tabular Data Packages for tabular data, Geo Data Packages for geo data etc.
The Frictionless Data Package team has focused most of its work on Frictionless Tabular Data Packages, which are limited to tabular data.
what to do with non-tabular data files
They are listed. You don't need to do anything. It is up to the user.
potential to include a tabular data file of the sample information (taken from the database)
This is possible.
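To make the two answers above concrete, here is a minimal sketch of how tabular and non-tabular files could be listed side by side as resources. The file records, the `TABULAR_FORMATS` set and the `to_resource` helper are all hypothetical names made up for illustration; only the resource keys (`name`, `path`, `format`, `mediatype`, `schema`) come from the example descriptor in this thread.

```python
import json

# Hypothetical file records, shaped like rows exported from a files table.
files = [
    {"name": "samples.csv", "format": "csv", "mediatype": "text/csv"},
    {"name": "assembly.fasta", "format": "fasta", "mediatype": "text/plain"},
]

# Assumption: which formats we treat as tabular is our own choice.
TABULAR_FORMATS = {"csv", "tsv"}


def to_resource(record):
    """Build one resource entry; only tabular files get a schema placeholder."""
    resource = {
        "name": record["name"],
        "path": record["name"],
        "format": record["format"],
        "mediatype": record["mediatype"],
    }
    if record["format"] in TABULAR_FORMATS:
        # Column definitions would go here; left empty in this sketch.
        resource["schema"] = {"fields": []}
    return resource


resources = [to_resource(record) for record in files]
print(json.dumps(resources, indent=2))
```

The non-tabular file is simply listed without a schema, and a sample-information table exported from the database would be one more CSV entry in the same list.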
Actually, this looks like a machine-readable and well-formatted version of our "readme" files, i.e. it has the dataset metadata (resource details) held in the "data package" header region of the JSON file, followed by each file with all its metadata in the "data resource" section, one block per file (see below). The only thing that I don't see a natural place for is the links to the externally hosted associated data, e.g. BioProject accessions, manuscript links, ProteomeXchange, EGA, etc., but presumably we can add our own attributes to accommodate those things.
{
"name": "our DOI number",
"datapackage_version": "1.0-beta",
"title": "gigadb dataset title",
"description": "...",
"version": "1.0",
"keywords": ["name", "My new keyword"],
"licenses": [{
"url": "http://opendatacommons.org/licenses/pddl/",
"name": "Open Data Commons Public Domain",
"version": "1.0",
"id": "odc-pddl"
}],
// I'm not sure what this section holds in relation to GigaDB?
"sources": [{
"name": "World Bank and OECD",
"web": "http://data.worldbank.org/indicator/NY.GDP.MKTP.CD"
}],
// This will hold the author list, it can include ORCIDs
"contributors":[{
"title": "Joe Bloggs",
"email": "joe@bloggs.com",
"web": "http://www.bloggs.com"
}],
"maintainers": [{
// like contributors
}],
// GigaScience Database can be put in here?
"publishers": [{
// like contributors
}],
// Could any links to other resources go in here? e.g. BioProjects, manuscripts, etc.
"dependencies": {
"data-package-name": ">=1.0"
},
// Each of the files gets listed individually here as a resource
"resources": [
{
"name": "file_name.csv",
"path": "http://ftp.cngb.cn/pub/..../file_name.csv",
"title": "", // we don't use title in GigaDB
"description": "add the file description here",
"format": "csv", // we call this file format
"mediatype": "text/csv", // we call this data type
"encoding": "utf-8",
"bytes": 1, // file size
"hash": "", // we use md5sum values
"schema": "", // this section can be used to define columns in tabular data
"sources": "",
"licenses": "" // this will be cc0 unless there is a specific attribute assigned to a file
},
{
// repeat as required
}
],
// extend your datapackage.json with attributes that are not
// part of the data package spec
// we add a views attribute to display Recline Dataset Graph Views
// in our Data Package Viewer
"views" : [
{
... see below ...
}
],
// you can add your own attributes to a datapackage.json, too
"my-own-attribute": "data-packages-are-awesome"
}
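For the `bytes` and `hash` fields in the resource block above, a small helper along these lines could fill them in from a local copy of a file. This is only a sketch: `file_stats` is a hypothetical name, and it assumes we keep using plain MD5 hex digests, as the comment on `hash` suggests.

```python
import hashlib
import os
import tempfile


def file_stats(path, chunk_size=1 << 20):
    """Return the size in bytes and the MD5 hex digest of a file."""
    md5 = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large data files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
    return {"bytes": os.path.getsize(path), "hash": md5.hexdigest()}


# Usage on a small throwaway file.
with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as tmp:
    tmp.write(b"id,name\n1,a\n")
stats = file_stats(tmp.name)
os.remove(tmp.name)
print(stats)
```

The returned dictionary can be merged straight into a resource entry, since its keys match the field names used in the example.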
The only thing that I don't see a natural place for is the links to the externally hosted associated data, e.g. BioProject accessions, manuscript links, ProteomeXchange, EGA, etc., but presumably we can add our own attributes to accommodate those things.
Yes, you can add extra attributes.
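As a rough illustration of what such an extra attribute might look like, the sketch below adds a custom `external_links` list to a descriptor and writes it out as JSON. The attribute name, its inner keys, the accession and the URL are all placeholders invented for this example; none of them is defined by the Data Package spec or taken from a real dataset.

```python
import json

descriptor = {
    "name": "example-dataset",  # placeholder; the thread suggests using the DOI
    "title": "gigadb dataset title",
    "resources": [],
    # Hypothetical custom attribute, not part of the Data Package spec.
    "external_links": [
        {"type": "BioProject", "accession": "PRJNA000000"},  # made-up accession
        {"type": "manuscript", "url": "https://example.org/paper"},  # placeholder URL
    ],
}

text = json.dumps(descriptor, indent=2)
print(text)

# Reading the JSON back gives the same structure, custom attribute included.
roundtrip = json.loads(text)
```

Tools that do not know about `external_links` would be expected to ignore it, so the rest of the descriptor stays usable.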
This is part of Epic #1118