Closed: lambdamusic closed this 6 years ago
@lambdamusic
Can I create a record for a dataset hosted elsewhere? If yes, how?
Yes, that is very easy: you just create a datapackage.json file that describes your data (this is a simple example):
{
  "name": "name-of-the-datapackage",
  "resources": [
    {
      "path": "https://yourdomain.com/yourdata.csv",
      "pathType": "remote",
      "name": "remote-data-about-something",
      "format": "csv",
      "schema": {
        "fields": [
          {
            "name": "number",
            "type": "integer",
            "format": "default"
          },
          {
            "name": "string",
            "type": "string",
            "format": "default"
          },
          {
            "name": "boolean",
            "type": "boolean",
            "format": "default"
          }
        ],
        "missingValues": [
          ""
        ]
      }
    }
  ]
}
Then run the command data push in the folder containing that file, and the dataset will be uploaded to datahub.io.
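If you have several remote files to describe, it can be easier to generate the descriptor than to write it by hand. Here is a minimal sketch in Python that builds the same structure as the example above and writes it to datapackage.json. The helper name build_remote_descriptor is made up for illustration and is not part of data-cli; the field names and types are simply copied from the example.

```python
import json


def build_remote_descriptor(package_name, resource_name, url, fields):
    """Return a datapackage.json-style dict for one remote CSV resource.

    `fields` is a list of (name, type) pairs; this helper is hypothetical
    and only mirrors the hand-written example in this thread.
    """
    return {
        "name": package_name,
        "resources": [
            {
                "path": url,
                "pathType": "remote",
                "name": resource_name,
                "format": "csv",
                "schema": {
                    "fields": [
                        {"name": name, "type": type_, "format": "default"}
                        for name, type_ in fields
                    ],
                    "missingValues": [""],
                },
            }
        ],
    }


descriptor = build_remote_descriptor(
    "name-of-the-datapackage",
    "remote-data-about-something",
    "https://yourdomain.com/yourdata.csv",  # placeholder URL from the example
    [("number", "integer"), ("string", "string"), ("boolean", "boolean")],
)

if __name__ == "__main__":
    # Write the descriptor next to where you will run `data push`.
    with open("datapackage.json", "w", encoding="utf-8") as fh:
        json.dump(descriptor, fh, indent=2)
```

The resulting file should match the hand-written example, so the push step stays the same.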
To avoid typing this file manually, you can use our data-cli tool: run data init in the folder where you store the data to infer the data structure into that file, and then make any needed fixes (in your case, replace the paths of the data files with their remote URLs).
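The "replace the path" step could be scripted roughly like this. It is only a sketch under assumptions: it assumes each resource in the generated descriptor has a "path" key holding a local relative path, and that the remote file keeps the same file name under some base URL. The function name point_resources_at_remote and the sample descriptor are hypothetical; the "pathType": "remote" value is taken from the example above.

```python
import copy
from pathlib import PurePosixPath


def point_resources_at_remote(descriptor, base_url):
    """Return a copy of `descriptor` whose resource paths point at `base_url`.

    Hypothetical helper: keeps each file name and swaps the local folder
    for the remote base URL.
    """
    updated = copy.deepcopy(descriptor)  # leave the caller's dict untouched
    for resource in updated.get("resources", []):
        filename = PurePosixPath(resource["path"]).name
        resource["path"] = base_url.rstrip("/") + "/" + filename
        resource["pathType"] = "remote"
    return updated


# Made-up stand-in for what an inferred descriptor might look like.
local_descriptor = {
    "name": "name-of-the-datapackage",
    "resources": [
        {"path": "data/yourdata.csv", "name": "yourdata", "format": "csv"}
    ],
}

remote_descriptor = point_resources_at_remote(
    local_descriptor, "https://yourdomain.com/"
)
print(remote_descriptor["resources"][0]["path"])
```

Check the rewritten paths by eye before pushing, since the real layout on your server may differ from this assumption.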
You can read more about the datapackage structure and the data-cli tool at http://datahub.io/docs
Or I can help you if you ping me (@acckiygerman) in this channel: https://gitter.im/datahubio/chat
What is the relationship between datahub and the LOD cloud (http://lod-cloud.net/)?
Personally, I don't know. They probably used our old datahub.io site as a source of their data. Maybe @zelima could answer?
@AcckiyGerman @lambdamusic I don't have any information about this.
Looking at the diagram and playing around with it a bit: in most cases, the URLs redirect to old.datahub.io. Also, it was last updated on 2017-08-22. At that point, the current datahub was not live yet. I assume these guys are using old.datahub.io as a source for their project. I don't think there's any further relation between them and datahub.io.
Anyway, I think the best place to get correct information about things like this is https://gitter.im/datahubio/chat
Awesome, guys. Thanks very much. I'll play with the CLI and see how far I can get.
@lambdamusic I just read the instructions for publishing data from @AcckiyGerman, and while they're complete and quite accurate, you could alternatively just run this and it will get published:
data push path/or/url/to/my/file[.ext]
datapackage.json and data init (which creates datapackage.json) are there for describing your data in the best way. E.g. you could include some key metadata like a description of your dataset, license, contributors, encoding, views, etc. You can read more about the Data Package specification here: https://frictionlessdata.io/specs/data-package/#specification
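To give an idea of what that extra metadata looks like, here is a small Python sketch that adds a description, a license and a contributor to an existing descriptor. The property names (description, licenses, contributors) come from the Data Package specification linked above; the helper name add_metadata and all of the values are placeholders made up for illustration.

```python
import json


def add_metadata(descriptor, description, license_name, contributor_title):
    """Return a shallow copy of `descriptor` with some descriptive metadata.

    Hypothetical helper; the values passed in below are placeholders.
    """
    enriched = dict(descriptor)
    enriched["description"] = description
    enriched["licenses"] = [{"name": license_name}]
    enriched["contributors"] = [{"title": contributor_title}]
    return enriched


base = {"name": "name-of-the-datapackage", "resources": []}
enriched = add_metadata(
    base,
    description="Short summary of what the dataset contains.",
    license_name="CC-BY-4.0",          # placeholder license identifier
    contributor_title="Jane Example",  # placeholder contributor
)
print(json.dumps(enriched, indent=2))
```

Other properties mentioned above, such as encoding and views, are left out here; check the specification and the datahub docs for their exact shape.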
Hi,
I'm part of the SciGraph project and would like to make our data (~200G) available via datahub.io.
I have two questions:
Thanks in advance!