datopian / metastore-lib

🗄️ Library for storing dataset metadata, with versioning support and pluggable backends including GitHub.
https://tech.datopian.com/versioning/
MIT License

More detail and explicitness in README #12

Open rufuspollock opened 4 years ago

rufuspollock commented 4 years ago

We could improve the README to make it more explicit (and so easier to actually get started with), and include some design info. We may want to split this into 2 issues.

Acceptance

Tasks

Analysis

Questions

Example material to include

# ~rufus can we have explicit "real" options
# what about lfs config to use? should that go here?
config = {}

# Directly instantiate the MetaStoreBackend class:
metastore = GitHubStorage(
    lfs_server_url="https://giftless.datahub.io/",
    default_branch_name="master",
    # passed directly to the PyGithub client - for details see https://pygithub.readthedocs.io/en/latest/github.html#main-class-github
    github_options={
        "login_or_token": "GITHUB_API_TOKEN",
    },
)

Example datapackage.json - in examples. This example has 2 data resources, one stored in lfs cloud storage, one that is "remote".

{
  "name": "my-data-package",
  "resources": [
    {   // resource with data in lfs cloud storage
        // how do we know?
      "path": "data/resource1.csv",
      "sha256": "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824",
      "lfs_prefix": "datopian/my-data-package",
      "bytes": 10240
    },
    {
      "path": "https://myremotesite.com/mydata.csv"
      // optionally more information
      ...
    }
  ]
}
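The "how do we know?" question above could be answered with a simple convention. The sketch below is only one possible rule, not the library's actual logic: it assumes a resource whose `path` is an http(s) URL is "remote", and a resource with a relative path plus `sha256` and `bytes` is stored in LFS. The function name `classify_resource` is hypothetical.

```python
from urllib.parse import urlparse


def classify_resource(resource: dict) -> str:
    """Return "remote", "lfs" or "unknown" for one datapackage resource.

    Assumed convention (not confirmed by metastore-lib):
    - an http(s) URL in "path" means the data lives elsewhere
    - a relative path with both "sha256" and "bytes" means LFS storage
    """
    path = resource.get("path", "")
    if urlparse(path).scheme in ("http", "https"):
        return "remote"
    if "sha256" in resource and "bytes" in resource:
        return "lfs"
    return "unknown"


resources = [
    {
        "path": "data/resource1.csv",
        "sha256": "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824",
        "lfs_prefix": "datopian/my-data-package",
        "bytes": 10240,
    },
    {"path": "https://myremotesite.com/mydata.csv"},
]

kinds = [classify_resource(r) for r in resources]
```

Whatever rule is chosen, it would be worth stating it in the docs so that users can tell which resources end up as LFS pointer files in the repo.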
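Load the metadata and create the package:

import json

with open("datapackage.json") as f:
    metadata = json.load(f)

# package_id, name and email are placeholders to be filled in by the caller
package_info = metastore.create(package_id, metadata, message="...", author={name: email})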

Now your git repo will look like this:

.lfsconfig
.gitattributes
README.md          ???
datapackage.json
data/resource1.csv

.lfsconfig

[remote "origin"]
  # as specified in the original config for this backend
  lfsurl = https://giftless.datahub.io/

.gitattributes

data/resource1.csv filter=lfs diff=lfs merge=lfs -text
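For illustration, a `.gitattributes` like the one above could be generated from the list of LFS-stored resource paths. This is a minimal sketch; the helper name `gitattributes_lines` is hypothetical, and it assumes paths contain no spaces or glob characters that would need escaping.

```python
def gitattributes_lines(paths):
    """Build one Git LFS tracking rule per resource path.

    Assumes plain relative paths without whitespace or glob characters.
    """
    return [f"{path} filter=lfs diff=lfs merge=lfs -text" for path in paths]


lines = gitattributes_lines(["data/resource1.csv"])
content = "\n".join(lines) + "\n"
```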

data/resource1.csv:

version https://git-lfs.github.com/spec/v1
oid sha256:2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
size 10240
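
The pointer file above follows the Git LFS pointer format (a `version` line, then `oid`, then `size`). A rough sketch of how the `sha256` and `bytes` values from datapackage.json map onto it, with a hypothetical helper name `lfs_pointer`:

```python
def lfs_pointer(sha256: str, size: int) -> str:
    """Render the text of a Git LFS pointer file for one resource."""
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{sha256}\n"
        f"size {size}\n"
    )


pointer = lfs_pointer(
    "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824",
    10240,
)
```

So the file committed at data/resource1.csv holds only this small text stub, while the actual bytes live on the LFS server.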

shevron commented 4 years ago

I'd like to minimize the amount of content in the README and put most of the documentation in the docs folder, which contains Sphinx-based documentation for the project, including a quick start guide. This allows for better organization of the documentation and gives us a "single source of truth". It is also auto-published to readthedocs.io now.