CenterForOpenScience / SHARE

SHARE is building a free, open, data set about research and scholarly activities across their life cycle.
http://share-research.readthedocs.io/en/latest/index.html
Apache License 2.0

[schema] Dataset schema definition #20

Closed fabianvf closed 10 years ago

erinspace commented 10 years ago

Here's the specification of what we have so far:

erinspace commented 10 years ago

Example - from PLoS

{
    "contributors": [
        {
            "email": "loudonj@ecu.edu", 
            "full_name": "James E. Loudon",
            "id" : {"ORCID": "add-orcid-here", "other-id": "add-other-id-here"}
        }
    ], 
    "id": {"url": "http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0100758", 
            "DOI": "10.1371/journal.pone.0100758"},
    "meta": {"OSF specific metadata"}, 
    "properties": {
        "PDF": "http://dx.plos.org/10.1371/journal.pone.0100758.pdf", 
        "figures": ["http://www.plosone.org/article/fetchObject.action?uri=info:doi/10.1371/journal.pone.0100758.g001&representation=PNG_M"]
    },
    "description": "This study seeks to understand how humans impact the dietary patterns of eight free-ranging vervet monkey (Chlorocebus pygerythrus) groups in South Africa using stable isotope analysis.", 
    "tags": ["Behavior"],
    "source": "PLoS", 
    "timestamp": "2014-07-11 10:31:33.168456", 
    "title": "PLOS ONE: Using Stable Carbon and Nitrogen Isotope Compositions"
}
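
A minimal sketch of how a scraper's output could be checked against the example above. It assumes every top-level key in the PLoS example is expected on each record; the thread does not say which fields are required, and the names `EXPECTED_FIELDS` and `missing_fields` are hypothetical.

```python
# Top-level keys taken from the PLoS example; treating all of them as
# expected is an assumption, not something the thread specifies.
EXPECTED_FIELDS = {
    "contributors", "id", "meta", "properties",
    "description", "tags", "source", "timestamp", "title",
}


def missing_fields(record):
    """Return the expected top-level keys absent from a record, sorted."""
    return sorted(EXPECTED_FIELDS - set(record))


if __name__ == "__main__":
    partial = {"title": "PLOS ONE: Using Stable Carbon", "source": "PLoS"}
    print(missing_fields(partial))
```

A check like this only looks at key presence; it says nothing about the shape of nested values such as `contributors` or `id`.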

efc commented 10 years ago

It would be useful to include a URI in this schema. The URI would usually be derived from the id, but it would be actionable as a link back to the resource being described.
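
One way the URI could be derived from the `id` object, as a rough sketch: prefer an explicit `url`, and otherwise build a resolver link from the `DOI`. The function name `resolve_uri` and the fallback order are assumptions for illustration, and `https://doi.org/` is used as the DOI resolver prefix.

```python
from urllib.parse import quote


def resolve_uri(doc_id):
    """Return an actionable link for an id object, or None if none can be built.

    Assumed precedence (not specified in the thread): "url" first, then "DOI".
    """
    if doc_id.get("url"):
        return doc_id["url"]
    if doc_id.get("DOI"):
        # Percent-encode unusual characters but keep the "/" inside the DOI.
        return "https://doi.org/" + quote(doc_id["DOI"], safe="/")
    return None


if __name__ == "__main__":
    print(resolve_uri({"DOI": "10.1371/journal.pone.0100758"}))
```

Records that carry neither key yield `None`, so a consumer would still need to decide how to handle a resource with no link.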

erinspace commented 10 years ago

How does everyone feel about starting with this general schema? We can change all current scrapers to output this normalized format for now, and add more detailed fields as needed.
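
To make "output this normalized format" concrete, here is a hedged sketch of a per-scraper normalizer. The raw keys (`authors`, `link`, `doi`, `abstract`, `keywords`) are invented for illustration, since each scraper's raw output differs; only the output keys come from the example schema.

```python
from datetime import datetime, timezone


def normalize(raw, source, now=None):
    """Map one scraper's raw record onto the shared schema.

    The raw keys read here are hypothetical; a real scraper would
    substitute whatever its source actually provides.
    """
    now = now or datetime.now(timezone.utc)
    return {
        "contributors": [
            {"full_name": name, "email": None, "id": {}}
            for name in raw.get("authors", [])
        ],
        "id": {"url": raw.get("link"), "DOI": raw.get("doi")},
        "meta": {},
        "properties": {},
        "description": raw.get("abstract", ""),
        "tags": list(raw.get("keywords", [])),
        "source": source,
        "timestamp": now.isoformat(sep=" "),
        "title": raw.get("title", ""),
    }


if __name__ == "__main__":
    raw = {"title": "Example", "authors": ["James E. Loudon"], "keywords": ["Behavior"]}
    print(normalize(raw, "PLoS"))
```

Keeping the mapping in one function per scraper means a later change to the schema only touches the dictionary returned here.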

efc commented 10 years ago

@erinspace, that seems reasonable. The schema does not have to be perfect right now; we can iterate as time passes.

I'd like to point to RIOXX as a possible guide. They are currently gathering comments on a new version of their schema, and I think it serves as an interesting model. See RIOXX v2.0 beta 1 and note, in particular, the "dc:identifier" element, which requires a URI. I think the clarity of this document is something for us to strive for, though we might make different choices than they do.

erinspace commented 10 years ago

Ok, going to close this issue for now, with the understanding that we can always come back and tweak things if need be. I've edited the original schema in my first comment to reflect several discussions we've had here and in other threads - about what to include for authors, IDs, and other metadata we'd request for each consumer.