ResearchObject / ro-crate

Research Object Crate
https://w3id.org/ro/crate/
Apache License 2.0

Use Case: I want to be able to encode a person or organisation's role. #79

Open marcolarosa opened 4 years ago

marcolarosa commented 4 years ago

As a type of user, I want some goal so that some reason. As the developer for the PARADISEC project (languages) I want to be able to encode the role a person had in relation to the data described in the crate. Specifically, I want to use my own controlled vocabulary of roles and my roles apply to the crate as a whole, not to pieces of it.

jmfernandez commented 4 years ago

Maybe it is a bit overkill using CRO ontology (https://github.com/data2health/contributor-role-ontology), but it should be considered.

stain commented 4 years ago

Contributor roles

Several efforts already exist to build taxonomies of common contributor roles in scholarly work. The one most commonly used by journals is perhaps the CASRAI CRediT roles used in JATS.

There are URIs for these in https://jats4r.org/credit-taxonomy, e.g. https://dictionary.casrai.org/Contributor_Roles/Project_administration so we should be able to add these to the RO-Crate context, however you may notice that unhelpfully these URIs don't resolve to a readable page at all, so following our own advice we would still need to add additional documentation links to the RO-Crate.

@jmfernandez links to the CRO ontology, which formalizes this and extends it with lots of useful research terms, e.g. http://purl.obolibrary.org/obo/CRO_0000068 is "conservator role; A person responsible for the preservation of artistic and cultural artifacts." - I think these could be good for the many uses of RO-Crates.

Unlike casrai.org, CRO has made its URIs resolve to a textual page, but it has also gone for OBO-style names like CRO_0000068 that have almost no meaning without resolving the URL, so within the JSON-LD they would be almost meaningless. For rendering and for human readers they would therefore also need some textual declaration within the RO-Crate.
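As a rough sketch of what such a textual declaration could look like, the snippet below expands a compact identifier and pairs it with a human-readable label. The `PREFIXES` map, the `expand` and `describe` helpers, and the choice of `DefinedTerm` as the entity type are assumptions made for illustration; they are not something RO-Crate prescribes.

```python
# Hypothetical prefix map; the namespaces are the ones mentioned in this thread.
PREFIXES = {
    "credit": "https://dictionary.casrai.org/Contributor_Roles/",
    "obo": "http://purl.obolibrary.org/obo/",
}


def expand(term: str, prefixes: dict = PREFIXES) -> str:
    """Expand a compact identifier such as 'obo:CRO_0000068' to a full URI.

    Absolute URIs and free-text values are returned unchanged.
    """
    prefix, sep, local = term.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return term


def describe(term: str, label: str, description: str) -> dict:
    """Build a contextual entity that gives an opaque role URI a readable label.

    Using DefinedTerm here is an assumption, not an agreed RO-Crate convention.
    """
    return {
        "@id": expand(term),
        "@type": "DefinedTerm",
        "name": label,
        "description": description,
    }


conservator = describe(
    "obo:CRO_0000068",
    "conservator role",
    "A person responsible for the preservation of artistic and cultural artifacts.",
)
```

A renderer could then show the `name` of this entity instead of the bare `CRO_0000068` identifier.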

stain commented 4 years ago

The SKOS method with ad-hoc properties in #71 could help for other ad-hoc roles not in CRO.

That would allow roles to be both defined and referenced within the RO-Crate - potentially later moved out if used many places.

That leaves how to relate the role to the person and the crate (or one of its resources), which I think https://schema.org/Role or https://schema.org/OrganizationRole can be used as an intermediary for all properties - see http://blog.schema.org/2014/06/introducing-role.html which describes this intermediate node concept well. Here https://schema.org/roleName can be both a URL to a concept or free-text - so we don't necessarily need to SKOS anything ad-hoc (unless the roles already come from a 3rd party taxonomy).

We could document this for https://schema.org/contributor which is probably where most of the arbitrary roles will work ("illustrator"), and secondly for organizational roles ("project manager") which has more to do with a person's affiliation.

Detailing contributor roles with schema.org

Multiple contributors can take on multiple roles in forming a creative work.

To specify a role, break up contributor with an intermediate Contextual Entity of type Role that again links on with contributor to the individual Person or Organization. Note that one individual may take part in multiple roles, but each role goes to just one person.

The role is specified using roleName. For academic work, RO-Crate recommends using the CASRAI Contributor Roles Taxonomy (CRediT) and/or the Contributor Role Ontology (CRO). Free-text roles can be used as fall-back when no specific term is available. Multiple roleName identifiers can be included for a particular Role entity, but should each describe (in a broad sense) the same kind of role.

{
    "@context": ["http://schema.org/", 
      {
        "credit": "https://dictionary.casrai.org/Contributor_Roles/",
        "obo": "http://purl.obolibrary.org/obo/"
      }
    ],
    "@graph": [
     {
      "@id": "patients_report.pdf",
      "@type": "CreativeWork",
      "name": "Report and diagrams of patient admissions",
      "author": {"@id": "https://orcid.org/0000-0002-1825-0097"},
      "contributor": [
          {"@id": "#af1bf5db-96f7-4143-b420-41b7ca1a4052"},
          {"@id": "#b3b04f6c-526d-41c3-a9e0-ded8bb1bbfc9"},
          {"@id": "#bf768c8f-acdc-448d-9a17-76eb19bc6caa"}
        ]
     },
     {
        "@id": "https://orcid.org/0000-0002-1825-0097",
        "@type": "Person",
        "name": "Josiah Carberry"
     },
     {
        "@id": "https://orcid.org/0000-0000-1234-5678",
        "@type": "Person",
        "name": "Alice W Land"
     },
     {
        "@id": "#af1bf5db-96f7-4143-b420-41b7ca1a4052",
         "@type": "Role", 
         "contributor": {"@id": "https://orcid.org/0000-0002-1825-0097"},
         "roleName": [
             "original draft preparation", 
             {"@id": "credit:Writing_original_draft"},
             {"@id": "obo:CRO_0000088"}
        ]
     },
     {
        "@id": "#b3b04f6c-526d-41c3-a9e0-ded8bb1bbfc9",
         "@type": "Role", 
         "contributor": {"@id": "https://orcid.org/0000-0000-1234-5678"},
         "roleName": [
             "making figures", 
             {"@id": "obo:CRO_0000003"},
             {"@id": "credit:Visualization"}
        ]
     },  
     {
        "@id": "#bf768c8f-acdc-448d-9a17-76eb19bc6caa",
         "@type": "Role", 
         "contributor": {"@id": "https://orcid.org/0000-0000-1234-5678"},
         "roleName": [
             "data collection", 
             {"@id": "obo:CRO_0000036"},
             {"@id": "credit:Investigation"}
        ]
     }
   ]
}

In the example above, we see that Josiah (ORCID 0000-0002-1825-0097) has a role, writing the original draft (he is also listed directly as an author).

There are two more contributor roles, both held by Alice (ORCID 0000-0000-1234-5678): making figures (credit:Visualization) and data collection (credit:Investigation).

TODO: Do we really want to support both? CRO might be better, as it can still be mapped to the shorter, more readable CRediT terms.

Ad-hoc roles can be provided textually for more specific roles, which may not be considered academic but have nevertheless contributed:

     {
        "@id": "#f1c16a15-4d9c-4546-b1f1-483e4f899bfc",
         "@type": "Role", 
         "contributor": {"@id": "https://orcid.org/0000-0000-1234-5678"},
         "roleName": "quadcopter drone pilot"
     },  

If the contributor role of a person is unknown, then the contributor property from a CreativeWork should link directly to the Person instead of an intermediary Role.
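To illustrate how a consumer might read this structure, here is a minimal sketch that walks the `contributor` links of one work in a flattened `@graph` and handles both cases: an intermediate `Role` and a direct link to a `Person`. The function and helper names, and the shortened `report.pdf` and `#role1` identifiers, are made up for this example.

```python
from typing import Any


def as_list(value: Any) -> list:
    """Normalise a JSON-LD property value to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def ref_id(value: Any) -> str:
    """Return the @id of a reference, accepting {"@id": ...} or a bare string."""
    return value["@id"] if isinstance(value, dict) else value


def contributors_with_roles(graph: list, work_id: str) -> list:
    """Return (person @id, roleName values) pairs for one work."""
    by_id = {entity["@id"]: entity for entity in graph}
    result = []
    for ref in as_list(by_id[work_id].get("contributor")):
        entity = by_id.get(ref_id(ref), {})
        if "Role" in as_list(entity.get("@type")):
            # Intermediate Role: follow its own contributor link to the person.
            for person in as_list(entity.get("contributor")):
                result.append((ref_id(person), as_list(entity.get("roleName"))))
        else:
            # Role unknown: the work links directly to the Person.
            result.append((ref_id(ref), []))
    return result


# Abridged, hypothetical graph in the style of the example above.
graph = [
    {"@id": "report.pdf", "@type": "CreativeWork",
     "contributor": [{"@id": "#role1"},
                     {"@id": "https://orcid.org/0000-0002-1825-0097"}]},
    {"@id": "#role1", "@type": "Role",
     "contributor": {"@id": "https://orcid.org/0000-0000-1234-5678"},
     "roleName": ["making figures", {"@id": "credit:Visualization"}]},
]
pairs = contributors_with_roles(graph, "report.pdf")
```

Here `pairs` holds one entry for Alice with her two `roleName` values and one entry for Josiah with an empty role list, since he is linked directly.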

stain commented 4 years ago

I've raised https://gitlab.com/JATS4R/credit-taxonomy/-/issues/8 with the CRediT people. I think their URLs used to work two years ago.

stain commented 4 years ago

Organizational roles

It may be important to highlight the roles of individuals within organizations they are affiliated with. Consider for instance a report published by a Director compared to another from a summer Intern. Declaring membership in other organizations can also be important for being open about potential Conflict of Interest situations.

For this, RO-Crate recommends using an intermediary OrganizationRole contextual entity at the memberOf from a person. It is RECOMMENDED to represent the direct affiliation to the main organization/employer in parallel:

{
    "@context": "http://schema.org/",
    "@graph": [
     {
        "@id": "https://orcid.org/0000-0002-1825-0097",
        "@type": "Person",
        "name": "Josiah Carberry",
        "affiliation": {"@id": "#brownUniversity"},
        "memberOf": [
            {"@id": "#6adc2ffa-3260-4642-9408-609100a1b7c6"},
            {"@id": "#c4676ff7-dd65-41c4-a4f9-43784e69933c"},
            {"@id": "#0c14fb64-197b-4c46-ab6e-86cd3d86f01e"}
        ]
     },
     {
        "@id": "#brownUniversity",
        "@type": "Organization",
        "name": "Brown University"
     },
     {
        "@id": "#bigPharma",
        "@type": "Organization",
        "name": "Big Pharma Ltd."
     },
     {
        "@id": "#6adc2ffa-3260-4642-9408-609100a1b7c6",
        "@type": "OrganizationRole", 
        "memberOf": {"@id": "#brownUniversity"},
        "roleName": "Professor",
        "startDate": "1929",
        "url": "https://library.brown.edu/info/hay/carberry/"
     },
     {
        "@id": "#c4676ff7-dd65-41c4-a4f9-43784e69933c",
        "@type": "OrganizationRole", 
        "memberOf": {"@id": "#brownUniversity"},
        "roleName": "President of Josiah S Carberry Fund",
        "startDate": "1955-05-13"
     },
     {
        "@id": "#0c14fb64-197b-4c46-ab6e-86cd3d86f01e",
        "@type": "OrganizationRole", 
        "memberOf": {"@id": "#bigPharma"},
        "roleName": "Board Member"
     }
   ]
}

In this example we see that Josiah has two roles for his main affiliation at Brown University. In addition, Josiah has declared being a Board Member of a commercial organization.

On the Role contextual entity the memberOf link goes on to the actual organization the person is a member of. startDate and endDate may be added to specify historical roles and positions that are relevant to declare, and url can provide a link documenting that particular engagement.
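As an illustration of how a tool might summarise these organizational roles, the sketch below follows `memberOf` from a person through each `OrganizationRole` to the organization. The function name, the shape of the returned summary, and the shortened `#role-prof` and `#role-board` identifiers are assumptions for this example only.

```python
from typing import Any


def as_list(value: Any) -> list:
    """Normalise a JSON-LD property value to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def ref_id(value: Any) -> str:
    """Return the @id of a reference, accepting {"@id": ...} or a bare string."""
    return value["@id"] if isinstance(value, dict) else value


def organization_roles(graph: list, person_id: str) -> list:
    """Summarise the OrganizationRole entities reachable from a person's memberOf."""
    by_id = {entity["@id"]: entity for entity in graph}
    summary = []
    for ref in as_list(by_id[person_id].get("memberOf")):
        role = by_id.get(ref_id(ref), {})
        if "OrganizationRole" not in as_list(role.get("@type")):
            continue  # direct membership without a role is ignored in this sketch
        for org_ref in as_list(role.get("memberOf")):
            org = by_id.get(ref_id(org_ref), {})
            summary.append({
                "organization": org.get("name", ref_id(org_ref)),
                "roleName": role.get("roleName"),
                "startDate": role.get("startDate"),  # None when not declared
                "endDate": role.get("endDate"),
            })
    return summary


# Abridged version of the example above, with shortened hypothetical role ids.
graph = [
    {"@id": "https://orcid.org/0000-0002-1825-0097", "@type": "Person",
     "name": "Josiah Carberry",
     "memberOf": [{"@id": "#role-prof"}, {"@id": "#role-board"}]},
    {"@id": "#brownUniversity", "@type": "Organization", "name": "Brown University"},
    {"@id": "#bigPharma", "@type": "Organization", "name": "Big Pharma Ltd."},
    {"@id": "#role-prof", "@type": "OrganizationRole",
     "memberOf": {"@id": "#brownUniversity"}, "roleName": "Professor",
     "startDate": "1929"},
    {"@id": "#role-board", "@type": "OrganizationRole",
     "memberOf": {"@id": "#bigPharma"}, "roleName": "Board Member"},
]
roles = organization_roles(graph, "https://orcid.org/0000-0002-1825-0097")
```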

stain commented 4 years ago

I think we should restrict where you may expect a Role - by Schema.org they can appear almost anywhere, which is not so helpful for developers.

ljgarcia commented 4 years ago

Be aware that schema.org also allows using the Role type as a statement on a property, see http://blog.schema.org/2014/06/introducing-role.html. I would avoid that usage (see Stian's comment about helpfulness for developers) but thought it was better to be aware of it.

marcolarosa commented 4 years ago

A few comments as the developer of Describo (these comments are biased from working to implement this linked data structure into an easy to use GUI tool):

I also agree with Stian's comment about limiting where a Role should be expected.

In our OCFL / RO-Crate POC for PARADISEC we implemented the role as follows:

"contributor": [
    {
      "@id": "#1",
      "@type": "Role",
      "name": "collector",
      "contributor": { "@id": "#2" }
    },
    {
        "@id": "#2",
        "@type": "Person",
        "name": "..."
    },

We followed the blog post Stian referenced, but we used name instead of roleName in the POC. This is a trivial update on our part, and I think roleName makes the value of the property clearer in that context.

Note also that in our case we use a text value without references to external ontologies. Not only is this easy to implement in code but it also makes sense from the perspective of the users for whom that role means something. Mapping our roles to an external ontology would make the data more academically rigorous at the expense of added complexity for my users.

I wanted to point that out as our initial testers of Describo are already indicating that the tool is easy to use, but they're misusing it in places because they don't know about the underlying spec or graph. It's almost a double-edged sword: the tool is usable by novices, but it needs much more code to ensure they ultimately create a sensible crate without needing to know about the spec, which they probably won't read. I hope that makes sense.

So, the more complex this spec becomes the harder it will be to keep the tool easy to use unless the implementation is as Stian notes: A role could be a reference to something external or it could be a simple text value. Both are ok.

stain commented 4 years ago

Thanks, @marcolarosa - I agree, and I think a tool like Describo probably has to be even more prescriptive than the specifications to lead people on the right path - the multi-layer profiles can help with that.

It would have been good to have a different property for the controlled vocabulary instead of overloading roleName with a mix of strings and URIs which would also become difficult to render and order in UIs.

I think the loose additionalType is well suited for that, rather than identifier and so on, because a Role object represents that Something took up SomeRole (at SomeTime). Many people can take up the same type of role at different times, and each of those would be a new Role instance.

Schema.org has no vocabulary for organizing hierarchies of Role, so it would be wrong to have say a generic Creator instance of Role - rather that is a particular subtype of Role.

To allow linking to various controlled vocabularies that know nothing of schema.org/Role, and indeed where those identifiers might be described as properties rather than classes, using the loose additionalType makes (to me) more sense, although http://schema.org/roleName does formally permit URL. We can then link this to #71, although because roleName accepts text, that would only be needed if the hierarchy of roles was important or pre-existing.
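To make the proposal concrete, here is a small sketch of how a Role entity with mixed `roleName` values could be rewritten so that free text stays in `roleName` and vocabulary references move to `additionalType`. The `split_role_names` helper is hypothetical, and the split itself is only the idea floated in this comment, not an agreed part of the specification.

```python
from typing import Any


def as_list(value: Any) -> list:
    """Normalise a JSON-LD property value to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def split_role_names(role: dict) -> dict:
    """Return a copy of a Role with text in roleName and references in additionalType."""
    names, references = [], []
    for value in as_list(role.get("roleName")):
        if isinstance(value, dict) and "@id" in value:
            references.append(value)  # reference to a controlled vocabulary term
        else:
            names.append(value)  # free-text label
    updated = {key: val for key, val in role.items() if key != "roleName"}
    if names:
        # Keep a single string when there is only one label.
        updated["roleName"] = names[0] if len(names) == 1 else names
    if references:
        updated["additionalType"] = as_list(role.get("additionalType")) + references
    return updated


# Role entity taken from the contributor example earlier in this thread.
role = {
    "@id": "#b3b04f6c-526d-41c3-a9e0-ded8bb1bbfc9",
    "@type": "Role",
    "contributor": {"@id": "https://orcid.org/0000-0000-1234-5678"},
    "roleName": [
        "making figures",
        {"@id": "obo:CRO_0000003"},
        {"@id": "credit:Visualization"},
    ],
}
split = split_role_names(role)
```

After the split, a UI only has to render the single `roleName` string, while the vocabulary links remain available under `additionalType`.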

ljgarcia commented 4 years ago

additionalType seems a good option. It would be nice if it accepted DefinedTerm as range rather than only URL.

jasonclark commented 4 years ago

Community meeting notes show that this is delayed until after the 1.1 release. @ptsefton noted on the July 2020 call that person or organization role profiles will need concrete use cases and could be widely interpreted across domains. In the meantime, work here could include listing use cases or examples of person roles or organization roles. @stain has a sample encoding above using CRediT and Schema.org, but we could continue to formalize as the next step.

marcolarosa commented 3 years ago

Based on discussion at today's meeting.

I reported that I've been reworking the PARADISEC to RO-Crate export and as part of that I've modeled a person's role as a link from a role property on the person:

{
  "@id": "...",
  "@type": "Person",
  "name": "...",
  "role": {
    "@id": "....",
    "@type": "Role",
    "name": "performer"
  }
}

Each person is also listed as a contributor, since in the PARADISEC case people are contributors to the data who have a specific role.

ptsefton commented 3 years ago

@marco - how does that role link to the Dataset or File in question? Can you show a complete example with the contributor etc.?

marcolarosa commented 3 years ago

Here's an example for a PARADISEC item - it's abridged and shown before flattening but you'll get the gist:

{
  "@id": "./",
  "@type": [ "Dataset", "RepositoryItem" ],
  "name": "....",
  "contributor": [
    { "@id": "...", "@type": "Person", "name": "Marco", "role": [
        { "@id": "#collector", "@type": "Role", "name": "collector" },
        { "@id": "#operator", "@type": "Role", "name": "operator" }
    ]},
    { "@id": "...", "@type": "Person", "name": "Peter", "role": [
        { "@id": "#operator", "@type": "Role", "name": "operator" }
    ]},
    { "@id": "...", "@type": "Person", "name": "Nick", "role": [
        { "@id": "#collector", "@type": "Role", "name": "collector" },
        { "@id": "#performer", "@type": "Role", "name": "performer" }
    ]}
  ]
}

marcolarosa commented 3 years ago

This is my last contribution to this thread as I think enough words have been spilled on the matter... :-)

@ptsefton and I have had numerous conversations on this matter and it seems to come down to an ability to model roles in simple crates like PARADISEC vs more detailed ones.

By simple I mean that a person is encoded at the crate level where they have multiple roles in relation to the whole crate (as per the structure in the comment https://github.com/ResearchObject/ro-crate/issues/79#issuecomment-808983453).

We are not encoding person A with role B on file C vs person A with role X on file Y. In this case I appreciate that this way of modelling will result in multiple instances of that person within the crate. And I appreciate that ro-crate should recommend a different way of modelling for that use case.

My request to the community is to support both styles. For simple cases like mine allow an implementer to add a role property to the person which encompasses all of their roles in that crate as a whole. For the more complex crates then an implementer should do it {the agreed upon way} so that the crate does not end up with multiple copies of the same person.
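If both styles were supported, a tool could plausibly convert the simple style into the intermediate-Role style discussed earlier in the thread. The sketch below does that for a flattened graph. It assumes the ad-hoc `role` property from the PARADISEC example, uses hypothetical `#marco` and `#peter` identifiers in place of the elided ones, and generates `#role-N` identifiers without checking for collisions.

```python
from typing import Any


def as_list(value: Any) -> list:
    """Normalise a JSON-LD property value to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def ref_id(value: Any) -> str:
    """Return the @id of a reference, accepting {"@id": ...} or a bare string."""
    return value["@id"] if isinstance(value, dict) else value


def to_intermediate_roles(graph: list, root_id: str = "./") -> list:
    """Rewrite person-level 'role' links as Role entities between root and person."""
    by_id = {entity["@id"]: entity for entity in graph}
    contributor_refs, role_entities = [], []
    converted_people, shared_roles = set(), set()
    for ref in as_list(by_id[root_id].get("contributor")):
        person_id = ref_id(ref)
        role_refs = as_list(by_id.get(person_id, {}).get("role"))
        if not role_refs:
            # No role known: keep the direct link to the person.
            contributor_refs.append({"@id": person_id})
            continue
        converted_people.add(person_id)
        for role_ref in role_refs:
            shared = by_id.get(ref_id(role_ref), {})
            shared_roles.add(ref_id(role_ref))
            # Hypothetical id scheme; a real tool should avoid collisions.
            role_id = f"#role-{len(role_entities) + 1}"
            role_entities.append({
                "@id": role_id,
                "@type": "Role",
                "roleName": shared.get("name"),
                "contributor": {"@id": person_id},
            })
            contributor_refs.append({"@id": role_id})
    result = []
    for entity in graph:
        if entity["@id"] == root_id:
            result.append({**entity, "contributor": contributor_refs})
        elif entity["@id"] in converted_people:
            result.append({k: v for k, v in entity.items() if k != "role"})
        elif entity["@id"] not in shared_roles:
            result.append(entity)  # shared role definitions are dropped
    return result + role_entities


simple = [
    {"@id": "./", "@type": "Dataset",
     "contributor": [{"@id": "#marco"}, {"@id": "#peter"}]},
    {"@id": "#marco", "@type": "Person", "name": "Marco",
     "role": [{"@id": "#collector"}, {"@id": "#operator"}]},
    {"@id": "#peter", "@type": "Person", "name": "Peter",
     "role": {"@id": "#operator"}},
    {"@id": "#collector", "@type": "Role", "name": "collector"},
    {"@id": "#operator", "@type": "Role", "name": "operator"},
]
detailed = to_intermediate_roles(simple)
```

In the converted graph each person appears once, and every (person, role) pair gets its own Role entity linked from the root dataset.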