bussec closed this issue 8 months ago
We could use the following taxonomy to describe a person's role in a project:
So each of the above is replaced with Contributor.
> We could use the following taxonomy to describe a person's role in a project:
How would you see this working?
Contributor : {
    person: URI {
        label: string, (e.g. "Brian Corrie")
        id: string (e.g. "ORCID:0000-0003-3888-6495")
    },
    institution: URI {
        label: string, (e.g. "Simon Fraser University")
        id: string (e.g. "ROR:0213rcc28")
    },
    credit: URI {
        label: string, (e.g. "Data curation")
        id: string (e.g. "CRT:f93e0f44-f2a4-4ea1-824a-4e0853b05c9d")
    }
}
Example is from: https://credit.niso.org/contributor-roles/data-curation/
The following would be minimal and complete, but would require a lookup every time one wanted to actually use anything:
Contributor : {
    orcid_uri: string, (e.g. "ORCID:0000-0003-3888-6495")
    credit: string (e.g. "CRT:f93e0f44-f2a4-4ea1-824a-4e0853b05c9d")
}
Not a fan of this...
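To make the trade-off between the two variants concrete, here is a minimal Python sketch. The class names, the function names and the `LABELS` table are hypothetical and only stand in for whatever resolver would really be used; the identifiers and labels are the examples given above.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class PidRef:
    """An id/label pair, as used in the long variant."""
    id: str
    label: str


@dataclass(frozen=True)
class ContributorLong:
    person: PidRef
    institution: PidRef
    credit: PidRef


@dataclass(frozen=True)
class ContributorMinimal:
    orcid_uri: str
    credit: str


# Hypothetical stand-in for an external resolver (ORCID, CRediT, ...).
LABELS = {
    "ORCID:0000-0003-3888-6495": "Brian Corrie",
    "CRT:f93e0f44-f2a4-4ea1-824a-4e0853b05c9d": "Data curation",
}


def describe_long(c: ContributorLong) -> str:
    # Labels travel with the record, so no lookup is needed.
    return f"{c.person.label}: {c.credit.label}"


def describe_minimal(c: ContributorMinimal, labels: Mapping[str, str]) -> str:
    # Every human-readable use requires resolving the identifiers first.
    return f"{labels[c.orcid_uri]}: {labels[c.credit]}"


long_form = ContributorLong(
    person=PidRef("ORCID:0000-0003-3888-6495", "Brian Corrie"),
    institution=PidRef("ROR:0213rcc28", "Simon Fraser University"),
    credit=PidRef("CRT:f93e0f44-f2a4-4ea1-824a-4e0853b05c9d", "Data curation"),
)
minimal_form = ContributorMinimal(
    orcid_uri="ORCID:0000-0003-3888-6495",
    credit="CRT:f93e0f44-f2a4-4ea1-824a-4e0853b05c9d",
)
```

The long form is redundant but self-describing; the minimal form stays small at the cost of a resolver call (here just a dictionary) on every read.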
I kinda feel if we're going to go this far in adding a Contributor object, we should go all the way and allow for any number of contributors to be attached to a study, and thus not restrict ourselves to only those 3 categories (study contact, collected, submitted).
Plus, there's a bit of a disconnect between the contributor roles in CRediT and what MiAIRR wants. CRediT roles indicate what contribution people made, but none of the roles exactly match the roles we want, i.e. a study contact, the person who collected the data and is legally responsible for it, and the person who submitted the data. The "Data Curation" role in CRediT is roughly similar to the data submitter, but nothing really matches the other two.
Yes, the credit and contact purposes are somewhat different. The MiAIRR fields are contact fields more than credit fields. If you want to ask someone about sample prep, ask the collected_by person; about curation, ask submitted_by; and about the study in general, ask the study_contact.
These aren't really providing credit - and maybe MiAIRR doesn't need to?
> These aren't really providing credit - and maybe MiAIRR doesn't need to?
I think not. The Contributor object is there to standardize person information, which makes sense, but I don't think CRediT roles are really necessary.
However, one limitation with the current design is only one person can be assigned. There are sometimes multiple study contacts, and there are often multiple data submitters. For example, when Kira did metadata curation, and I did the data processing, I'd like to list both of us as the data submitters so we both get credit...
@bcorrie The "long" version of the Contributor record looks good to me. I also agree that we should have an array of Contributor objects in a study, so that proper credit can be provided.
Regarding the "contact" roles defined in MiAIRR: I also would consider them to be only weakly correlated with the credit information. I could think of two ways to combine them:
- A Contributor record contains study_contact, collected_by and submitted_by as boolean fields, so you can flag the respective person.
- We keep study_contact, collected_by and submitted_by as properties of Study, but they contain an index to the respective record in the Contributors array.
This ontology has a more complete set of roles, though I'm not sure if anything actually matches as contact (supervisor role?). Though maybe we can request a term...
> @bcorrie The "long" version of the Contributor record looks good to me. I also agree that we should have an array of Contributor objects in a study, so that proper credit can be provided.
Agreed, also credit should be an array so that multiple roles can be assigned.
> - A Contributor record contains study_contact, collected_by and submitted_by as boolean fields, so you can flag the respective person.
> - We keep study_contact, collected_by and submitted_by as properties of Study, but they contain an index to the respective record in the Contributors array.
or 3. We designate/document 3 terms from this ontology for those roles, for example:
- study_contact == supervision role?
- collected_by == collection role?
- submitted_by == submitter role?
@schristley I have now looked at this in more detail: CRO is a complete superset of the CRediT taxonomy; the terms even have the same names and definitions. So we get the additional term we need for free when using it, and it is simpler to integrate as it is an OBO Foundry ontology as well.
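The three options discussed above for attaching the MiAIRR contact roles could be sketched roughly as follows. This is only an illustration under assumptions: the class layout, the function names and the role label "submitter role" are placeholders taken from the wording of this thread, not agreed schema fields or verified ontology terms.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Contributor:
    name: str
    # Option 3: contact roles expressed as role terms (labels are placeholders).
    roles: List[str] = field(default_factory=list)
    # Option 1: contact roles as boolean flags on the record.
    study_contact: bool = False
    collected_by: bool = False
    submitted_by: bool = False


@dataclass
class Study:
    contributors: List[Contributor]
    # Option 2: Study keeps the field, holding an index into `contributors`.
    submitted_by: Optional[int] = None


def submitters_by_flag(study: Study) -> List[str]:
    """Option 1: collect everyone whose boolean flag is set."""
    return [c.name for c in study.contributors if c.submitted_by]


def submitter_by_index(study: Study) -> Optional[str]:
    """Option 2: follow the index stored on the Study."""
    if study.submitted_by is None:
        return None
    return study.contributors[study.submitted_by].name


def submitters_by_role(study: Study, role: str = "submitter role") -> List[str]:
    """Option 3: collect everyone carrying the designated role term."""
    return [c.name for c in study.contributors if role in c.roles]


study = Study(
    contributors=[
        Contributor("Person A", roles=["submitter role"], submitted_by=True),
        Contributor("Person B", roles=["submitter role"], submitted_by=True),
    ],
    submitted_by=0,
)
```

In this sketch, the flag and role-term variants return both submitters, while a single index on Study can only point at one record; supporting several submitters there would need a list of indices.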
Where should the array containing the Contributor records be located in the schema? Is it rather
- a property of the Study object?
@williamdlees Would either of these work for the Germline Acknowledgements? Or would the Contributor records need internal IDs for referencing?
(1) should work. At the moment there is a top level Acknowledgement object which is used by AlleleDescription and GermlineSet. It has an acknowledgement_id but it's not used.
> - a property of the Study object?
I'd probably prefer this to keep it simple, without needing to create identifiers, worrying about uniqueness and so forth. This also allows the contributor role to be different for study versus germline.
Potential structure...
Contributor : {
    contributor_pid: string, (e.g. "ORCID:0000-0003-3888-6495")
    name: string,
    email: string,
    affiliation_pid: string,
    affiliation_name: string,
    affiliation_address: string,
    roles: array
}
do we need to allow multiple affiliations?
> do we need to allow multiple affiliations?
ORCID handles those complexities, maybe we can state that this is the primary affiliation that one has with this study and defer to ORCID for complex relationships and affiliations.
Regarding affiliation_address: Are we ok with <city>,[state,]<country>? This information is already in ROR, so we could just pull it from there. Or is anyone still receiving physical mail these days? :-)
Does it make sense to use PID id/label pairs as I did in: https://github.com/airr-community/airr-standards/issues/552#issuecomment-1016809146
This provides more consistent use with other URI based PID objects in the standard. We might do this for the person and institution (e.g. affiliation.id, affiliation.label) rather than have custom fields (affiliation_pid, affiliation_name) for those objects.
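As a rough illustration of the difference between the two layouts, the sketch below converts the flat fields from the potential structure into nested id/label pairs. The function name and the `person` key are assumptions made for the example; only `affiliation.id` and `affiliation.label` are named in the comment above.

```python
from typing import Any, Dict

# Flat fields that get folded into id/label pairs.
_PAIRED = {"contributor_pid", "name", "affiliation_pid", "affiliation_name"}


def to_id_label_pairs(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Convert flat *_pid / *_name fields into nested id/label objects."""
    nested: Dict[str, Any] = {
        # "person" is a hypothetical key name for the contributor's own pair.
        "person": {
            "id": flat.get("contributor_pid"),
            "label": flat.get("name"),
        },
        "affiliation": {
            "id": flat.get("affiliation_pid"),
            "label": flat.get("affiliation_name"),
        },
    }
    # Remaining fields (email, roles, ...) are carried over unchanged.
    nested.update({k: v for k, v in flat.items() if k not in _PAIRED})
    return nested


flat_record = {
    "contributor_pid": "ORCID:0000-0003-3888-6495",
    "name": "Brian Corrie",
    "affiliation_pid": "ROR:0213rcc28",
    "affiliation_name": "Simon Fraser University",
    "roles": ["Data curation"],
}
nested_record = to_id_label_pairs(flat_record)
```

Note that a missing PID simply becomes `None` in this sketch, which is exactly the case a strict id/label convention would not allow.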
> Does it make sense to use PID id/label pairs as I did in: #552 (comment)
> This provides more consistent use with other URI based PID objects in the standard. We might do this for the person and institution (e.g. affiliation.id, affiliation.label) rather than have custom fields (affiliation_pid, affiliation_name) for those objects.
Are we requiring that all contributors have an ORCID? And if they don't, then how should their information be recorded?
> Regarding affiliation_address: Are we ok with <city>,[state,]<country>? This information is already in ROR, so we could just pull it from there. Or is anyone still receiving physical mail these days? :-)
Yes, that is probably okay. What about department name(s)?
I think it's reasonable that we don't expect this to be complete and thorough contact information, but enough information that somebody could find the person with some extra googling? On the other hand, if it's a legal contact then it likely needs to be as specific as possible.
> What about department name(s)?
This is currently beyond the scope of ROR. They are working on this, but I don't expect this to happen any time soon. So either we have a free text field for this or skip the information altogether.
> Are we requiring that all contributors have an ORCID?
If we are using these fields as ID/label pairs this would be a consequence of it (as the IDs must resolve to the label or a synonym for it).
> > Are we requiring that all contributors have an ORCID?
> If we are using these fields as ID/label pairs this would be a consequence of it (as the IDs must resolve to the label or a synonym for it).
Right. That's why I avoided the Ontology ID/label with my suggested structure. That is, the ORCID (or other PID) can be provided if it's available but it isn't required.
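A minimal sketch of what "provided if available, but not required" could look like in practice: the PID is optional, and it is only checked when present. The class and method names are hypothetical, and the "ORCID:" prefix handling simply follows the examples in this thread. The check digit calculation is the ISO 7064 MOD 11-2 scheme that ORCID iDs use.

```python
from dataclasses import dataclass, field
from typing import List, Optional


def orcid_checksum_ok(orcid: str) -> bool:
    """Check the ISO 7064 MOD 11-2 check digit of a bare ORCID iD."""
    chars = orcid.replace("-", "")
    if len(chars) != 16 or any(ch not in "0123456789" for ch in chars[:15]):
        return False
    total = 0
    for ch in chars[:15]:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return chars[15].upper() == expected


@dataclass
class Contributor:
    name: str
    contributor_pid: Optional[str] = None  # e.g. "ORCID:0000-0003-3888-6495"
    roles: List[str] = field(default_factory=list)

    def pid_problem(self) -> Optional[str]:
        """Return a message if the PID looks wrong, or None if it is fine."""
        if self.contributor_pid is None:
            return None  # the PID is optional, so its absence is not an error
        prefix, _, value = self.contributor_pid.partition(":")
        if prefix == "ORCID" and not orcid_checksum_ok(value):
            return "ORCID check digit mismatch"
        return None  # other PID types are not checked in this sketch
```

A contributor without an ORCID is then still a valid record, carrying only the free-text name.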
Some further points, based on a couple of RDA-DE talks today and yesterday:
The following point was moved here from #530:
Currently we have multiple data structures in the schema that refer to people contributing to a study (ex. study subjects):
- Acknowledgement object (has properties: ID, name, institution, ORCID; all [string])
- Study.study_contact property [string]
- Study.collected_by property [string]
- Study.lab_address holds the institutional information for the person in collected_by.
- Study.submitted_by property [string]
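For illustration, the existing Study fields listed above could be folded into a single list of contributor records along these lines. This is a sketch under assumptions: the output keys (`name`, `roles`, `affiliation_address`) are borrowed from the potential structure discussed earlier, the function name is hypothetical, and people are merged only when the free-text strings match exactly.

```python
from typing import Any, Dict, List

CONTACT_FIELDS = ("study_contact", "collected_by", "submitted_by")


def study_to_contributors(study: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Fold the three contact fields of a Study into contributor records."""
    by_name: Dict[str, Dict[str, Any]] = {}
    for role in CONTACT_FIELDS:
        value = study.get(role)
        if not value:
            continue  # field missing or empty
        # The existing fields are free text, copied verbatim into `name`.
        record = by_name.setdefault(value, {"name": value, "roles": []})
        record["roles"].append(role)
        # lab_address describes the institution of the collected_by person.
        if role == "collected_by" and study.get("lab_address"):
            record["affiliation_address"] = study["lab_address"]
    return list(by_name.values())


example_study = {
    "study_contact": "Person A",
    "collected_by": "Person B",
    "lab_address": "Some Institute",
    "submitted_by": "Person A",
}
contributors = study_to_contributors(example_study)
```

Here "Person A" ends up as one record with two roles, which is the kind of consolidation a shared Contributor array would allow.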