geneontology / noctua

Graph-based modeling environment for biology, including prototype editor and services
http://noctua.geneontology.org/
BSD 3-Clause "New" or "Revised" License

Add default user group/funding information to model, with switching options in the interface #350

Closed kltm closed 8 years ago

kltm commented 8 years ago

This is that "hats" concept that floats up occasionally (that I thought was already captured somewhere, but I have been unable to find).

While the ORCID-per-annotation model gets us a long way, individuals work for different entities over time, and sometimes for several at once, so we need to capture not only who did something, but also what "hat" they were wearing when they did it.

The noctua bit of this, besides proper message passing, would be a dropdown of available hats for the logged-in user.

Somewhat related to #347, as the same kind of group information is a prerequisite. The group/funding information would likely need to be partially kept in users.yaml, with maybe an ontology or reference IDs for group/funding entities (TBD, and maybe an item for geneontology/go-site).

cmungall commented 8 years ago

One option is to use VIVO/GRID: https://grid.ac/downloads

Not clear if this is the best option. It's large, it's not clear how easy it is to add to, and the entities we care about are not just institutions. We also want to give provenance to specific projects (typically funded).

We will most likely maintain our own list of organizations as a yaml that can be easily updated.

-
 id: https://www.ucl.ac.uk/functional-gene-annotation/cardiovascular
 label: Cardiovascular Gene Annotation
-
  ...

with the obvious json-ld/ttl translation. We can add owl:sameAs for GRID IDs if required.
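
A minimal sketch of what that translation could look like, using only the Python standard library. Mapping `label` to rdfs:label is an assumption (the thread does not fix a predicate), the entries are given as plain dicts rather than parsed from a YAML file, and the helper names `group_to_jsonld` and `group_to_turtle` are hypothetical.

```python
import json

# Hypothetical in-memory form of the organization list (as it might look once parsed from YAML).
groups = [
    {
        "id": "https://www.ucl.ac.uk/functional-gene-annotation/cardiovascular",
        "label": "Cardiovascular Gene Annotation",
    },
]

# Assumption: "label" maps to rdfs:label; owl:sameAs is the predicate named in the thread.
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_SAMEAS = "http://www.w3.org/2002/07/owl#sameAs"


def group_to_jsonld(group, grid_iri=None):
    """Translate one entry to a JSON-LD node; grid_iri is an optional owl:sameAs target."""
    node = {"@id": group["id"], RDFS_LABEL: group["label"]}
    if grid_iri is not None:
        node[OWL_SAMEAS] = {"@id": grid_iri}
    return node


def group_to_turtle(group):
    """Translate one entry to a single Turtle statement written with full IRIs."""
    label = group["label"].replace("\\", "\\\\").replace('"', '\\"')
    return f'<{group["id"]}> <{RDFS_LABEL}> "{label}" .'


doc = {"@graph": [group_to_jsonld(g) for g in groups]}
print(json.dumps(doc, indent=2))
print(group_to_turtle(groups[0]))
```

Passing a GRID IRI as `grid_iri` adds the owl:sameAs link; leaving it out keeps the node to just the id and label.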

@mellybelly thoughts?

We will reference these using our ugly entity-as-literal hack for now (cc @balhoff ), so the ttl would be:

  dc:??? "https://www.ucl.ac.uk/functional-gene-annotation/cardiovascular"^^xsd:string

kltm commented 8 years ago

@cmungall I note that db-xrefs.yaml does not currently specify "id", but "name" and "label" serve the same function. Do you think overloading is worthwhile, or should we kick it out a bit? For users.yaml, I'd propose a new field, "organizations", to replace the current "organization"; it would be a list of "id"s found in the file.

cmungall commented 8 years ago

this is more analogous to users.yaml (in fact we previously discussed overloading this, as used to be done for ontology provenance). We could use uri for consistency.

There may be value in keeping one organization as prime/current in the users.yaml even if noctua doesn't use it. We could treat membership as its own entity:

 nickname: Midori
 roles:
  -
   type: member
   organization: <pombase-uri>
   start: ...
   end: ..
  -
    ...

overmodeling vs future proofing? Not clear.
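
To make the trade-off concrete, here is a rough sketch of what the richer membership model would buy: with start/end on each role, "current" organizations can be computed rather than stored. The `Role` class, the `current_organizations` helper, the example URIs, and the rule that a missing `end` means "still current" are all assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Role:
    type: str                     # e.g. "member"
    organization: str             # organization URI
    start: Optional[date] = None  # None is treated as an unknown/open start (assumption)
    end: Optional[date] = None    # None is treated as "still current" (assumption)


def current_organizations(roles, today):
    """Return the organization URIs whose role interval contains `today`."""
    return [
        r.organization
        for r in roles
        if (r.start is None or r.start <= today)
        and (r.end is None or today <= r.end)
    ]


# Hypothetical roles for one user; the URIs are placeholders.
roles = [
    Role("member", "http://example.org/org/a", date(2010, 1, 1), date(2014, 12, 31)),
    Role("member", "http://example.org/org/b", date(2015, 1, 1)),
]
print(current_organizations(roles, date(2016, 9, 13)))
```

The flat alternative is a single "current" organization field, which is cheaper to maintain but loses the history.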

kltm commented 8 years ago

I'm not sure I follow: for roles.type, would there be anything besides member in users.yaml? If not, why have the extra information? Also, I'm not sure start/end would be necessary; they would add a lot of maintenance cruft/overhead, and we wouldn't want to become long-term maintainers of that data. Ideally, we'd know functionally when people were part of an org by the edit times on their operations...which actually is bringing me around on another point that you made...

(For example, we can currently see date and contributor annotations to an individual, but if there are multiple edits to a single individual by multiple people at different times, there is no way to see what was done by who/when. The only way around that would be to spin out (at least) contributor as an individual and then have the date annotations there. Narf.)

cmungall commented 8 years ago

On 13 Sep 2016, at 15:45, kltm wrote:

> I'm not sure I follow: for roles.type, would there be anything besides member in users.yaml? If not, why have the extra information? Also, I'm not sure start/end would be necessary; they would add a lot of maintenance cruft/overhead, and we wouldn't want to become long-term maintainers of that data. Ideally, we'd know functionally when people were part of an org by the edit times on their operations...which actually is bringing me around on another point that you made...

It's at least a natural way to model current vs. not current, which is useful, but YMMV

> (For example, we can currently see date and contributor annotations to an individual, but if there are multiple edits to a single individual by multiple people at different times, there is no way to see what was done by who/when. The only way around that would be to spin out (at least) contributor as an individual and then have the date annotations there. Narf.)

I'm fine with just a bundle of flat triples here. Anything else would be very complex, not well aligned with other attribution models, and not really required.


kltm commented 8 years ago

Imagine the query: get me all models I worked on for the Heart Foundation for the 2015 fiscal year. Without something richer, we'll be unable to get near anything like that. At least, we'd probably want to store both creation and modification, which would get us a lot more resolution, but still fall short of getting good granular reporting.
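
As a rough illustration of what such a query needs, the sketch below assumes each edit is stored as its own record carrying contributor, group, and date together. That record shape is hypothetical and is richer than the current per-individual annotations; the identifiers and the calendar-year bounds standing in for a fiscal year are placeholders.

```python
from datetime import date

# Hypothetical flat edit records: one per edit, with who, which group ("hat"), and when.
edits = [
    {"model": "model:1", "contributor": "user:A", "group": "http://example.org/heart", "date": date(2015, 3, 2)},
    {"model": "model:2", "contributor": "user:A", "group": "http://example.org/other", "date": date(2015, 5, 9)},
    {"model": "model:3", "contributor": "user:A", "group": "http://example.org/heart", "date": date(2016, 8, 1)},
    {"model": "model:1", "contributor": "user:B", "group": "http://example.org/heart", "date": date(2015, 4, 4)},
]


def models_for(edits, contributor, group, start, end):
    """Return sorted model ids edited by `contributor` for `group` within [start, end]."""
    return sorted(
        {
            e["model"]
            for e in edits
            if e["contributor"] == contributor
            and e["group"] == group
            and start <= e["date"] <= end
        }
    )


# Assumption: the reporting period is passed in explicitly, since fiscal years vary by funder.
print(models_for(edits, "user:A", "http://example.org/heart", date(2015, 1, 1), date(2015, 12, 31)))
```

If only one contributor and one date are kept per individual, the three conditions cannot be tested against the same edit, which is the gap described above.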

kltm commented 8 years ago

As a light first pass, just to have something to play with, how about a file roles.yaml as you described in https://github.com/geneontology/noctua/issues/350#issuecomment-246841631, and an additional (optional?) field for users.yaml: "roles", that references that file?

kltm commented 8 years ago

Final discussion: groups.yaml, and a "groups" listing in users.yaml, the latter being optional for the time being.
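
A minimal sketch of how the two files could fit together, assuming groups.yaml entries carry `id` and `label` and each users.yaml entry has an optional `groups` list of ids. The data is shown as already-parsed Python structures; the second group, the user entries, and the `available_hats` helper are hypothetical.

```python
# Hypothetical parsed contents of groups.yaml.
groups = [
    {
        "id": "https://www.ucl.ac.uk/functional-gene-annotation/cardiovascular",
        "label": "Cardiovascular Gene Annotation",
    },
    {"id": "http://example.org/group/b", "label": "Example Group B"},
]

# Hypothetical parsed contents of users.yaml; "groups" is optional.
users = [
    {"nickname": "example-user", "groups": ["http://example.org/group/b"]},
    {"nickname": "no-groups-user"},
]


def available_hats(user, groups):
    """Resolve a user's group ids to full group entries, e.g. to fill a dropdown."""
    by_id = {g["id"]: g for g in groups}
    hats = []
    for group_id in user.get("groups", []):  # a missing field means no hats
        if group_id not in by_id:
            raise ValueError(f"unknown group id: {group_id}")
        hats.append(by_id[group_id])
    return hats


for user in users:
    print(user["nickname"], [h["label"] for h in available_hats(user, groups)])
```

Raising on an unknown id is one possible choice; silently skipping it would keep the interface working but hide a stale reference in users.yaml.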

mellybelly commented 8 years ago

Remember ORCID is only an instance identifier. We really need a plan for organization instances too; they are all over the place. GRID is the best out there after 10+ years. They are curating, are collaborative, and coordinating with Wikidata. That said, it seems like you have some non-org types of groups. 'Cardiovascular Gene Annotation' is not really an organization or even a group; it's more of a focused effort consisting of a narrower group of GO curators? These probably wouldn't need to go into GRID and could live somewhere else with a landing page/grey lit citation, but if they are really orgs like 'EBI' then they should go into GRID. owl:sameAs for GRID IDs is fine, but please help populate GRID if you can.

I like the yaml model to represent people with different "hats". It's not really so different from the implementation in vivo-ISF, where you can have different roles for different lengths of time in different organizations. Agree here with @kltm though: I don't think you need to declare when they worked in an org; this information is available elsewhere for many people and isn't really all that relevant here.

Regarding other roles: will there be curation roles? QA roles? evaluation roles? New contribution roles can go into the new contribution ontology @marijane, she can also advise on above. Send some more examples.

This is important, as I want to be able to aggregate curation contributions for the ISB website on a per person and/or per org, and/or date range basis.

kltm commented 8 years ago

Currently looking at maybe using "contributor" or "publisher" for groups, but don't have a real good feel for that: "publisher" would be easier for queries down the road, but may be wrong, "contributor" would slot in easier to what we have, and make a certain sense, but disambiguating would be harder. @cmungall will have a think about this.

cmungall commented 8 years ago

I would have thought that PROV might help us here. E.g. PROV-O Primer

However, prov:actedOnBehalfOf seems to assume that people are time-sliced, e.g.

[image not preserved]

This doesn't help us much...

cmungall commented 8 years ago

@dosumis and the ontology group also want a primary contact or contacts for each group: https://github.com/geneontology/go-site/issues/231

dosumis commented 8 years ago

Just checking in to see if any progress here. Anything I can do to help?

kltm commented 8 years ago

https://github.com/geneontology/go-site/issues/231#issuecomment-253906140 The first step will be merging the data. After that come the changes to make sure that the data is picked up in the API; then I'll coordinate with @balhoff to make sure that it round-trips correctly.

kltm commented 8 years ago

@cmungall You told me to poke you about this at Monday meeting. Any last thoughts before I flip a coin? See https://github.com/geneontology/noctua/issues/350#issuecomment-247724934 .

cmungall commented 8 years ago

Summary so far:

We should have an answer shortly

cmungall commented 8 years ago

Use

http://purl.org/pav/providedBy

docs here: http://pav-ontology.github.io/pav/index.html#http://purl.org/pav/providedBy
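
A small sketch of the resulting statement, assuming the entity-as-literal convention mentioned earlier in the thread still applies, so the group is written as an xsd:string literal rather than an IRI. The subject IRI and the `provided_by_triple` helper are hypothetical.

```python
PAV_PROVIDED_BY = "http://purl.org/pav/providedBy"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def provided_by_triple(subject_iri, group_id):
    """Build one Turtle statement attaching a group to a subject via pav:providedBy.

    Assumption: the group id is emitted as a typed string literal (entity-as-literal).
    """
    escaped = group_id.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{subject_iri}> <{PAV_PROVIDED_BY}> "{escaped}"^^<{XSD_STRING}> .'


# The subject IRI below is a placeholder for a model or individual.
print(
    provided_by_triple(
        "http://example.org/individual/1",
        "https://www.ucl.ac.uk/functional-gene-annotation/cardiovascular",
    )
)
```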

kltm commented 8 years ago

Do we want to switch what we currently use to PAV while we're at it?

cmungall commented 8 years ago

no, dc is fairly standard, just doesn't have the granularity we need here

kltm commented 8 years ago

@balhoff I've taken a look at the different ways of injecting the group information into the request stream, and I now think the least awkward would be to add it at the request set packet level, rather than at the level of each operation--just like the uids.

The structure and flow would be almost the same as for a uid. This would be an optional list of group id strings that would be round-tripped and applied to the operations almost exactly like the uid, except that it is optional and may have a cardinality greater than one.
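
A rough sketch of that flow, not the actual Minerva packet format: the field names `uid`, `groups`, `requests`, `contributor`, and `provided_by`, and the helper `stamp_operations`, are all hypothetical. It only illustrates a single required uid plus an optional, possibly multi-valued group list being copied from the packet onto each operation.

```python
def stamp_operations(packet):
    """Apply the packet-level uid and optional group ids to every operation."""
    uid = packet["uid"]                # required, exactly one
    groups = packet.get("groups", [])  # optional, zero or more group id strings
    return [
        {**op, "contributor": uid, "provided_by": list(groups)}
        for op in packet["requests"]
    ]


# Hypothetical request set packet; identifiers are placeholders.
packet = {
    "uid": "http://example.org/user/1",
    "groups": ["http://example.org/group/a", "http://example.org/group/b"],
    "requests": [
        {"entity": "individual", "operation": "add"},
        {"entity": "edge", "operation": "remove"},
    ],
}

for op in stamp_operations(packet):
    print(op)
```

Keeping the groups at the packet level means a client cannot accidentally stamp different operations in one request set with different hats.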

balhoff commented 8 years ago

@kltm did you start any Minerva work on this? I did a preliminary look through code using the uid so I think I see where to start. Don't want to duplicate efforts though.

kltm commented 8 years ago

@balhoff No, I did not. I started on it when I initially thought we'd add it from the client, but having it threaded in on the server side (as is uid) makes a lot more sense and would keep things significantly cleaner.