ArctosDB / arctos

Arctos is a museum collections management system
https://arctos.database.museum

publications linked to specimens automatically added to projects #893

Closed campmlc closed 8 years ago

campmlc commented 8 years ago

I have come across several examples where specimens are linked as vouchers to publications, but these publications do not show up in the related projects that collected or used the specimen. Shouldn't this linkage be made automatically? Thoughts? @jldunnum @ccicero

dustymc commented 8 years ago

Example?

campmlc commented 8 years ago

http://arctos.database.museum/guid/MSB:Mamm:94040

The specimen was cited by papers resulting from specimen loans, but the citations do not show up in the projects that collected the specimen or used the specimen.

dustymc commented 8 years ago

That specimen has been loaned only once, to Andrew.

I can think of only two ways it got into the not-Andrew publications - the curatorial folks didn't create a loan (pretty clear here - a citing paper was published 5 years before the only loan!), or someone is sharing tissues (=undocumented usage) - both conditions most Curators will want to know about and rectify. (And there are lots of tools to assist with that under Reports.)

I can see no reason collecting projects should display usage publications.

campmlc commented 8 years ago

I would think it important that an NSF funded project that collected specimens could document the usage of those specimens as part of project reporting and justification. Anyone else have an opinion on this?

dustymc commented 8 years ago

> I would think it important that an NSF funded project that collected specimens could document the usage of those specimens as part of project reporting and justification

That's about half of what projects do, but I still see no reason to display hundreds/thousands of publications on a project that only collected. The data you're asking about are available, you can put them in your reports if you want, I don't think they belong on the project page.

campmlc commented 8 years ago

I would disagree, because projects are the one central location for tracking the results of funded collecting efforts. It would be very useful to be able to show funding agencies how the specimens have been used over time for anticipated and unanticipated uses: loans, pubs, other projects, broader impacts, etc.

dustymc commented 8 years ago

Here's the view from here - hopefully it'll help you understand whatever I don't understand, because I'm no longer sure we're talking about the same thing.

There are three "core" things that projects directly do (and a bunch of maybe-not-so-core things like Media and Taxonomy):

1) project-->accn - these are collecting or acquisition, often created for things like NSF-funded collecting.
2) project-->loan - these document "usage" through loans.
3) project-->publication - these document the results of "usage" in two ways - when a loan produces "hard" citations (yay everybody), and when a loan has to some degree of certainty led to a publication which doesn't actually cite specimens (activity that can only be documented in Arctos, as far as I know).

Any project can include any, all, or none of those activities. A "project that only collected" would be linked to one or more Accessions, but no Loans (yet).

In our data model, accessions produce specimens and loans (potentially) produce results. A publication entitled "I'm checking my traps right now!" is usage/results - there should be a loan. The fact that the usage happens concurrently with (or before or anything else) collecting is irrelevant; the specimens (or critters that eventually became specimens, or whatever) are being used; something of value is being added to their specimen data (and so to their potential for further use) through the documentation (=publication).

"Broader impacts" are documented through the relationship of projects to projects (including "self" - a project can of course collect and borrow) via transactions. Say ProjectA collects ABC:XZY:123, ProjectB borrows ABC:XZY:123 and cites it (or we can somehow figure out they used it) in SomePublication. ProjectB will show up on ProjectA's "Projects using contributed specimens" list (and ProjectA on ProjectB's "projects contributing..."), and SomePublication will show up under ProjectB's Publications list. I think you're asking to add SomePublication to ProjectA's Publications list, and as a user that would lead me to believe that ProjectA had BORROWED ABC:XZY:123. It didn't, it COLLECTED the specimen and "supported" ProjectB through that activity.
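The ProjectA/ProjectB example above can be sketched in a few lines of Python. This is only an illustration of the relationships as described in this comment, not Arctos code: the `Project` class, its field names, and the two helper functions are hypothetical, and accessions and loans are collapsed into plain sets of specimen GUIDs.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    # Hypothetical stand-in for a project; not the Arctos schema.
    name: str
    collected: set = field(default_factory=set)       # GUIDs reached via project-->accn
    borrowed: set = field(default_factory=set)        # GUIDs reached via project-->loan
    publications: dict = field(default_factory=dict)  # title -> set of cited GUIDs


def projects_using_contributed(project, all_projects):
    """Projects that borrowed at least one specimen this project collected.

    A project that both collects and borrows the same specimen lists itself.
    """
    return [p.name for p in all_projects if p.borrowed & project.collected]


def projects_contributing(project, all_projects):
    """Projects that collected at least one specimen this project borrowed."""
    return [p.name for p in all_projects if p.collected & project.borrowed]


project_a = Project("ProjectA", collected={"ABC:XZY:123"})
project_b = Project(
    "ProjectB",
    borrowed={"ABC:XZY:123"},
    publications={"SomePublication": {"ABC:XZY:123"}},
)
projects = [project_a, project_b]

print(projects_using_contributed(project_a, projects))  # ['ProjectB']
print(projects_contributing(project_b, projects))       # ['ProjectA']
print(list(project_a.publications))                     # []
```

SomePublication stays on ProjectB's own list; ProjectA reaches it only by way of ProjectB, which is the indirect link described above.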

One thing that may be relevant but is missing from the model is that citations are linked only to publications. I don't think that's how I'd have designed things - I'd like to see an additional link between citations and loans so we could tighten up the loan-->publication pathway - but that model would have no way to accommodate http://arctos.database.museum/guid/MSB:Mamm:94040 appearing in http://arctos.database.museum/publication/1000190 with no obvious loan in between either - it would require retroactively creating the loan (and in turn would allow deeper/more precise questions of the data).

One possibility is to alter the "Projects used specimens contributed" display. Here's a possible alternate view of that (using BCP).

Projects using contributed specimens

projcnt: 60 pubcnt: 46 citcnt: 2487

Does that address this issue? (And the query runs in ~2 seconds, so probably technically not a big deal to make that change.)
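For readers wondering what such a summary might count, here is a rough Python sketch. The actual query is not shown in this thread, so the definitions are assumptions: `projcnt` is taken to be the number of projects that borrowed any contributed specimen, `pubcnt` the number of distinct publications from those projects citing a contributed specimen, and `citcnt` the number of distinct (publication, specimen) pairs. The function name and the dictionary layout are hypothetical.

```python
def usage_summary(collected, projects):
    """Summarize downstream use of the specimens in `collected`.

    collected: set of specimen GUIDs contributed by one project.
    projects:  name -> {"borrowed": set of GUIDs,
                        "publications": {title: list of cited GUIDs}}
    """
    # Projects that borrowed at least one contributed specimen.
    using = [name for name, p in projects.items() if p["borrowed"] & collected]

    # Distinct (publication, specimen) pairs, restricted to contributed specimens.
    citations = {
        (title, guid)
        for name in using
        for title, cited in projects[name]["publications"].items()
        for guid in cited
        if guid in collected
    }
    return {
        "projcnt": len(using),
        "pubcnt": len({title for title, _ in citations}),
        "citcnt": len(citations),
    }


collected = {"ABC:XZY:123", "ABC:XZY:124"}
projects = {
    "ProjectB": {
        "borrowed": {"ABC:XZY:123", "ABC:XZY:124"},
        "publications": {"SomePublication": ["ABC:XZY:123", "ABC:XZY:124"]},
    },
    "ProjectC": {"borrowed": {"ABC:XZY:124"}, "publications": {}},
    "ProjectD": {
        "borrowed": {"DEF:XZY:9"},
        "publications": {"Other": ["DEF:XZY:9"]},
    },
}

print(usage_summary(collected, projects))
# {'projcnt': 2, 'pubcnt': 1, 'citcnt': 2}
```

Using a set of pairs keeps a publication shared by two borrowing projects from being counted twice.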

jldunnum commented 8 years ago

I think your alternate view above is pretty good. The main requirement here is to remove all extra manual linkages from the process. Everything that is related needs to autolink when appropriate.

  1. Specimen accession is manually linked to contributing project.
  2. When cataloged, specimens linked to accession (and thus original project).
  3. Any subsequent loans and new projects using specimens should autolink back to original contributing project
  4. Any publication containing a cited specimen from that accession should also autolink to both the new and original project, regardless of whether there is a loan record for it.

The main manual workload and hurdle for us is doing the specimen to publication linkage. Hopefully some of this will be reduced once journals are required to link to specific specimens. We need to have this all in place to show them our model.

dustymc commented 8 years ago

> Any subsequent loans and new projects using specimens should autolink back to original contributing project

They do/always have IF you have complete loan records. (Data Loans are handy, if not entirely accurate, for quickly creating retroactive loans when, e.g., you stumble upon a citation.)

> Any publication containing a cited specimen from that accession should also autolink to both the new

I don't see how this can work - projects exist because specimens get used in lots of ways. The "Something about DNA" project having a "nothing at all molecular in here" paper included is just going to confuse everyone. I could certainly provide some type of alerts or similar, but I don't see an automation pathway in there.

> and original project

I still feel like I'm missing something significant in this. It does link (via related projects, as seen in my "possible alternate view" above), it does not DIRECT link because that would 1) introduce unnecessary denormalization (unless there's some new pathway that I'm not seeing), and 2) dilute the meaning of any original linked publications. Should we take this to a voice call?

> once journals are required to link to specific specimens.

That would be ideal, although it seems a bit unlikely from here. There are a lot of tools in Arctos which can help you figure out the folks who might need some extra encouragement to cite specimens, or find publications which probably should but do not cite, etc.

See also https://arctosdb.org/how-to/cite-specimens/ - HOW specimens are cited matters. (And I think we should have a "cite this thusly" column in downloads etc., I just have no idea what we'd put in it!)

With a DOI citation we can use existing tools (CrossRef, DataCite) to find them with some certainty, and when/if things like FundRef get some momentum we can follow the citations the other way, to people, NSF funding, etc., etc., etc. From an automation standpoint these are highly preferred.

With a GUID (=specimen URL) citation, I can probably find most things that are available on the web, and anyone stumbling upon the citation will have a 100% chance of following the link to the right specimen, even without a computer to resolve them. They're unlikely to ever make sense to things like CrossRef, which limits how we can interact with that ecosystem.

With a DWC Triplet citation (MSB:Mamm:1234) I might occasionally detect a citation, and anyone stumbling upon the citation might end up in Arctos (where they might tell you about the missing link), but the DWC triplets are far from unique so that's all pretty dicey (for machines - they're probably mostly good enough for people, who can understand the context).

With anything less-structured than those, I'm absolutely not going to be able to automate anything; you're going to have to manually track them down, and users (of the publication) are left with something closer to clues than links.

Anything you can do curatorially to encourage/require the use of resolvable identifiers is likely to pay off in detecting citations or generally adding automation.
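The tiers above can be illustrated with a small classifier. This is a sketch under assumptions, not anything Arctos runs: the DOI pattern is a common heuristic rather than an official grammar, the GUID URL pattern is inferred from the specimen URL quoted earlier in this thread, and the function name and labels are hypothetical.

```python
import re

# Heuristic patterns; all three are assumptions made for illustration.
DOI_RE = re.compile(r"\b10\.\d{4,9}/\S+")
GUID_URL_RE = re.compile(
    r"https?://arctos\.database\.museum/guid/[A-Za-z]+:[A-Za-z]+:[\w.-]+"
)
TRIPLET_RE = re.compile(r"\b[A-Za-z]+:[A-Za-z]+:[\w.-]+")


def classify_citation(text):
    """Return the most machine-friendly identifier type found in `text`."""
    if DOI_RE.search(text):
        return "doi"           # findable through DOI-based tooling
    if GUID_URL_RE.search(text):
        return "guid_url"      # resolves directly to one specimen
    if TRIPLET_RE.search(text):
        return "dwc_triplet"   # detectable, but not guaranteed unique
    return "unstructured"      # needs a person to track it down


for sample in (
    "https://doi.org/10.1234/abcd.5678",
    "http://arctos.database.museum/guid/MSB:Mamm:94040",
    "MSB:Mamm:1234",
    "MSB 1234",
):
    print(sample, "->", classify_citation(sample))
```

The checks run from most to least structured, so a string is labeled by the strongest identifier it contains. The DOI in the sample list is made up for the example.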

jldunnum commented 8 years ago

> > Any publication containing a cited specimen from that accession should also autolink to both the new
>
> I don't see how this can work - projects exist because specimens get used in lots of ways. The "Something about DNA" project having a "nothing at all molecular in here" paper included is just going to confuse everyone. I could certainly provide some type of alerts or similar, but I don't see an automation pathway in there.

Sorry, not saying whenever any specimen from an accession is cited all papers using material from that accession are linked to that new specific project. I mean when a specimen from that accession is cited, that paper is linked to the new project but also linked to the original overriding collection project. Maybe a phone discussion would be best so we all are on the same page.

dustymc commented 8 years ago

> also linked to the original overriding collection project

If you mean "by way of 'usage' projects, something like the 'possible alternate view' above," then I think we're all on the same page and just need to sort out the details.

If you mean anything else, let's schedule a phone call.

jldunnum commented 8 years ago

I think we are on the same page but wouldn't hurt to chat anyways. Maybe later this week?

dustymc commented 8 years ago

New code running in prod - re-open or give me a call if I've missed the issue.