ArctosDB / documentation-wiki

Arctos Documentation and How-To Guides
https://handbook.arctosdb.org
GNU General Public License v3.0

GBIF and iDigBio - reciprocal links? #86

Open Jegelewicz opened 5 years ago

Jegelewicz commented 5 years ago

Issue Documentation is http://handbook.arctosdb.org/how_to/How-to-Use-Issues-in-Arctos.html

Is your feature request related to a problem? Please describe.
It appears that both GBIF and iDigBio provide stable IDs to which we can link our specimens. Is there any reason we can't work with them to create reciprocal links as we do with GenBank?

Describe the solution you'd like
When GBIF or iDigBio assigns an ID to an Arctos specimen, that ID ends up in other IDs in my Arctos record, and the catalog number in GBIF or iDigBio links to the record in Arctos (this link already works from GBIF and iDigBio but it is buried in the OccurrenceID).

Describe alternatives you've considered
I can go download ALL of the GBIF and iDigBio IDs and use the other ID bulkload tool to add them to my records, but that doesn't get people from iDigBio to Arctos and it will require constant rechecking and updating.

Additional context
Again, working on more linking for our SPNHC presentation.

Priority
I would like to have this resolved by date: 2019-04-01

dustymc commented 5 years ago

I don't understand the goal of this.

GenBank and Arctos complement each other - neither contains everything; you need both to get the full picture.

GBIF (and friends) contain nothing that's not in Arctos.

Perhaps they'd be willing to un-bury the OccurrenceID link?

campmlc commented 5 years ago

Yes, but we would like to be able to find our records in iDigBio and GBIF without having to go through their awful interface to see what the heck they are doing to our data. I second this request.

It is also important in the grand scheme of things to have all these databases integrated.

Jegelewicz commented 5 years ago

GBIF (and friends) contain nothing that's not in Arctos.

False, they both make changes to our data and publish the altered information. These links would make it easier for us to find our stuff in both places and holler about issues such as Aves--->avus.

dustymc commented 5 years ago

They (GBIF anyway) have an API - if you want to write an interface to their data, you can, or you can submit Issues via https://github.com/gbif/portal-feedback/issues.

all these databases

There are a LOT more "DWC portals" than the two mentioned. What precisely is the scope of this?

make changes to our data and publish the altered information

A CC0 license explicitly allows that so I'm not sure I see the problem? Can you search GBIF for stuff they've "interpreted" or do they provide a report or something? If not, that might be a good issue.

find our stuff in both places and holler

Not sure I see this supporting that, except by accident - should work when someone clicks the link and somehow notices that something's changed, but that's it.

It's not clear to me that GBIF can support stable OccurrenceIDs (whatever they call the "1146020237" in https://www.gbif.org/occurrence/1146020237), because part of our "GBIF record identifier" isn't particularly stable (OccurrenceID contains specimen_event_id).

You can search (or link) GBIF with our stable ID eg, https://www.gbif.org/occurrence/search?organism_id=http://arctos.database.museum/guid/MSB:Mamm:292063
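A minimal sketch of how such a link could be built from any Arctos GUID. The `organism_id` parameter and both base URLs come from the example above; the function name is made up for illustration, and whether the GBIF portal treats the percent-encoded form exactly like the unencoded link above is an assumption worth checking.

```python
from urllib.parse import urlencode

GBIF_SEARCH = "https://www.gbif.org/occurrence/search"
ARCTOS_GUID_BASE = "http://arctos.database.museum/guid/"


def gbif_search_link(guid: str) -> str:
    """Return a GBIF portal search URL for one Arctos GUID (hypothetical helper)."""
    organism_id = ARCTOS_GUID_BASE + guid
    # urlencode percent-escapes ':' and '/' so the nested URL stays a single query value.
    return GBIF_SEARCH + "?" + urlencode({"organism_id": organism_id})


print(gbif_search_link("MSB:Mamm:292063"))
```

Because the link is derived from the GUID alone, nothing has to be stored or refreshed on the Arctos side.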

I suppose we could somehow use their API to find all the OccurrenceIDs associated with that "organism" and add them to Arctos, but that would be a lot of work and even more maintenance.

I'm still not quite understanding the use case. I think there's a lean towards "find things some portal has changed," but I don't see how the original request can do that.

campmlc commented 5 years ago

This is a request to automate linkages to GBIF and iDigBio in the same way we link to GenBank. They already point to us. I don't see why the following would be difficult, but even if so it is something we should put resources into. The future is going to go to whoever can be the best integrator of specimen data. Just because these aggregators may not be doing the best job doesn't mean we shouldn't try. They have the resources and funding - we have decent data. We need to figure out a way to work together. Perhaps our effort will lead to discussions about improvements on their end.

"Use their API to find all the OccurrenceIDs associated with that "organism" and add them to Arctos" - yes

dustymc commented 5 years ago

this link already works from GBIF and iDigBio but it is buried in the OccurrenceID

They already point to us

Are these referring to the same thing?

Jegelewicz commented 5 years ago

I added both IDs to UTEP:Mamm:1 (http://arctos.database.museum/guid/UTEP:Mamm:1).

The lovely thing is that now I can EASILY go and see what both GBIF and iDigBio are presenting for my specimen and also see their data flags without the 5 or 6 steps it will take to open each site and perform a search.

If you follow the GBIF link, you can easily see the link back to my specimen right at the top of the GBIF occurrence record. I probably need to work on iDigBio somehow, because the link from them back to us is hidden waaaaay down the page next to occurrence ID.

In any case, I would really like to have these other ID links in Arctos just to make it easy for me to check up on the aggregators and for others to easily see that our data is also available through them. Think of it like social media for specimens - the more links, the more use. And as I said above, if we can automate this rather than me downloading and uploading and then checking periodically for updates, that would make life so much easier.

PLUS if we could demonstrate that we also drive traffic to GBIF and iDigBio, maybe we can get in on the funding.....wishful thinking, I know.

DerekSikes commented 5 years ago

I like this idea. It will definitely help us find and fix data issues that result from aggregators interpreting things differently.

-Derek

dustymc commented 5 years ago

There are three obvious ways to go about what you've done.

  1. Do what you did. Update it every time something changes at any of the other resources.
  2. Do and maintain what you did with automation.
  3. Make links using the ID that we know about and which are stable, e.g., https://www.gbif.org/occurrence/search?organism_id=http://arctos.database.museum/guid/MSB:Mamm:292063

I don't think (1) is even sort of practical.

(3) is trivial, but ends up at a list of Occurrences. I'm not sure I see the problem with that - the options are to click 5 links from the Arctos end vs. click 5 links from the GBIF end.

(2) is possible and probably "best" but likely requires significant resources, both maintenance and machines. It also provides the possibility of eg, comparing raw and "interpreted" data.
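As a rough illustration of what comparing raw and "interpreted" data could look like, here is a sketch that diffs two records field by field. The function name and both sample records are invented for the example (only the Aves/avus pair echoes the complaint earlier in this thread); real records would have to be pulled from Arctos and from the aggregator first, and their field names would need mapping onto each other.

```python
def changed_fields(raw: dict, interpreted: dict) -> dict:
    """Map each shared field name to (raw, interpreted) where the two values differ."""
    return {
        field: (raw[field], interpreted[field])
        for field in sorted(raw.keys() & interpreted.keys())
        if raw[field] != interpreted[field]
    }


# Made-up records for illustration only; neither is real Arctos or GBIF output.
raw = {"class": "Aves", "catalogNumber": "12345", "country": "USA"}
interpreted = {"class": "avus", "catalogNumber": "12345", "country": "United States of America"}

for field, (ours, theirs) in changed_fields(raw, interpreted).items():
    print(f"{field}: {ours!r} -> {theirs!r}")
```

Only fields present on both sides are compared, so a field the aggregator drops or adds would need separate handling.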

campmlc commented 5 years ago

Can you give us an idea what sort of resources? I'm not sure I understand how this is similar/different from GenBank. Do we need to approach iDigBio and GBIF and ask for what resources might be available from their end?

jrdemboski commented 5 years ago

I like the idea as well, as it provides more (direct) ways to link to other places where our specimen records are published (Teresa's social media analogy) and to more easily find "problems" on the aggregator side.

John


dustymc commented 5 years ago

For (3) just say "go" and I think I can make it happen.

For (2)....

We have 20K records linked to GenBank, "shared" data doesn't change, links are created by operators, and they have tools designed to do what we do.

We have ~3m records "linked" to each of the big DWC portals (and a bunch of smaller ones, and various bits and pieces in various other places). They change stuff constantly, make up identifiers, we make up identifiers for them, nothing is reviewed by anyone, and AFAIK nobody's ever tried what we're talking about (at least not in a non-static environment).

Details would depend on what y'all want. If it's just links, then I'm not sure why (3) won't work. If it's just links but you really want to link to individual Occurrences for some reason then perhaps we could do that on the fly when someone opens specimendetail. If that won't work then it's probably better to chat and work out the scope of this.

Jegelewicz commented 5 years ago

Just links, but to the individual GBIF identifiers. So for the example you give, Arctos should include 5 GBIF identifiers.

https://www.gbif.org/occurrence/1300283486 https://www.gbif.org/occurrence/1300283509 https://www.gbif.org/occurrence/1300283495 https://www.gbif.org/occurrence/1585924852 https://www.gbif.org/occurrence/1300283492

I added them manually to MSB:Mamm:292063 as a demo.

Can we automate adding the GBIF IDs or no? It is really easy to add them from a GBIF download, but it sure would be nice if we could get them added as they show up in GBIF.
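One way the lookup could be automated, sketched under assumptions: that the GBIF occurrence search API at `https://api.gbif.org/v1/occurrence/search` accepts an `organismId` filter, and that each page of the response carries `results` (records with a numeric `key`) and an `endOfRecords` flag. Those details should be checked against the GBIF API documentation. The function names are hypothetical, and nothing here writes to Arctos.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API = "https://api.gbif.org/v1/occurrence/search"  # assumed endpoint


def fetch_json(url: str) -> dict:
    """Fetch a URL and decode the JSON body."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def occurrence_keys(organism_id: str, fetch=fetch_json, page_size: int = 100) -> list:
    """Collect the GBIF occurrence keys reported for one organism ID, page by page."""
    keys, offset = [], 0
    while True:
        query = urlencode({"organismId": organism_id, "limit": page_size, "offset": offset})
        page = fetch(f"{API}?{query}")
        results = page.get("results", [])
        keys.extend(record["key"] for record in results)
        # Stop on the last page, or on an empty page so a bad response cannot loop forever.
        if page.get("endOfRecords", True) or not results:
            return keys
        offset += len(results)


def occurrence_links(keys) -> list:
    """Turn occurrence keys into portal links of the form shown above."""
    return [f"https://www.gbif.org/occurrence/{key}" for key in keys]
```

Calling `occurrence_links(occurrence_keys("http://arctos.database.museum/guid/MSB:Mamm:292063"))` would then be expected to yield links like the five listed above. The `fetch` argument is injectable so the paging logic can be exercised without network access.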

I just added all 8K UTEP Mammal GBIF IDs and it took me about 15 minutes once I got the download from GBIF.

dustymc commented 5 years ago

There are some new links on specimendetail. It's quick, dirty, there's no logging, no cache, no filters, nothing to maintain, etc. The GBIF links should be trustworthy. iDigBio's API doesn't support query by the identifiers we provide so those are basically guesses.

Now can we please de-bulkload this stuff from otherIDs, delete those identifier types, and pinky-promise to never again mix data and random things we find lying around in such a manner?

campmlc commented 5 years ago

Bueno - I like this. The records that have relationships also pull in the GBIF and iDigBio links to their related items, e.g. host and parasites.

Is there a de-bulkloader? I sure could use that.

dustymc commented 5 years ago
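
[image: screen shot 2019-01-24 at 8 56 12 am] https://user-images.githubusercontent.com/5720791/51694556-f1560980-1fb5-11e9-903d-db77db3980ec.png
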
campmlc commented 5 years ago

Cool. Is that the only bulk delete tool we have?

Jegelewicz commented 5 years ago

Can you just auto-magic the ones I added?

dustymc commented 5 years ago

@campmlc I believe so - Issue if you need one somewhere else.

@Jegelewicz huh?

Jegelewicz commented 5 years ago

never mind, I used the tool to remove them.

dustymc commented 5 years ago

These still exist.


UAM@ARCTOS> select guid from flat where collection_object_id in (select collection_object_id from coll_obj_other_id_num where other_id_type='GBIF occurrence ID');

GUID
------------------------------------------------------------------------------------------------------------------------
MSB:Mamm:292063
MSB:Mamm:299226
UTEP:Herb:22507
UTEP:Herp:14082
UAMObs:Ento:237930
UAMObs:Ento:237931

6 rows selected.

Elapsed: 00:00:00.06
UAM@ARCTOS> select guid from flat where collection_object_id in (select collection_object_id from coll_obj_other_id_num where other_id_type='iDigBio');

GUID
------------------------------------------------------------------------------------------------------------------------
DMNS:Mamm:9734
MSB:Mamm:299226
UTEP:Herb:22507
UTEP:Herp:14082

jrdemboski commented 5 years ago

Dusty

Removed DMNS:Mamm:9734 iDigBio link

Thanks for getting all those GBIF and iDigBio links associated with Arctos records!

-John


Jegelewicz commented 5 years ago

I got the MSB and UTEP stuff removed.

dustymc commented 5 years ago

These are left

GUID
------------------------------------------------------------------------------------------------------------------------
UAMObs:Ento:237930
UAMObs:Ento:237931

@DerekSikes what's going on with those??

DerekSikes commented 5 years ago

I examined them and they should stay.

These GBIF links are not to the reciprocal of the UAM Arctos record - the Arctos records are duplicates of USNM records that are in GBIF. These were made by Matt Bowser, who was researching earthworms in Alaska and used Arctos to pull all the known records together. Perhaps he should encumber these since they're duplicates of USNM records.

-Derek

dustymc commented 5 years ago

Is there some reason they aren't linking directly to USNM instead of through GBIF??

http://n2t.net/ark:/65665/32cc05c93-0709-4cb2-a367-a6aa522246a3 http://n2t.net/ark:/65665/335794284-f3b0-44ba-bfd0-1cc4e5044f25

dustymc commented 5 years ago

Since we're experimenting in production...

I added an ARK OtherID type and an ID to https://arctos.database.museum/guid/UAMObs:Ento:237930

That means the ID will work for anyone using ARKs, but it also means the USNM number isn't included anywhere.

The USNM number isn't resolvable - as far as I can tell, nothing can do anything with USNM 123906 - so maybe it's best to put that in a separate ID (without a base_url).

That's sort of weird and disassociated, but I don't have better ideas.

None of that necessarily has to displace the GBIF number - we can keep both, get rid of the ARK experiment, whatever.

Jegelewicz commented 5 years ago

And last, but not least.

When new records are added to Arctos and submitted to GBIF and iDigBio, will the new fancy links be auto-generated?

dustymc commented 5 years ago

auto-generated

yes, it's all dynamic