ArctosDB / arctos

Arctos is a museum collections management system
https://arctos.database.museum

MorphoSource links that work like GenBank? #1882

Closed Jegelewicz closed 2 years ago

Jegelewicz commented 5 years ago

Issue Documentation is http://handbook.arctosdb.org/how_to/How-to-Use-Issues-in-Arctos.html

Is your feature request related to a problem? Please describe.

There are Arctos specimens in MorphoSource without reciprocal documentation in Arctos.

There are also Arctos specimens in MorphoSource without reciprocal links in Arctos that have other problems, such as this record that is identified as Iguana iguana in MorphoSource, but the cataloged item in Arctos is Uta stansburiana.

Describe the solution you'd like

Have we made the attempt to work with MorphoSource in the same way we do with GenBank? So that properly cited Arctos specimens in MorphoSource will generate a link to the Arctos record? Also, can we get a similar "low quality data" report related to MorphoSource as we do for GenBank to help collections know that they might have images in MorphoSource?

Describe alternatives you've considered

Manually search MorphoSource regularly and ensure links are correctly made.

Additional context

Looking for linking opportunities before our planned SPNHC presentation about connected data.

Priority

I would like to have this resolved by date: 2019-03-15

dustymc commented 5 years ago

The magic at GenBank involves them maintaining a list of repositories. MorphoSource doesn't seem to have that - a query for UAM (https://www.morphosource.org/api/v1/find/specimens?q=specimen.institution_code:UAM) finds https://www.morphosource.org/Detail/SpecimenDetail/Show/specimen_id/6523 because of the University of Arkansas Museum. We can probably do some stuff, but I think going very far with it is going to involve them rethinking identifiers.
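For anyone following along, the query above can be rebuilt as below. This is a minimal Python sketch, assuming the v1 endpoint quoted in this comment (it may no longer respond). `KNOWN_INSTITUTIONS` is a hypothetical hand-made lookup, not something MorphoSource or Arctos provides; it only illustrates why a bare institution code cannot identify a repository.

```python
from urllib.parse import quote

# Endpoint as quoted in this thread; it may not respond today.
API_BASE = "https://www.morphosource.org/api/v1/find/specimens"

# Hypothetical lookup, for illustration only: one code, two museums.
KNOWN_INSTITUTIONS = {
    "UAM": ["University of Alaska Museum", "University of Arkansas Museum"],
}


def institution_query_url(code: str) -> str:
    """Build the institution-code search URL used in the comment above."""
    query = f"specimen.institution_code:{code}"
    # Keep the colon literal so the URL matches the one quoted in the thread.
    return f"{API_BASE}?q={quote(query, safe=':')}"


def is_ambiguous(code: str) -> bool:
    """True when more than one known institution shares the code."""
    return len(KNOWN_INSTITUTIONS.get(code, [])) > 1


print(institution_query_url("UAM"))
print(is_ambiguous("UAM"))
```

A registry of repositories, like the one GenBank maintains, is what would turn the second function into a reliable check rather than a guess.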

I'll play with the API some more when I can.

Jegelewicz commented 5 years ago

If we need to talk with them, let me know - I'd be happy to help facilitate!

dustymc commented 5 years ago

Yes, I think working with them to establish some sort of controlled way for submissions to indicate SOMETHING that unequivocally leads to specimens would be useful.

Alternatively and/or in addition to something from morphosource, that could just be something we document (eg, suggest Curators add to loan agreements). "When submitting to morphosource, thou shalt...."

Wild guess: https://www.morphosource.org/Detail/SpecimenDetail/Show/specimen_id/3728 points to http://arctos.database.museum/guid/UMNH:Herp:8084 and the problem involves the specimen identifier (wrong UMNH, 8084 is some number that's scribbled on the correct specimen but not entered into Arctos, etc.) rather than the identification. Who knows what's actually going on, but that's about what I'd expect when working with uncontrolled strings, which seems to be the case.

I can find a bunch of stuff through OccurrenceID (https://www.morphosource.org/api/v1/find/specimens?q=http%3A%2F%2Farctos.database.museum%2F*), but that's not stable (the Utah specimen probably had one not involving Arctos at some point), doesn't seem to be very consistently used, etc.
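A rough sketch of that OccurrenceID approach, in Python. The search URL is the one quoted above; the regular expression and both function names are assumptions made for illustration, and the pattern simply drops anything after the three-part GUID (query strings or fragments).

```python
import re
from typing import Optional
from urllib.parse import quote

# Endpoint as quoted in this thread; it may not respond today.
API_BASE = "https://www.morphosource.org/api/v1/find/specimens"

# Assumed shape of an Arctos GUID URL: .../guid/INSTITUTION:COLLECTION:NUMBER
ARCTOS_GUID = re.compile(
    r"^https?://arctos\.database\.museum/guid/([^:/?#]+:[^:/?#]+:[^/?#]+)"
)


def occurrence_search_url(prefix: str = "http://arctos.database.museum/") -> str:
    """Build the wildcard search for OccurrenceIDs that start with `prefix`."""
    return f"{API_BASE}?q={quote(prefix, safe='')}*"


def extract_guid(occurrence_id: str) -> Optional[str]:
    """Return the Arctos GUID inside an OccurrenceID, or None if there is none."""
    match = ARCTOS_GUID.match(occurrence_id.strip())
    return match.group(1) if match else None


print(occurrence_search_url())
print(extract_guid("http://arctos.database.museum/guid/UMNH:Herp:8084"))
```

Anything for which `extract_guid` returns `None` would still need a human to look at it, which is the instability described above.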

dustymc commented 5 years ago

@mkoo will talk to MS about registering collections (like we do with GenBank) or somehow otherwise providing a means by which we can find "our" collections. They seem to have the idea, but it has problems.

https://www.morphosource.org/Browse/Index click institution and you'll see things like....

[screenshot, 2019-02-12 at 11:17:04 am]

I'm not sure what's going on there, but it doesn't look like something that could support query by institution.

https://www.morphosource.org/api/v1/find/specimens?q=specimen.institution_code:UAM contains...

[screenshot, 2019-02-12 at 11:18:08 am]

[screenshot, 2019-02-12 at 11:18:48 am]

etc - I think MS may be using Institution to refer to the project or something - it's certainly not the institution that owns the specimen (or maybe their data are just jumbled).

If they can provide a reliable way to find "our" specimens then I can use it to compile a list of OtherIDs that just need to be confirmed and loaded, or perhaps automagically create those if some trustworthy system can be worked out.
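One way the "compile a list that needs confirming" step could look, as a hedged Python sketch. The row fields (`guid`, `other_id_type`, `other_id_value`, `status`) and the function name are hypothetical and are not the Arctos bulkloader format; the first pairing in the demo is the one guessed at earlier in this thread, and the second is a made-up placeholder.

```python
from typing import Dict, Iterable, List, Set, Tuple

# (Arctos GUID, MorphoSource specimen ID)
Pair = Tuple[str, str]


def candidate_other_ids(hits: Iterable[Pair], existing: Set[Pair]) -> List[Dict[str, str]]:
    """List pairs found at MorphoSource that Arctos does not yet record."""
    # Drop duplicates and anything already loaded, then sort for stable output.
    pending = sorted(set(hits) - set(existing))
    return [
        {
            "guid": guid,
            "other_id_type": "MorphoSource specimen ID",  # hypothetical label
            "other_id_value": ms_id,
            "status": "needs confirmation",
        }
        for guid, ms_id in pending
    ]


hits = [
    ("UMNH:Herp:8084", "3728"),
    ("UMNH:Herp:8084", "3728"),   # duplicate hit
    ("EXAMPLE:Herp:1", "0000"),   # placeholder, already in Arctos
]
already_loaded = {("EXAMPLE:Herp:1", "0000")}

for row in candidate_other_ids(hits, already_loaded):
    print(row)
```

Nothing here writes to Arctos; every row stays flagged for a curator until some trustworthy matching rule exists.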

Once we have MS Other IDs I should be able to use their API to create Media. That's somewhat redundant, but it will make the Media more visible from Arctos so seems worthwhile.

dustymc commented 4 years ago

@juliawinchester

dustymc commented 4 years ago

Workshop comment: need to extend data-sharing to broader community.

@JuliaWinchester would morphosource take 3D data about "cultural" items?

@AJLinn @marecaguthrie @sjshirar do y'all have/anticipate having 3D/CT/etc. data?

dustymc commented 4 years ago

Found a link to specimen_id/{MEDIA_ID}

  1. Update our documentation
  2. Use API to notify collection if the link looks like another specimen or 404s
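
Item 2 could be sketched roughly as follows, in Python. This is only the decision step: fetching the link and reading the GUID out of the MorphoSource response are left out, and the function names and result labels are assumptions made for illustration.

```python
from typing import Optional


def classify_link(http_status: int, linked_guid: Optional[str], expected_guid: str) -> str:
    """Classify one MorphoSource link stored on an Arctos record.

    http_status   -- status code returned when the link was fetched
    linked_guid   -- Arctos GUID the MorphoSource record claims, if any
    expected_guid -- GUID of the Arctos record that holds the link
    """
    if http_status == 404:
        return "broken"
    if http_status != 200 or linked_guid is None:
        return "unknown"      # cannot tell, so do not nag the collection
    if linked_guid != expected_guid:
        return "mismatch"     # link looks like another specimen
    return "ok"


def needs_notice(result: str) -> bool:
    """Only broken or mismatched links are worth a notification."""
    return result in {"broken", "mismatch"}


print(classify_link(404, None, "UMNH:Herp:8084"))
print(classify_link(200, "UMNH:Herp:8085", "UMNH:Herp:8084"))
print(classify_link(200, "UMNH:Herp:8084", "UMNH:Herp:8084"))
```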

AJLinn commented 4 years ago

> @AJLinn @marecaguthrie @sjshirar do y'all have/anticipate having 3D/CT/etc. data?

We do have 3D models of some UAM:EH and UAM:Arc collections already we could certainly associate with current objects, but I think we just have not so far. We'd have to make sure we don't need agreements with any tribes in order to make those scans available online.

JulieWinchester commented 4 years ago

Hi all! I apologize for dropping the ball on not responding to this for so long. This is a conversation we are definitely interested in having, and we would definitely like to do more to relate MorphoSource specimen records to Arctos specimen records. I'm going to try to address a number of things mentioned in this thread so far, so this may get a bit long.

First, a bit of context: @dustymc knows this, but for everyone else we are currently working on rebuilding the repository application MorphoSource uses from the ground up (as "MS2"). We know the current site has some issues, and there can be some inconveniences related to that. We are hoping to fix these issues and provide much better tools for data integration in MS2.

Relating specimens to institutions or collections is a persistent issue for us, especially since our data is at its core user-contributed. This can produce some duplication in the data, which we try to fix and are also working to prevent as much as we can in MS2. The South Australian Museum situation @dustymc mentioned above was a bug and has been fixed since then.

But those API-queried specimens, the UAM fish with very strange linked institutions, of course have wrong institutions linked to them. And I'm pretty sure I know how it happened. In order to try and reduce the inevitable errors from people manually entering data, we encourage data contributors to supply occurrence IDs, and if those specimens are aggregated on iDigBio, then we automatically import as much iDigBio aggregated collection-supplied metadata as possible. (Incidentally, something we are definitely interested in is connecting directly to collections databases like Arctos to do this without the aggregator middle step.) But iDigBio lacks controlled lists of institutions, so that is unfortunately one bit of data we are not able to import. Therefore, we rely on users to select the appropriate institution for media. I suspect whoever contributed this data did not know what the institution was and got it wrong. We will work on trying to fix these records.

All that being said, we would love to do better. If someone could point us to more info about how GenBank registers collections to help address some of these issues, that would be great. I'll also say in MS2 we are going to allow for much better linking of specimens to collections, departments, labs, and other groups we call "organizations", which typically sit below the level of institution, so maybe that goes along with some of this. But so far we have still been faced with the question of asking users to manually indicate which organization is associated with a specimen. I can think of some solutions which would work for some subset of the data, like specifying institution and collection level data for iDigBio recordsets. I'm sure there are other options out there. It would be great if we could collaborate to develop solutions here that allow solid linkages between Arctos and MorphoSource.

Also, to answer the other question asked, MS2 will support cultural heritage items! :) And we support archiving media data with access settings that are either totally private or private but shared with selected individuals. We have some almost completely dark archives of bioarchaeological data stored that way.

dustymc commented 4 years ago

Relating specimens to institutions or collections is a persistent issue for ~~us~~ everyone!

Arctos publishes resolvable GUIDs (with some other junk to make them unique as Occurrences) as OccurrenceIDs so I think that works for us, even if it's not ideal.

What became https://www.gbif.org/grscicoll was intended to be a global registry, but it's always had fatal assumptions. GenBank was forced to build their own (http://www.insdc.org/controlled-vocabulary-specimenvoucher-qualifier), but I doubt it's useful for more than GB. Maybe someday we'll have a registry, but I'm not holding my breath.

I'm definitely willing to provide APIs for whatever makes things cooler.

Very good to hear regarding "cultural" stuff! I don't think those records will ever be accepted by iDigBio, which might be a reason to develop some sort of "if arctos then api else use idigbio" logic. Which probably requires some sort of local registry....
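The "if arctos then api else use idigbio" idea might look something like this. It is a sketch in Python under loud assumptions: `LOCAL_REGISTRY` is a hypothetical stand-in for the local registry mentioned above, and the source labels are made up for illustration.

```python
from typing import Dict

# Hypothetical local registry: OccurrenceID prefix -> where to fetch metadata.
LOCAL_REGISTRY: Dict[str, str] = {
    "http://arctos.database.museum/guid/": "arctos_api",
    "https://arctos.database.museum/guid/": "arctos_api",
}


def metadata_source(occurrence_id: str, registry: Dict[str, str] = LOCAL_REGISTRY) -> str:
    """Pick a metadata source for one OccurrenceID.

    Registered prefixes go straight to the collection database;
    everything else falls back to the aggregator.
    """
    for prefix, source in registry.items():
        if occurrence_id.startswith(prefix):
            return source
    return "idigbio"


print(metadata_source("http://arctos.database.museum/guid/UMNH:Herp:8084"))
print(metadata_source("urn:uuid:00000000-0000-0000-0000-000000000000"))
```

Keeping the registry as plain prefix data means other collection databases could be added without touching the logic, which is about as far as a local registry can go without a global one.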

mkoo commented 4 years ago

What is the update on harvesting the MorphoSource API for links to hosted specimens?

ebraker commented 3 years ago

I would love to revisit this - we are in the process of loading 100 CT scans to MorphoSource 2.0!

Jegelewicz commented 3 years ago

@ebraker to review and write up best practice.

Jegelewicz commented 3 years ago

See also #3847

ebraker commented 3 years ago

@JulieWinchester We were just (re)discussing the possibility of generating reciprocal links between Arctos and MorphoSource records. Please let us know if you are in a place to start thinking about this given where things are at with the MS2 rollout.

Related to this are MS identifiers. It would be neat to point to the MS Biological Specimen page in corresponding Arctos records so that users are directed to the 'total view' object landing page where they can view ALL linked MS media, projects, etc. representing a given object. We can do that now with URI https://www.morphosource.org/biological_specimens/, but were curious if minting Specimen ARKs is in the works. Linking to a stable ARK identifier (as is currently possible with media) would be best case scenario over a potentially less-predictable-over-time URI. We would still use media ARKs, but treat them as Media links, which are stored in a separate table from Identifiers.

Jegelewicz commented 2 years ago

@JulieWinchester you might also consider linking up with GRSciColl - https://www.gbif.org/grscicoll

dustymc commented 2 years ago

nothing can be done here