Closed acdoll closed 2 years ago
Thoughts
WHY is this not a simple webservice?! We're making EMLs, why's GrSciColl apparently not using them? We have persistent resolvable collection identifiers (sorta...), I'd be happy to pass those on and they should be sufficient to stitch everything together.
And what precisely do they DO for us; why would we put any effort into this, what's the return? (A major original goal was unambiguous links with GenBank, but that never became functional so GenBank made their own registry - if any serious discussions come of this, it would be immensely useful to rectify that.)
I don't have any objection to adding some more collection IDs and pushing them to the EMLs, and it's easy enough to do so, but expecting collections to maintain three (at least and probably counting...) completely independent registries is not reasonable, and sharing data via IPT is apparently a lot more complex and manual than seems necessary. Is there any way we can leverage this to make things better for everyone, rather than adding some identifiers just because they exist?
@tucotuco was at the meeting that spawned what's now GrSciColl, and @dbloom is (painfully, apparently!) doing whatever it takes to push DWC to the world - please feel free to jump in here.
See also https://groups.google.com/g/gbif-na/c/7DJWqkMNhYQ/m/VxeZslWUAAAJ
We're making EMLs, why's GrSciColl apparently not using them? We have persistent resolvable collection identifiers (sorta...),
I think @dbloom might be able to chime in here too.
We have persistent resolvable collection identifiers (sorta...),
That's what I thought, but apparently what they're getting now doesn't match up to their GrSciColl list. Clearly their lists are not great, check out DMNS; lots of repeats of 'DMNS' where each collection code should be unique (should they all be like 'DMNS:Mamm', or just 'Mamm', 'Bird', ...etc?). I could just get them to fix that for DMNS, but it would be great to get this straight for all collections. I would be happy to work with Marie to find a solution based on the data they are getting now, from EMLs or in the IPT dataset (or other?). It looks like they may have gotten 'DMNS:Mamm' from the "collectionIdentifier" field in the EML file, but the other collections have similar values (e.g., 'DMNS:Bird') that don't appear to have made their list.
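To make the "repeats of 'DMNS'" problem concrete, here is a minimal sketch of a uniqueness check over a registry listing. The row layout (`institution`, `code`, `name`) and the sample rows are made up for illustration; they are not real GRSciColl data or its actual schema.

```python
from collections import Counter

def ambiguous_codes(collections):
    """Return (institution, code) pairs shared by more than one collection."""
    counts = Counter((c["institution"], c["code"]) for c in collections)
    return sorted(key for key, n in counts.items() if n > 1)

# Hypothetical registry rows, invented for this example.
registry = [
    {"institution": "DMNS", "code": "DMNS", "name": "Mammals"},
    {"institution": "DMNS", "code": "DMNS", "name": "Birds"},
    {"institution": "DMNS", "code": "DMNS:Inv", "name": "Invertebrates"},
]

print(ambiguous_codes(registry))
```

Any pair this reports is a code that cannot, by itself, tell a data aggregator which collection a record belongs to.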
See #3955
doesn't match up to their GrSciColl list.
http://arctos.database.museum/guid/DMNS:Bird is (sorta...) your "collection identifier." I'm not sending that in - I thought they were demanding "DWC Triplets" (which can't be unique so messes should be expected). I'd not be at all surprised to find that I'm putting the wrong data in LOTS of wrong places, especially in the EML which has no documentation that I can find so is reverse-engineered. Explicit instructions very much appreciated.....
We probably need to watch this and review the manual.....
Maybe we need non-"sorta" collection IDs: http://test.arctos.database.museum/collection/dmns:inv will work in next release.
I also added that to the detail page (which is just "details" from home.cfm).
I could redirect the guid version I've been tossing around above as well, but I think it's "cleaner" if /guid/ does one thing and other things use some other /urlbit/.
detail page (which is just "details" from home.cfm
I'm beginning to feel like that page needs some love and also that you should be able to get to it from any record in the collection rather than it being kinda hidden the way it is now.
So I watched the video and
With regard to 1
there appears to be zero relationship.
DMNS Institution at GBIF - https://www.gbif.org/publisher/a2ef6dd1-8886-48c9-8025-c62bac973cc7
DMNS at GRSciColl - https://www.gbif.org/grscicoll/institution/1757f021-01b8-4d20-a11a-1da09db2d8b2
WHY?!
or does it
Not that I can tell.
AND the need to complete all the stuff
I'm more or less aware of how it works, but I'm also aware of what was intended and is possible.
stuff directly at the IPT
We demonstrably have trouble getting folks to update the thing they use every day; I still don't see adding to that as practical (and don't forget GenBank). And see below.
we can't re-generate the eml
What?! It's built on demand, the problem isn't making EML, it's that nobody seems able to DO STUFF with the EML - it's apparently just for show. That should change! That's my entire point! There are tools, The Community just isn't using them! I have no idea if that's just ignorance (eg I'm not building the EML correctly), or if there's some sort of development needed (I don't think so???), or ???????????????????
zero relationship...WHY
People typing not-quite-the-same-thing into a whole bunch of forms!
The other usual answer is "cruddy identifiers," and it looks like we may be contributing to that; I can't see any relationship between the EML we're generating and the DWC data we're pushing. Unless someone stops me now-ish, I'm going to change collectionCode in the DWC and collectionIdentifier in the EML to use/share {baseurl}/collection/{guid_prefix}.
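A rough sketch of what sharing {baseurl}/collection/{guid_prefix} between the two outputs could look like. The function names and the dictionary keys are hypothetical and are not taken from the Arctos code; the only thing that comes from this thread is the URL pattern.

```python
from urllib.parse import quote

def collection_identifier(base_url, guid_prefix):
    """Build {baseurl}/collection/{guid_prefix}, keeping the colon readable."""
    return f"{base_url.rstrip('/')}/collection/{quote(guid_prefix, safe=':')}"

def shared_identifier_fields(base_url, guid_prefix):
    """Use one value in both places so the EML and the DWC data can be joined."""
    identifier = collection_identifier(base_url, guid_prefix)
    return {
        "dwc_collectionCode": identifier,        # hypothetical key for the DWC export
        "eml_collectionIdentifier": identifier,  # hypothetical key for the EML generator
    }

print(shared_identifier_fields("https://arctos.database.museum/", "UTEP:Herb"))
```

The point is only that both values come from a single function, so they cannot drift apart.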
we can't re-generate the eml
What?! It's built on demand, the problem isn't making EML, it's that nobody seems able to DO STUFF with the EML - it's apparently just for show. That should change! That's my entire point! There are tools, The Community just isn't using them! I have no idea if that's just ignorance (eg I'm not building the EML correctly), or if there's some sort of development needed (I don't think so???), or ???????????????????
Apparently, @dbloom makes significant changes to what we generate, that he then has to repeat if we generate a new eml.
But yes, I only use the EML once. I could download an updated EML from Arctos and upload it into an existing resource, but I would lose two thirds of all of the metadata. The stuff I get from Arctos is a good start, but it's only a start. For example, you provide me with a contact name, email, etc..., but much of that stuff doesn't go into the correct fields in the IPT, so I have to move stuff around manually. Then I need to make sure that same info is in three, possibly four, locations throughout the metadata, plus I usually have to add specific information (web pages, phone numbers, etc). Furthermore, if I replace the existing eml with an updated Arctos eml I would have to redo all of the metadata that I don't get from Arctos, such as the mappings to the GBIF publisher, the CC designation, formatting of the map, the taxonomic and temporal scopes, and a whole host of other things. So, yeah, I only use the Arctos EML the first time I create a resource. If that resource was published prior to going into Arctos I don't use the Arctos EML at all.
David Bloom
Tell me what it should look like and I'll make that happen....
I don't know; @dbloom will have to tell us.
Also, doesn't @mkoo have permission to edit at GBIF?
@Jegelewicz What do you mean "have permission to edit at GBIF?" There are many points of entry through which one could edit GBIF related materials.
As for the EML, here is a sample of metadata that is completed and published: http://ipt.vertnet.org:8080/ipt/eml.do?r=uwymv_bird&v=30.59 (if it doesn't open in the browser as XML you should be able to "view page source" to see it properly, or I can send you a document separately). When @dustymc and I discussed this the last time we recognized that there is a bunch of stuff in there that doesn't necessarily have a correlate in Arctos, not all collections will have the same metadata fields/content, some of this content is generated by the IPT, and some other fields I will probably need to update manually regardless of what we do, but here it is. Happy to discuss more as needed.
Of course, this has nothing to do with the institutional/publisher metadata in the GBIF Registry. I think the idea is to get the Registry and IPT files to work together, but right now, they are separate sets of metadata.
permission to edit at GBIF
That's no solution, whatever might be intended....
@dbloom I'm not seeing what's functionally different between what I generate (http://test.arctos.database.museum/info/ipt.cfm?guid_prefix=UWYMV%3ABird) and your example. If there's something missing it should be added to Arctos where it can be shared, or added to the generator if it's there, and I think we're all onboard with that (right!?).
I'll go add the orcid, otherwise can you tell me more about what's problematic?
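One way to answer "what's functionally different" without eyeballing two XML files is to list the element paths that appear in the published EML but not in the generated one. This is a minimal sketch using only the standard library; the two documents below are tiny invented stand-ins, not the real UWYMV files, and the comparison ignores element text and attributes.

```python
import xml.etree.ElementTree as ET

def element_paths(xml_text):
    """Collect the set of slash-separated element paths in an XML document."""
    paths = set()

    def walk(node, prefix):
        # Drop any "{namespace}" prefix so namespaced and plain tags compare equal.
        name = node.tag.split("}")[-1]
        path = f"{prefix}/{name}"
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return paths

def missing_from_generated(published_xml, generated_xml):
    """Paths present in the published EML that the generator does not emit."""
    return sorted(element_paths(published_xml) - element_paths(generated_xml))

# Invented, heavily trimmed examples.
published = (
    "<eml><dataset><title>t</title>"
    "<intellectualRights><para>CC0</para></intellectualRights>"
    "<coverage><temporalCoverage/></coverage></dataset></eml>"
)
generated = "<eml><dataset><title>t</title></dataset></eml>"

for path in missing_from_generated(published, generated):
    print(path)
```

Each reported path is a candidate for either a new field in Arctos or something that stays manual in the IPT.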
nothing to do with the institutional/publisher metadata
That's the core of what I'm asking to fix, I think (but I can't say I really understand how this all fits together so ???). We've got a fair bit of time in generating EMLs, finding out they don't do anything useful isn't what I had in mind!
If there's something missing it should be added to Arctos where it can be shared, or added to the generator if it's there, and I think we're all onboard with that (right!?).
I am because
We've got a fair bit of time in generating EMLs, finding out they don't do anything useful isn't what I had in mind!
However, if the idea is that we get everything in GRSciColl correct and that will be the single source of truth, then let's shoot for that. I think the problem right now is there is no direction and we are left completing information in at least three different places. Maybe we should get GBIF in on this conversation? But who?
GRSciColl correct and that will be the single source of truth,
So to add an address to Arctos, you'd go to GRSciColl, edit stuff there, then - what? I can't pull any more than I can push....
left completing information in at least three different places
If I could pull from GRSciColl then maybe they could just somehow act as part of the agent UI for Arctos, but I don't think that kind of use is on anyone's radar. I definitely agree that we should be doing this one place, but I don't think that's GRSciColl.
EML generator is now picking up orcid. Example:
<creator>
  <individualName>
    <givenName>Elizabeth</givenName>
    <surName>Wommack</surName>
  </individualName>
  <organizationName>University of Wyoming Museum of Vertebrates</organizationName>
  <positionName>Staff Curator</positionName>
  <address>
    <deliveryPoint>Berry Biodiversity Conservation Center, 1000 E. University Ave.</deliveryPoint>
    <city>Laramie</city>
    <administrativeArea>WY</administrativeArea>
    <postalCode>82071</postalCode>
    <country>USA</country>
  </address>
  <electronicMailAddress>ewommack@uwyo.edu</electronicMailAddress>
  <electronicMailAddress>ravenseyes@gmail.com</electronicMailAddress>
  <userId directory="http://orcid.org/">https://orcid.org/0000-0002-9172-0120</userId>
</creator>
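For anyone wanting to reproduce that shape outside Arctos, here is a minimal sketch that builds a similar creator block with Python's standard library. It is not the Arctos generator; the function name and arguments are hypothetical, the person and ORCID iD are placeholders, and the userId attribute simply mirrors the sample above.

```python
import xml.etree.ElementTree as ET

def creator_element(given, sur, organization, email, orcid=None):
    """Build an EML-style <creator>, adding <userId> only when an ORCID is known."""
    creator = ET.Element("creator")
    name = ET.SubElement(creator, "individualName")
    ET.SubElement(name, "givenName").text = given
    ET.SubElement(name, "surName").text = sur
    ET.SubElement(creator, "organizationName").text = organization
    ET.SubElement(creator, "electronicMailAddress").text = email
    if orcid:
        # Same directory attribute as the sample output in this thread.
        user_id = ET.SubElement(creator, "userId", directory="http://orcid.org/")
        user_id.text = orcid
    return creator

# Placeholder person and iD, for illustration only.
xml_text = ET.tostring(
    creator_element("Ada", "Example", "Example Museum", "ada@example.org",
                    orcid="https://orcid.org/0000-0000-0000-0000"),
    encoding="unicode",
)
print(xml_text)
```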
Did they change something at the GBIF registry? I have only been editing/adding new collections but maybe need to review all the arctos ones... also can you see a Suggest link on the site if you log on? I think anyone can do that. Should we get in touch with GBIF regarding these fuzzy matches?
Did they change something at the GBIF registry?
Probably as they ingested stuff from iDigBio and Index Herbariorum and did some smashing together with the stuff they already had.
"sorta" collection IDs: http://test.arctos.database.museum/collection/dmns:inv
Better to use the number? CollectionID: 74
Less readable by humans, but also no reason to change it - ever?
I have no opinion, just thought I would bring it up.
" non-"sorta" "!!
I think those are good identifiers.
Guid_Prefix should be seen as our most sacred possession; there's nothing more stable.
Collection_id is just a key; like all keys, "slightly easier than the alternative" is sufficient reason to change it.
@dustymc how do I find out what the id number is for a collection?
Hu?
Asking for https://github.com/ArctosDB/new-collections/issues/404#issuecomment-915510941 - need to do this for UTEP:Herb
Test or prod? If prod, what's broken? If test, why? Does that have something to do with this issue??
You can get collection_id from collection, but that approach is going to melt something "interesting" if there are very many records involved.
test - nothing to do with this except that it is the ID I need. Trying to get records entered by UA herbarium tester to show up so they can see what they did.
I just did this for one of the UTEP collections - https://registry.gbif.org/collection/d3957974-8fb6-49b2-8983-37b4b5824381?suggestionId=38
But who has time for that times 215?
But who has time for that times 215?
That's my whole point here!! (And I thought you were arguing that we just have to find the time?!?)
And FWIW "UTEP:Herb" has about zero chance of being unique and doesn't align with what's in the DWC data nor what the EML generator will suggest - it's just not a useful identifier, suggest using the value from https://arctos.database.museum/collection/UTEP:Herb (which happens to be https://arctos.database.museum/collection/UTEP:Herb)
@Jegelewicz UTEP:Herb at test seems to have updated - I told ~40K other records that they were current, which was difficult - that's just not an environment which can support the background tasks. You can update singles with eg select update_flat_row (collection_object_id) from flat where guid='UTEP:Herp:123'
And FWIW "UTEP:Herb" has about zero chance of being unique and doesn't align with what's in the DWC data nor what the EML generator will suggest - it's just not a useful identifier, suggest using the value from https://arctos.database.museum/collection/UTEP:Herb (which happens to be https://arctos.database.museum/collection/UTEP:Herb)
Yeah - but I think we need to agree on that across all collections and be consistent. Kinda waiting to see what falls out here.
agree on that across all collections
Sent to scientific-collections@gbif.org
I am the project coordinator for Arctos and I would like to discuss how we might directly populate entries in GRSciColl for all of the collections in Arctos. We already hold the information included in GRSciColl in our system and we would prefer that our users have the ability to maintain their information in their collection management system and not need to duplicate effort by copying it to GRSciColl.
For example, the University of Texas at El Paso Biodiversity Collections Herbarium:
Arctos Page GRSciColl Page
There is really no reason these two pages should contain significantly different information and we would like to see if we can make the process of keeping them in sync easier for Arctos collection managers.
I'd be happy to meet and discuss possibilities.
Thank you,
Teresa J. Mayfield-Meyer
agree on that across all collections
Nobody stopped me....
I think that we need to revisit this. Given the definitions, I think we should do this:
NMMNH:Paleo as example
Institution Code - NMMNHS
Institution ID - https://www.gbif.org/grscicoll/institution/bcc1478b-1409-43c3-a013-69586aa98753
Collection Code - NMMNH:Paleo
Collection ID - https://arctos.database.museum/collection/NMMNH:Paleo
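The proposed split (codes stay short, IDs are resolvable URLs) can be sketched as below, along with a small check for the "wrong bucket" problem. The function names are hypothetical; the values in the usage line are the NMMNH:Paleo ones from this comment.

```python
from urllib.parse import urlparse

def collection_terms(institution_code, institution_id, guid_prefix,
                     base_url="https://arctos.database.museum"):
    """Map Arctos values onto the four Darwin Core terms as proposed here."""
    return {
        "institutionCode": institution_code,
        "institutionID": institution_id,
        "collectionCode": guid_prefix,
        "collectionID": f"{base_url}/collection/{guid_prefix}",
    }

def misplaced_terms(record):
    """Return the *ID terms whose value is not an http(s) URL."""
    problems = []
    for term in ("institutionID", "collectionID"):
        if urlparse(record.get(term, "")).scheme not in ("http", "https"):
            problems.append(term)
    return problems

record = collection_terms(
    "NMMNHS",
    "https://www.gbif.org/grscicoll/institution/bcc1478b-1409-43c3-a013-69586aa98753",
    "NMMNH:Paleo",
)
print(record)
print(misplaced_terms(record))
```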
Collection ID
That would break Dave's scripts, and we're not paying him enough for that.
We're providing a good identifier now, I don't see any point in arbitrarily shuffling more things around. If someone wants to talk to us or throw up an API or something - well, we're easy to find....
That would break Dave's scripts, and we're not paying him enough for that.
We aren't paying him enough anyway, but that's not an excuse for putting data in the wrong bucket. As it is, our records will still not get matched to a collection. We really need a discussion with GBIF, @dbloom and some Arctos people to decide what should go where because I feel that we are not putting our best foot forward.
For instance:
recordedBy - A list (concatenated and separated) of names of people, groups, or organizations responsible for recording the original Occurrence. The primary collector or observer, especially one who applies a personal identifier (recordNumber), should be listed first.
Currently not only do we put collector's name, we also put preparators and we are adding in stuff that is not expected
Collector(s): Paul L. Sealey
and we are missing an opportunity by not passing recordedByID - A list (concatenated and separated) of the globally unique identifier for the person, people, groups, or organizations responsible for recording the original Occurrence.
where we could pass
but even better
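As a sketch of the recordedBy / recordedByID idea above: keep only collectors, list them in order, and join with " | ", the list separator Darwin Core recommends. The agent rows, their `role` values, and the ORCID iD are invented for illustration; how Arctos actually stores agent roles and identifiers is not shown in this thread.

```python
SEPARATOR = " | "

def recorded_by_fields(agents):
    """Build recordedBy and recordedByID from collectors only, in the order given."""
    collectors = [a for a in agents if a["role"] == "collector"]
    return {
        "recordedBy": SEPARATOR.join(a["name"] for a in collectors),
        # Agents without an identifier are skipped, so the two lists may not align 1:1.
        "recordedByID": SEPARATOR.join(a["id"] for a in collectors if a.get("id")),
    }

# Hypothetical agents; the primary collector comes first.
agents = [
    {"name": "A. Collector", "role": "collector", "id": "https://orcid.org/0000-0000-0000-0001"},
    {"name": "B. Preparator", "role": "preparator", "id": None},
    {"name": "C. Collector", "role": "collector", "id": None},
]

print(recorded_by_fields(agents))
```

Whether preparators belong somewhere else in the export is a separate decision; this only keeps them out of recordedBy.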
GBIF has some issue flags related to our institution and collection codes not matching directly to identifiers in GrSciColl. E.g.: Speaking with Marie Grosjean with GBIF, she suggested including 'institutionid' and 'collectionid' identifiers in our data exports which will ensure that GBIF records are appropriately linked to the correct institution and collection. She pointed me to this FAQ for determining what values to use for these fields. The values for these identifiers can be found on the institution/collection pages on GrSciColl: Conversely, she said I could work with her to ensure DMNS collections are properly identified on their end, but it seems like something we could do easily enough for all Arctos collections - could be part of the initial portal setup for new collections? Thoughts?