AtlasOfLivingAustralia / avh-hub

Australian Virtual Herbarium
https://avh.ala.org.au
Mozilla Public License 2.0

Collection code appears twice on record detail page #101

Closed nielsklazenga closed 4 years ago

nielsklazenga commented 4 years ago

Another boring bug fix. This one doesn't bother me so much.

Collection code appears twice on the record detail page, once correctly linked to the Collectory entry and another time with no link.

[Screenshot]

elywallis commented 4 years ago

@nielsklazenga I'm not sure it's accurately described as a bug; it's more an artefact of the Provider Map mapping. Looking at Museums Victoria data, the lower instance of Collection code is what MV actually provides (invertebrates, entomology, herpetology, etc.). The upper instance of Collection code is the value ALA adds to the record, e.g. "Museums Victoria Marine Invertebrates Collection". Having the two values has already caused problems when trying to link records for the same specimen held in different institutions (herbarium duplicates are one example; specimen and tissue is another), because the institution trying to provide the data for the linking doesn't know which of the "Collection Codes" to use.

nielsklazenga commented 4 years ago

I think this is just a display issue on the record detail page. As you can see in the screenshot, for institutionCode the value provided is given as 'Supplied institution code "MEL"' in the same row, while for the collectionCode the provided and processed values are in different rows. The processed values are of course not really institution and collection codes anymore, but institutions and collections.

I agree this is probably not really a bug, more an inconsistency.

nickdos commented 4 years ago

How about we rename the first one "Collection"? The "code" part is misleading because we don't actually show the code there.

nielsklazenga commented 4 years ago

Good idea. Any chance the second 'Collection code' can be moved upwards in the table, so it sits directly below Collection?

peggynewman commented 4 years ago

Just to be difficult: it's kind of like a raw and a processed value, with the code being the raw value and the name that biocache-store finds via a lookup to the ProviderMap being the processed value. I wonder if we should try to stick to Darwin Core terms.

nielsklazenga commented 4 years ago

Ideally 'Collection code' and 'Institution code' would be handled the same way, but Nick can't do that, as that is a Biocache Store thing (I think; it will hopefully be fixed when we get the new pipeline). Moreover, collectionCode and Collection are two different things. This is not really raw and processed, but raw and inferred: more like the relationship between latitude and longitude and the environmental and contextual layers.

nickdos commented 4 years ago

I agree with @peggynewman that we should stop making stuff up and use DwC as a preference.

EDIT: DwC uses collectionCode. There is no other field for collection stuff.

For institution code we mostly show both the processed and raw versions ("Supplied as...") in the same cell, so I'm thinking we should do the same for collection code...

[Screenshot]
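A minimal sketch of the "same cell" idea, assuming a hypothetical helper `format_code_cell` (not part of avh-hub) that puts the processed value first and the supplied raw value after it, following the 'Supplied institution code "..."' pattern. The ERBG / Eurobodalla Regional Botanic Garden pair is taken from later in this thread; the same helper would apply to collection code.

```python
from typing import Optional


def format_code_cell(processed: Optional[str], raw: Optional[str], label: str) -> str:
    """Render one table cell holding a processed value and its supplied raw value.

    Hypothetical helper: it mirrors the 'Supplied institution code "..."'
    pattern discussed above, not the actual avh-hub template code.
    """
    if processed and raw and processed != raw:
        # Processed value first, then the supplied (raw) value on its own line.
        return f'{processed}\nSupplied {label} "{raw}"'
    # Only one value present, or both identical: show whichever exists.
    return processed or raw or ""


print(format_code_cell("Eurobodalla Regional Botanic Garden", "ERBG", "institution code"))
```

Keeping both values in one cell avoids the two separate "Collection code" rows that prompted this issue.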

The point about having a raw and a processed version of the same DwC field keeps popping up. My preference is to deal with this by having the raw version point to the official DwC term (in a linked-data sense), e.g. http://rs.tdwg.org/dwc/terms/collectionCode. When we refer to the processed version, we would host a version of the same field name under a different namespace and URI, e.g. http://ala.org.au/ala/terms/collectionCode (ala:collectionCode), which resolves to a documentation page explaining that it is the processed version, how it is computed, etc.
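A small sketch of how the proposed prefixes would expand to full URIs. The `dwc` namespace is the official Darwin Core terms namespace; the `ala` namespace is only the one proposed in the comment above and is hypothetical, as are the helper names `expand` and `describe`.

```python
# Prefix -> namespace URI. "dwc" is the official Darwin Core terms namespace;
# "ala" is the namespace proposed in this thread (hypothetical, not published).
NAMESPACES = {
    "dwc": "http://rs.tdwg.org/dwc/terms/",
    "ala": "http://ala.org.au/ala/terms/",
}


def expand(curie: str) -> str:
    """Expand a compact 'prefix:localName' term into its full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in NAMESPACES or not local:
        raise ValueError(f"unknown or malformed term: {curie!r}")
    return NAMESPACES[prefix] + local


def describe(field: str) -> dict:
    """Pair the raw (dwc:) and processed (ala:) URIs for one field name."""
    return {"raw": expand(f"dwc:{field}"), "processed": expand(f"ala:{field}")}


print(describe("collectionCode"))
```

The local name stays the same in both namespaces; only the prefix tells a reader whether they are looking at the supplied value or ALA's processed one.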

elywallis commented 4 years ago

My two cents: in these cases, I'd like to see a clearer delineation between the Darwin Core value and what we've processed it to (usually to make it more human-readable, I assume). In the example above, the institution has correctly supplied the Darwin Core element institutionCode as ERBG. We have processed that into a field that is basically "Institution Name". As Nick says, there is no Darwin Core element for "Institution Name". There's also institutionID in Darwin Core (though I don't know how many organisations provide it), which should give a URI to resolve against, e.g. GRSciColl. The upshot is that I basically agree with Nick's suggestion above, except that I would not call it ala:institutionCode; I'd just call it ala:institutionName, because the processing changes the type of data in that field. The same comments apply to collectionCode.

nielsklazenga commented 4 years ago

What Ely just said...

'Eurobodalla Regional Botanic Garden' and 'Wallace herbarium' are not processed institutionCode and collectionCode, but institution and collection names, i.e. something different. Looking into the Collectory to find the institution and collection that belong with the codes is inference and is more akin to sampling than to processing of the provided data.
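To make the "inference, not processing" distinction concrete, here is a sketch under stated assumptions: a hypothetical in-memory table stands in for the Collectory lookup (only the ERBG entry comes from this thread), and the inferred name is added as a new `ala:institutionName` field while the supplied `dwc:institutionCode` is left untouched.

```python
from typing import Dict

# Hypothetical stand-in for a Collectory lookup keyed on the supplied code.
# Only the ERBG entry is taken from the discussion; a real system would query the Collectory.
COLLECTORY_INSTITUTIONS: Dict[str, str] = {
    "ERBG": "Eurobodalla Regional Botanic Garden",
}


def infer_institution(record: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the record with an inferred ala:institutionName added.

    The supplied dwc:institutionCode is not modified: the name is a new,
    inferred field rather than a processed version of the code.
    """
    enriched = dict(record)
    name = COLLECTORY_INSTITUTIONS.get(record.get("dwc:institutionCode", ""))
    if name is not None:
        enriched["ala:institutionName"] = name
    return enriched


print(infer_institution({"dwc:institutionCode": "ERBG"}))
```

Because the code and the name live in separate fields, a provider trying to link duplicates can always find the code exactly as it was supplied.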

nickdos commented 4 years ago

Good points - agree institutionName is a better choice in this case.

We do have cases where the processed and raw values are equivalent, e.g. decimalLatitude, where the record has one value and we then change it via the SDS.
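By contrast, here raw and processed really are the same kind of value. The sketch below assumes a hypothetical generalisation rule (rounding to one decimal place) purely for illustration; it is not the SDS's actual algorithm, and the latitude is a made-up example value.

```python
def generalise_latitude(raw_lat: float, decimals: int = 1) -> float:
    """Hypothetical generalisation: round a latitude to coarser precision.

    Illustrative only; the real SDS rules are not reproduced here.
    """
    if not -90.0 <= raw_lat <= 90.0:
        raise ValueError("latitude out of range")
    return round(raw_lat, decimals)


# Made-up example record: raw and processed are both decimal latitudes,
# so sharing the field name under dwc: and ala: prefixes fits this case.
record = {"dwc:decimalLatitude": -35.7123}
record["ala:decimalLatitude"] = generalise_latitude(record["dwc:decimalLatitude"])
print(record)
```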

nielsklazenga commented 4 years ago

Yes, and the processed decimalLatitude can be different from the provided one, if it was provided with a different datum than WGS84.

Didn't think I was opening Pandora's box when I submitted this issue.