AtlasOfLivingAustralia / collectory

Metadata registry for the Atlas
https://collections.ala.org.au

IPT contacts sync issues #116

Open vjrj opened 2 years ago

vjrj commented 2 years ago

Currently I see some issues with the contacts sync of IPT resources:

image

Are these issues intentional, or are PRs welcome? :-)

djtfmartin commented 2 years ago

PRs welcome :)

adam-collins commented 8 months ago

Can this be closed?

vjrj commented 8 months ago

I missed this. It's still unresolved.

vjrj commented 2 months ago

A comment just to show the effect of this issue: https://collections-test.ala.org.au/contact/show/1029 and why #236 should be merged IMHO.

image

adam-collins commented 2 months ago

Before forwarding to the data team for review, is my following summary of the changes to the IPT service correct?

  1. eml.dataset.contact, when present, is used as the primary contact. This is your data manager's suggestion.
  2. Add the contacts found in eml.dataset.associatedParty to the list of contacts. This was not previously done.
  3. Add logic to avoid creating duplicate contacts in the global contacts table. One implication is that when a contact (unique by email) arrives with a new first or last name, the stored first and last name are updated. The lookup order is:
    • First looks for a contact with the same email (electronicMailAddress)
    • When there is no email, looks for a contact with the same first and last name (individualName.givenName, individualName.surName)
    • When there is no email or first name, looks for a contact with only the same last name.
    • Otherwise, create a new contact.
  4. Add logic to prevent duplicate contacts for a data provider, so that a provider cannot have multiple contacts with the same first and last name.
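
The lookup order in point 3 can be sketched roughly as follows. This is only an illustration in Python, not the collectory code (which is a Grails application): the `Contact` class, the `find_or_create_contact` function and the in-memory list standing in for the global contacts table are all hypothetical. It also assumes that names are only overwritten when the incoming record supplies them, and that a record with an unmatched email creates a new contact rather than falling back to name matching.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Contact:
    # Hypothetical stand-in for a row in the global contacts table.
    email: Optional[str] = None
    first_name: Optional[str] = None
    last_name: Optional[str] = None


def find_or_create_contact(
    contacts: List[Contact],
    email: Optional[str] = None,
    first_name: Optional[str] = None,
    last_name: Optional[str] = None,
) -> Contact:
    """Return the matching contact from `contacts`, or append and return a new one."""
    match = None
    if email:
        # 1. Same email (electronicMailAddress).
        match = next((c for c in contacts if c.email == email), None)
        if match is not None:
            # Email is the unique key, so refresh the names (assumption:
            # only when the incoming record actually provides them).
            if first_name:
                match.first_name = first_name
            if last_name:
                match.last_name = last_name
    elif first_name and last_name:
        # 2. No email: same first and last name.
        match = next(
            (c for c in contacts
             if c.first_name == first_name and c.last_name == last_name),
            None,
        )
    elif last_name:
        # 3. No email and no first name: same last name only.
        match = next((c for c in contacts if c.last_name == last_name), None)

    if match is None:
        # 4. Otherwise, create a new contact.
        match = Contact(email=email, first_name=first_name, last_name=last_name)
        contacts.append(match)
    return match
```

Calling it twice with the same email and a changed first name returns the same object with the name updated, which is the implication described in point 3.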

vjrj commented 2 months ago

Although this was detected using the IPT service, the code that processes the contacts of an EML document is shared and used in other parts of the collectory.

About primary contacts: IMHO it's something that we add to the DB (for some reason) and that I have kept, but it's not used in the UI. So I wouldn't worry too much.

Beyond the case where the first contact without an email is incorrectly associated with all the drs (data resources) of other contacts without an email (https://collections-test.ala.org.au/contact/show/1029), this PR tries to follow the EML specification and pay more attention to the contacts part, which from my point of view was not fully developed on our side.
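
To make the EML elements under discussion concrete, here is a rough sketch (Python, not the collectory's actual code) of reading `dataset/contact` and `dataset/associatedParty` from an EML document. The sample document, the function names and the dictionary keys are all made up for illustration, and flagging only the first `contact` element as primary is an assumption.

```python
import xml.etree.ElementTree as ET

# Made-up minimal EML document; real ones carry many more elements.
SAMPLE_EML = """<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.1.1">
  <dataset>
    <contact>
      <individualName><givenName>Ann</givenName><surName>Lee</surName></individualName>
      <electronicMailAddress>ann@example.org</electronicMailAddress>
    </contact>
    <associatedParty>
      <individualName><surName>Ray</surName></individualName>
      <role>author</role>
    </associatedParty>
  </dataset>
</eml:eml>"""


def party_to_dict(party, primary):
    """Flatten one EML party element; missing fields come back as None."""
    return {
        "first_name": party.findtext("individualName/givenName"),
        "last_name": party.findtext("individualName/surName"),
        "email": party.findtext("electronicMailAddress"),
        "primary": primary,
    }


def extract_contacts(eml_xml):
    """Collect dataset/contact entries first, then dataset/associatedParty."""
    dataset = ET.fromstring(eml_xml).find("dataset")
    if dataset is None:
        return []
    # Assumption: only the first <contact> is treated as the primary contact.
    contacts = [
        party_to_dict(p, primary=(i == 0))
        for i, p in enumerate(dataset.findall("contact"))
    ]
    contacts += [
        party_to_dict(p, primary=False)
        for p in dataset.findall("associatedParty")
    ]
    return contacts
```

Note that the second party in the sample has no email and no given name, which is exactly the kind of record that needs name-based matching instead of email matching.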

As a result, if you compare how the contacts are processed in our datasets with how GBIF processes the contacts of the same dataset, you'll find that currently our contacts:

and #236 solves all these issues, so that our contacts part is in this case similar to GBIF's.

adam-collins commented 1 month ago

Merging the PR closed the issue; reopening.