peterdesmet opened this issue 1 year ago
Thanks, Peter, for the suggestion. It sounds like a very sensible and useful feature.
But what would the threshold for a drop in records be? Should it be a percentage or an absolute number?
I suggest a threshold of 90% (hardcoded), but make it an optional setting when setting up auto-publication. That also leaves room for other options without making it too complicated. Some of these checks should probably not be optional (e.g. when source data are missing) but should always result in an error.
Enable auto-publication
- [x] Abort when the number of records has dropped by 10%
- [ ] Abort when mapped fields are missing in source data
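The two checkboxes above, plus an always-on check for missing source data, could be combined into a single pre-publication step. The following is a minimal Python sketch under stated assumptions, not IPT code: `should_abort`, `CheckResult`, their parameters, and the idea that an empty source counts as "missing" are all hypothetical. The threshold is expressed as a percentage of the previous record count (90 by default, so a run is aborted when fewer than 90% of the records remain), which is one possible answer to the percentage-or-number question; integer arithmetic avoids floating-point edge cases right at the boundary.

```python
from dataclasses import dataclass, field


@dataclass
class CheckResult:
    """Outcome of the hypothetical pre-publication checks."""
    abort: bool
    reasons: list[str] = field(default_factory=list)


def should_abort(
    previous_count: int,
    new_count: int,
    mapped_fields: set[str],
    source_columns: set[str],
    check_record_drop: bool = True,      # first checkbox (ticked in the mock-up)
    check_missing_fields: bool = False,  # second checkbox (unticked in the mock-up)
    min_retained_percent: int = 90,      # hypothetical hardcoded threshold
) -> CheckResult:
    """Decide whether an auto-publication run should be aborted (sketch only)."""
    reasons: list[str] = []

    # Always an error, not optional: assume an empty source means the data are missing.
    if new_count == 0:
        reasons.append("source data are missing or empty")

    # Optional: abort when fewer than min_retained_percent of the previous records remain.
    # Comparing integers avoids floating-point surprises exactly at the threshold.
    if check_record_drop and previous_count > 0 and new_count > 0:
        if new_count * 100 < min_retained_percent * previous_count:
            reasons.append(
                f"record count dropped from {previous_count} to {new_count}"
            )

    # Optional: abort when columns used in the mapping are absent from the source.
    if check_missing_fields:
        missing = sorted(mapped_fields - source_columns)
        if missing:
            reasons.append("mapped fields missing in source data: " + ", ".join(missing))

    return CheckResult(abort=bool(reasons), reasons=reasons)
```

For example, if the previous version had 1,000 records, a new source file with 850 records would abort (85% retained), while one with 950 would publish. Returning the reasons, not just a flag, means they could be included in an abort notification.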
+1 for support. It would help prevent downstream snafus. The only issue I see here is the secondary need for the IPT to send a notification of the abort(s); otherwise an affected dataset may sit in purgatory indefinitely.
Thanks, @dshorthouse. Email notification might be a very good idea here.
Email notification would require additional configuration by the administrator — currently the IPT doesn't send any emails.
Doing this check within the IPT would prevent bad data from being published, but having GBIF detect it would make email notifications easier and would allow the helpdesk to get involved.
@MattBlissett Actually, the IPT does send emails, though not directly: it sends them via the Registry. There is a "Click here to contact organisation" option and a link to send an organization token/password reminder. So we can probably implement this in a similar way.
Having just been through this with @dshorthouse, I agree that an email notification would be very helpful, especially for publications initiated on an automated schedule (otherwise I may not see an issue for days). With nearly 180 resources publishing on a schedule, knowing that an event was aborted, that the number of records was (significantly) reduced, or both would be very helpful. I also think it is important to be able to configure who receives these messages from within the IPT. The VertNet IPT, for example, has several admins, but not all of them need to, or should, receive notices like this.
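Making recipients configurable could be as simple as letting each admin opt in to specific event types. The Python sketch below is hypothetical throughout (`Event`, `Admin`, `recipients_for` and `build_subject` are illustrative names, not existing IPT settings) and deliberately leaves out how mail would actually be delivered, whether directly or via the Registry.

```python
from dataclasses import dataclass, field
from enum import Enum


class Event(Enum):
    """Hypothetical auto-publication events worth reporting."""
    ABORTED = "aborted"
    RECORDS_REDUCED = "records reduced"


@dataclass
class Admin:
    email: str
    # Event types this admin has opted in to; empty means no notices at all.
    subscriptions: set[Event] = field(default_factory=set)


def recipients_for(events: set[Event], admins: list[Admin]) -> list[str]:
    """Addresses of admins subscribed to at least one of the events that occurred."""
    return sorted(a.email for a in admins if a.subscriptions & events)


def build_subject(resource: str, events: set[Event]) -> str:
    """One subject line covering an abort, a reduction, or both."""
    labels = ", ".join(sorted(e.value for e in events))
    return f"Auto-publication of {resource}: {labels}"
```

Grouping all events for one resource and run into a single message, rather than one email per event, keeps the volume of email down.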
I'd be reluctant for us to send emails triggered by systems (external IPTs) that we do not control. We could have IPTs with resources that are broken for months emailing users who don't want those emails (e.g. people who no longer work on the resource), and that risks GBIF's systems being flagged as spam by Google, Microsoft, etc.
TBC.
Using source files via URL together with auto-publication is very useful for automatically publishing an active dataset. We use it, for example, for the following citizen science dataset: https://ipt.inbo.be/resource?r=dieren-planten-natuurpunt-occurrences
It would be useful, however, if the IPT offered some options for aborting the auto-publication. The dataset above, for example, has an issue in its pipeline that resulted in far fewer records in the source file. This caused the (unintentional) deletion of many records at GBIF.org. It would have been nice if the IPT could have detected this and aborted the auto-publication.