gbif / portal16

GBIF.org website
https://www.gbif.org
Apache License 2.0

Prevent duplicate submissions via 'catcher' #329

Open gbif-portal opened 7 years ago

gbif-portal commented 7 years ago


To prevent duplicate submissions via the 'catcher', @dschigel and I propose:

When a user enters information in either the 'Internet link to data' or the 'Bibliographic reference' field, query the GitHub issues for an existing issue containing the same link or identifier. The same duplicate check could also be run on the title. In either case, if a potential duplicate is found, warn the user that an issue with this link or title already exists.

This will alleviate the load on the person(s) managing incoming issues.

More importantly, this will save users from wasting their time submitting duplicate issues. It has already happened once, and is likely to happen more often as the 'catcher' becomes more widely used.
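The duplicate check proposed above could be sketched roughly as follows. This is only an illustration, not the portal's implementation: it assumes the suggestions live as issues in a GitHub repository (the name `example-org/suggested-datasets` is a placeholder, since the thread does not name it) and uses GitHub's issue search endpoint with the `repo:`, `type:` and `in:` qualifiers. The helper names `build_search_url`, `normalize`, `possible_duplicates` and `find_duplicates` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

SEARCH_URL = "https://api.github.com/search/issues"  # GitHub REST issue search endpoint
REPO = "example-org/suggested-datasets"  # placeholder: the thread does not name the repository


def build_search_url(term: str, repo: str = REPO, field: str = "body") -> str:
    """Build a search URL for issues in `repo` whose `field` (title or body) contains `term`."""
    query = f'"{term}" repo:{repo} type:issue in:{field}'
    return SEARCH_URL + "?" + urllib.parse.urlencode({"q": query})


def normalize(link: str) -> str:
    """Lower-case the link and drop scheme, 'www.' and trailing slashes before comparing."""
    link = link.strip().lower()
    for prefix in ("https://", "http://"):
        if link.startswith(prefix):
            link = link[len(prefix):]
    if link.startswith("www."):
        link = link[4:]
    return link.rstrip("/")


def possible_duplicates(search_response: dict, link: str) -> list:
    """Return (number, title) pairs for issues whose body contains the normalized link."""
    needle = normalize(link)
    hits = []
    for item in search_response.get("items", []):
        body = (item.get("body") or "").lower()  # body may be null for empty issues
        if needle in body:
            hits.append((item["number"], item["title"]))
    return hits


def find_duplicates(link: str) -> list:
    """Query GitHub over the network and return potential duplicates for `link`."""
    with urllib.request.urlopen(build_search_url(normalize(link))) as resp:
        return possible_duplicates(json.load(resp), link)
```

The form would call something like `find_duplicates` once the link field loses focus and show a warning listing the matching issues. For the title check, the same search could be run with `field="title"`.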


fbitem-650dd53d64421891d161f2147516cbcf45073b95
Reported by: @kbraak
System: Firefox 51.0.0 / Mac OS X 10.10.0
Referer: https://demo.gbif.org/tools/suggest-dataset
Window size: width 1569 - height 968

MortenHofft commented 7 years ago

quoting @ahahn-gbif

I am not sure whether this would be in any way simpler/easier to add short-term, but maybe it would be enough to support a simple keyword search before starting to register a dataset, à la "check whether the dataset you have in mind has already been suggested by someone else", and an autocomplete/search for some key terms that a hopeful submitter might pick from the title and enter, like "Predicts" or "Paris basin" or "Falco"?
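A minimal sketch of the keyword lookup described in this quote, assuming the titles of already suggested datasets have been loaded into a list. The function name `suggest` and the sample titles in the usage note are made up for illustration.

```python
def suggest(titles, typed, limit=5):
    """Return up to `limit` titles that contain every typed term, ignoring case."""
    terms = typed.lower().split()
    if not terms:
        return []  # nothing typed yet, so nothing to suggest
    matches = [t for t in titles if all(term in t.lower() for term in terms)]
    return matches[:limit]
```

With hypothetical titles such as `["Falco observations in the Alps", "Paris basin fossil records"]`, typing `paris basin` would return only the second title, which the form could show as "already suggested" before the user starts filling in the details.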

MortenHofft commented 7 years ago

How would you rate the impact of this? Is it already an issue, or a premonition? It would be nice to have in general, and for other portal issues too, though I am not sure how to do that.

MortenHofft commented 7 years ago

Given that there isn't really a problem yet as far as I can see, I will label this as low impact for now. If it turns out to become a problem, please change the label again.

37/57 suggested datasets are unvalidated at this point. That part of the workflow should probably be optimized before automatically detecting duplicates and other development tasks.