gbif / portal16

GBIF.org website
https://www.gbif.org
Apache License 2.0

Reduce number of unwanted very large downloads #1931

Open MattBlissett opened 2 months ago

MattBlissett commented 2 months ago

Many of the very largest downloads (>100GB) are requested, but then never downloaded by the user. This is a significant waste of our resources and the user's time, and is probably frustrating for the user when they realize they cannot use GBIF data as they had hoped.

We have popup banners for large downloads and for downloads of most or all occurrences, but since those were implemented, cookie banners have spread all over the web, so users are even less likely to read them. I suggest instead changing the download page itself, providing different options and, in some cases, removing the existing DWCA and Simple options.

Ideas (more ideas were added later):

CecSve commented 2 months ago

I think the following options are good suggestions:

Force users to select between TEST_DOWNLOAD | STORE_COPY_FOR_CITATION

The difference between the two should be made clear somehow, however.

Do not mint DOIs unless the file is downloaded. And delete the file if it is not downloaded within 6 months.

Although it is a good idea, I am not sure how it could be coupled with the above suggestion. How about we define a threshold and make a policy that we do not store data above this threshold for more than XX days/months unless the user actively requests us to?
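
A rough sketch of what such a threshold policy could look like, only to make the rule concrete. The record fields, the function name and both parameters are hypothetical; no actual threshold or retention period has been agreed here, so both are passed in by the caller rather than fixed.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class DownloadRecord:
    # Hypothetical fields, not taken from the GBIF download API.
    size_bytes: int
    created: date
    keep_requested: bool  # True if the user actively asked us to keep the file


def should_delete(record: DownloadRecord, today: date,
                  size_threshold_bytes: int, max_age: timedelta) -> bool:
    """Return True if the file may be deleted under the sketched policy."""
    if record.keep_requested:
        # The user opted in to long-term storage, so never delete.
        return False
    if record.size_bytes <= size_threshold_bytes:
        # The policy only targets downloads above the size threshold.
        return False
    # Large and not explicitly kept: delete once the retention period has passed.
    return today - record.created > max_age
```

For example, calling `should_delete(record, date.today(), 100 * 10**9, timedelta(days=180))` would apply a 100 GB threshold and a roughly six-month retention period; both numbers are placeholders for whatever the policy ends up being.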

If the download is extremely large, e.g. more than half the total dataset, direct the user to the existing monthly downloads and the cloud-hosted monthly snapshots, along with an encouragement to register a derived dataset later.

In either case advise the user to add additional filters, perhaps directly. "You might add a filter for a taxon, location or date."

Maybe we could again set a threshold, e.g. a minimum of 3-5 filters applied, below which this message appears?
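
To illustrate how the two checks above could combine on the download page, here is a minimal sketch. The function name, its parameters and the exact wording of the first message are assumptions made for illustration; the minimum filter count is left as a parameter since it has not been decided.

```python
def download_advice(record_count: int, total_records: int,
                    filter_count: int, min_filters: int) -> list[str]:
    """Return the advisory messages to show before a download is requested."""
    messages = []
    if record_count > total_records / 2:
        # "Extremely large" is taken here as more than half of all records.
        messages.append(
            "This download covers more than half of all records. "
            "Consider the monthly downloads or the cloud-hosted monthly "
            "snapshots, and register a derived dataset afterwards."
        )
    if filter_count < min_filters:
        # Too few filters applied: suggest narrowing the query.
        messages.append("You might add a filter for a taxon, location or date.")
    return messages
```

An empty list means the request passes both checks and no extra message is needed.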

Always include information about creating a derived dataset if you do post-filtering

Yes, this would be helpful from a helpdesk perspective - it is one of the more common questions that pop up and it would be great if more users could be made aware of the option.

@ahahn-gbif and @timrobertson100 what do you think about the suggestions?