gbif / occurrence

Occurrence store, download, search
Apache License 2.0

Find/make an easy way to prune all images for a dataset #223

Open ManonGros opened 3 years ago

ManonGros commented 3 years ago

This relates to another issue, https://github.com/gbif/portal-feedback/issues/2972, where the organization updates its images every month or so.

In general, it would be great if we could easily prune all the images for a dataset.

MattBlissett commented 3 years ago

CC @timrobertson100

The process at the moment requires root access on the Thumbor servers:

1) Find and remove all cached pictures from HBase. On the gateway: /home/mblissett/clear-nmr-pics

2) Delete the locally cached pictures from the Thumbor servers' "result storage" (otherwise they could take about a month to expire). On both Thumbor servers:

    sudo mv /srv/thumbor/result_storage/default/ /srv/thumbor/result_storage/old && sleep 10 && sudo rm -Rfv /srv/thumbor/result_storage/old | sed -n '0~10000p'

Step 2 removes all locally cached pictures, not only those of one dataset. A previous version of Thumbor structured the local cache according to the URL, so it was easy to find what to remove. The current version hashes the whole query (including the URL, requested size, crop, etc.), so it is easier just to drop everything, although it would be possible to calculate the candidate hashes.
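
For what it's worth, calculating the candidate hashes might look roughly like the Python sketch below. Everything specific in it is an assumption that would need checking against the deployed Thumbor version: that the cache key is the SHA-1 of the request path, that files sit under a two-character fan-out directory, and that requests look like /unsafe/<size>/<url>. The function name and the example sizes are made up for illustration.

```python
import hashlib
from itertools import product
from pathlib import PurePosixPath

# Root taken from the command in step 2.
RESULT_ROOT = PurePosixPath("/srv/thumbor/result_storage/default")


def candidate_cache_paths(image_url, sizes, prefixes=("unsafe",)):
    """Yield (request_path, cache_path) for every guessed request.

    Hypothetical helper: the request path format, the use of SHA-1 and
    the directory layout are all assumptions, not confirmed behaviour.
    """
    for prefix, size in product(prefixes, sizes):
        # Assumed request shape: /<prefix>/<size>/<image url>
        request_path = f"/{prefix}/{size}/{image_url}"
        digest = hashlib.sha1(request_path.encode("utf-8")).hexdigest()
        # Assumed layout: first two hex characters as a directory,
        # remaining 38 characters as the file name.
        yield request_path, RESULT_ROOT / digest[:2] / digest[2:]


if __name__ == "__main__":
    # Example sizes are placeholders; the real list would have to come
    # from whatever sizes the portal actually requests.
    for request, path in candidate_cache_paths(
        "example.org/picture.jpg", ["100x100", "fit-in/500x"]
    ):
        print(request, "->", path)
```

Comparing a few generated paths with files actually present on one of the servers would confirm or refute the guessed layout before anything is deleted. The list of sizes and crops has to cover every variant that was ever requested, which is the main reason dropping everything is simpler.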

(I'm not even sure how useful the local disk caches are, when we have Varnish anyway -- we could do with some logging to see how many requests are served from Thumbor's disk cache vs the HBase cache (vs Varnish).)