AtlasOfLivingAustralia / image-service

Image repository and tiling services
https://images.ala.org.au

Regex per DR to extract unique ID for the image to prevent re-downloading it #164

Open sadeghim opened 2 years ago

sadeghim commented 2 years ago

Problem: Whenever a major image provider (such as iNat) changes a common part of its image URLs, such as the protocol, domain or path, image-service starts downloading all of those images again. It then compares each image's hash with the image DB to find out whether it is a duplicate. This can take up a significant amount of image-service's time and block other loads.

Suggestion: Implement a group of regular expressions for each DR (data resource) that extract the image IDs from the URLs before download and match them against the existing image URLs for that data resource in the database. Matched URLs will be added as alternative URLs without downloading the images. Unmatched URLs will go through and be downloaded as usual. This will eliminate the need to download all the images in most cases.
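
A rough sketch of the matching step, in Python for brevity rather than in image-service's own code. Everything named here is hypothetical: the `dr-example` UID, the `/photos/<id>/` URL shape, the `ID_PATTERNS` table, and the `extract_image_id` and `plan_load` helpers are assumptions for illustration, not existing image-service APIs. In practice the existing URLs would come from the database and the patterns from per-DR configuration.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical per-DR pattern groups. Each pattern must expose a named group "id".
ID_PATTERNS = {
    "dr-example": [re.compile(r"/photos/(?P<id>\d+)/")],
}


def extract_image_id(dr_uid: str, url: str) -> Optional[str]:
    """Return the provider's image ID found in the URL, or None if no pattern matches."""
    for pattern in ID_PATTERNS.get(dr_uid, []):
        match = pattern.search(url)
        if match:
            return match.group("id")
    return None


def plan_load(
    dr_uid: str, existing_urls: List[str], incoming_urls: List[str]
) -> Tuple[Dict[str, str], List[str]]:
    """Split incoming URLs into alternative URLs and URLs that still need downloading.

    Returns (alternatives, to_download), where alternatives maps a new URL
    to the already stored URL that carries the same extracted ID.
    """
    # Index the URLs already stored for this data resource by extracted ID.
    known: Dict[str, str] = {}
    for url in existing_urls:
        image_id = extract_image_id(dr_uid, url)
        if image_id is not None:
            known.setdefault(image_id, url)

    alternatives: Dict[str, str] = {}
    to_download: List[str] = []
    for url in incoming_urls:
        image_id = extract_image_id(dr_uid, url)
        if image_id is not None and image_id in known:
            # Same image under a changed URL: record it without downloading.
            if url != known[image_id]:
                alternatives[url] = known[image_id]
        else:
            # No ID extracted or ID not seen before: fall back to the normal download path.
            to_download.append(url)
    return alternatives, to_download


existing = ["http://static.example.org/photos/123/original.jpg"]
incoming = [
    "https://cdn.example.org/photos/123/original.jpg",
    "https://cdn.example.org/photos/456/original.jpg",
]
alternatives, to_download = plan_load("dr-example", existing, incoming)
```

With these sample URLs, the first incoming URL would be recorded as an alternative for the stored image and only the second would be queued for download. URLs from a DR with no configured patterns all fall through to the download path, so the existing hash comparison remains the fallback.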