AtlasOfLivingAustralia / biocache-service

Occurrence & mapping webservices
https://biocache-ws.ala.org.au/ws/

Make SpeciesImageService output usable with bie-index #896

Open adam-collins opened 2 months ago

adam-collins commented 2 months ago

To make the biocache-service species autocomplete output similar to that of bie-index, an image field was included.

It has come to my attention that:

  1. This is unnecessary for the autocomplete response, as images are not shown in autocomplete dropdowns today.
  2. It does not apply the configurable preferred and required image fq filters that bie-index uses.
  3. It does not include the images from the preferred/hidden species lists that contain this information.

From a bie-index perspective, getting this data by querying biocache-service requires an estimated 1 million requests. Biocache-service accomplishes a similar result in about a minute. It is reasonable to expect this update to take place every time occurrences change, i.e. daily.

The refactored bie-index can take advantage of the output of an improved SpeciesImageService. Expected benefits:

  1. shorter downtime waiting for bie-index index creation
  2. daily updates to bie-index image field instead of weekly (for the biocache-service source, not the lists based sources)
  3. significantly lower traffic to biocache-service during an images update

Currently this information is stored using lft values. It is a goal (#885) to replace these. For this task, a mapping of id:[lft, rgt] must exist. The current process for getting this mapping is to query namematching-ws with an id, i.e. to download the contents of the lucene names index via namematching-ws every time it is required.
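
As a rough illustration of the id:[lft, rgt] mapping, here is a minimal Java sketch that loads rows of the form `id,lft,rgt` into memory. The class name `LftRgtIndex`, its methods, and the exact CSV layout are assumptions made for this example, not existing biocache-service or namematching-ws code. It also assumes lft/rgt follow the usual nested-set convention, where a descendant's range lies inside its ancestor's range.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: not part of biocache-service, bie-index or namematching-ws.
class LftRgtIndex {
    // taxon id -> {lft, rgt}
    private final Map<String, int[]> ranges = new HashMap<>();

    // Builds the index from CSV rows of the assumed form "id,lft,rgt".
    // Rows whose last two columns are not integers (e.g. a header row) are skipped.
    static LftRgtIndex fromCsvLines(List<String> lines) {
        LftRgtIndex index = new LftRgtIndex();
        for (String line : lines) {
            String row = line.trim();
            // Split on the last two commas so an id containing commas still parses.
            int rgtSep = row.lastIndexOf(',');
            int lftSep = row.lastIndexOf(',', rgtSep - 1);
            if (lftSep < 1) {
                continue; // fewer than three columns, or an empty id
            }
            try {
                int lft = Integer.parseInt(row.substring(lftSep + 1, rgtSep).trim());
                int rgt = Integer.parseInt(row.substring(rgtSep + 1).trim());
                index.ranges.put(row.substring(0, lftSep), new int[] {lft, rgt});
            } catch (NumberFormatException e) {
                // Header or malformed row: ignore it.
            }
        }
        return index;
    }

    // Returns {lft, rgt} for the id, or null when the id is unknown.
    int[] range(String id) {
        return ranges.get(id);
    }

    // Nested-set containment: true when taxonId is ancestorId itself or one of its descendants.
    boolean isWithin(String taxonId, String ancestorId) {
        int[] taxon = ranges.get(taxonId);
        int[] ancestor = ranges.get(ancestorId);
        return taxon != null && ancestor != null
                && ancestor[0] <= taxon[0] && taxon[1] <= ancestor[1];
    }
}
```

Loading such a file once would replace the per-id lookups against namematching-ws with a local map lookup.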

Tasks

  1. For SpeciesImageService, add config for preferred (fq list ordered by preference) and required (list of fq) image fqs.
  2. For SpeciesImageService, when fetching images, apply all requiredImageFqs.
  3. For SpeciesImageService, when fetching images, apply preferredImageFqs one at a time, in order of preference.
  4. Add webservice (or extend suitable existing service) to respond with SpeciesImageService data (SpeciesImagesDTO)
  5. Add webservice (or extend suitable existing service) to respond with SpeciesCountsService data (SpeciesCountDTO)
  6. Look at how lft/rgt is stored in the lucene index and how namematching-ws responds with it. This is with a view to adding a csv file containing id,lft,rgt in archives.ala, beside the lucene index, for use by biocache-service (optional to lower traffic) and the refactored bie-index (because the DwCA does not have this information).
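
Tasks 1 to 3 could be sketched roughly as below. This is only an illustration under assumptions: the class `ImageFqSelector`, the `search` callback (standing in for whatever query SpeciesImageService actually runs) and the final fallback to the required fqs alone are hypothetical and not taken from the existing code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch: not the actual SpeciesImageService implementation.
class ImageFqSelector {
    private final List<String> requiredImageFqs;   // every one is applied to each query
    private final List<String> preferredImageFqs;  // tried one at a time, most preferred first

    ImageFqSelector(List<String> requiredImageFqs, List<String> preferredImageFqs) {
        this.requiredImageFqs = List.copyOf(requiredImageFqs);
        this.preferredImageFqs = List.copyOf(preferredImageFqs);
    }

    // 'search' takes the full fq list for one query and returns an image id if one matches.
    Optional<String> selectImage(Function<List<String>, Optional<String>> search) {
        for (String preferred : preferredImageFqs) {
            List<String> fqs = new ArrayList<>(requiredImageFqs);
            fqs.add(preferred);
            Optional<String> image = search.apply(fqs);
            if (image.isPresent()) {
                return image; // first preferred fq that yields an image wins
            }
        }
        // Assumed fallback: no preferred fq matched, so query with the required fqs only.
        return search.apply(requiredImageFqs);
    }
}
```

The two lists would come from configuration, with the order of the preferred list expressing preference. Whether a taxon with no preferred match should fall back to any image passing the required fqs is a design decision the tasks above leave open.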

adam-collins commented 2 months ago

now working for atlas-index