gbif / rs.gbif.org

GBIF machine-readable resources
https://rs.gbif.org

Add authormap.txt to `dictionaries/authority` #138

Open djtfmartin opened 6 days ago

djtfmartin commented 6 days ago

gbif/checklistbank and its ported version, matching-ws, rely on a number of dictionaries that are loaded at startup.

I suggest we add the author map file used by these services to rs.gbif.org for completeness. This avoids embedding a version of this file in docker images.

mdoering commented 5 days ago

Great. I just modified the file format slightly to support an unrestricted number of authorship variations and to have the normalised value in the first row. https://github.com/CatalogueOfLife/backend/commit/744fb28b940549aac13a5a04c37aaf772600d907

Should we then use that new version for rs.gbif.org? We can add more entries while not breaking deployed code, but we cannot change the format like this commit did.

On the other hand, I do wonder whether any of these files need to live externally, or whether we should bundle them all as Java resources only. They do not change often, they are required for stable test outcomes, and you would probably still want a local copy in case rs.gbif.org is unreachable.

djtfmartin commented 5 days ago

Perhaps we can keep the external files and have fallback local copies for redundancy?

I thought the external files would be useful to allow Living Atlases to use matching-ws and/or the Docker images.

mdoering commented 5 days ago

All of the other rs.gbif.org dictionary files are from the old code base and we don't use any of them in the new code. The more I think about it, the more I would rather keep all dicts as resources in the new code. There are plenty of them already, e.g. lots of parser dicts. What was the reasoning for using the online files for the Docker images? So that content can change without the need to rebuild an image? The content hasn't changed in years, and the code itself changes much more quickly.

djtfmartin commented 5 days ago

If ALA or NBN for example used the generated images for their own contexts, they might find it useful to tweak files like blacklisted.txt.

mdoering commented 5 days ago

Ah yes. I forgot that your ported code still uses some of the old dicts. What about bundling defaults and allowing them to be overridden in the config with a URL to a preferred file?

djtfmartin commented 5 days ago

Yes, I think that makes sense. I'll do that.
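
For illustration, a minimal sketch of the "bundled default with optional override URL" idea agreed on above. This is not the actual matching-ws code: the class name `DictionaryLoader`, the method names, and the resource path used in the usage note are hypothetical, and the override value is assumed to arrive from whatever config mechanism the service already uses.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: prefers a configured override URL and falls back
// to a copy of the dictionary bundled on the classpath.
final class DictionaryLoader {

    /** Reads all non-empty, trimmed lines from the stream as UTF-8 and closes it. */
    static List<String> readLines(InputStream in) throws IOException {
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            return reader.lines()
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                .collect(Collectors.toList());
        }
    }

    /**
     * Loads a dictionary.
     *
     * @param overrideUrl     optional URL from configuration (may be null or blank)
     * @param bundledResource classpath location of the bundled default
     */
    static List<String> load(String overrideUrl, String bundledResource) throws IOException {
        if (overrideUrl != null && !overrideUrl.isBlank()) {
            try (InputStream in = URI.create(overrideUrl).toURL().openStream()) {
                return readLines(in);
            } catch (IOException | RuntimeException e) {
                // Override unreachable or malformed: report it and use the bundled default.
                System.err.println("Could not read " + overrideUrl + ", using bundled copy: " + e);
            }
        }
        InputStream bundled = DictionaryLoader.class.getResourceAsStream(bundledResource);
        if (bundled == null) {
            throw new IOException("Bundled dictionary not found: " + bundledResource);
        }
        return readLines(bundled);
    }
}
```

With this shape, a deployment such as ALA or NBN could point the override at its own `blacklisted.txt`, while a call like `DictionaryLoader.load(null, "/dictionaries/blacklisted.txt")` (path assumed) would use the bundled default, so tests and offline runs do not depend on rs.gbif.org being reachable.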