gbif / pipelines

Pipelines for data processing (GBIF and LivingAtlases)
Apache License 2.0

LA pipelines configuration and solr schemas should be updated #1038

Open vjrj opened 4 months ago

vjrj commented 4 months ago

I'm facing several issues trying to run the latest versions of pipelines in our LA portal (such as problems using a new name index, or missing fields in Solr).

Things I've detected:

Trying to find a correct, more up-to-date la-pipelines.yaml, I played with this patched configuration, assuming that the -emr one is the more up-to-date version (but I'm not sure). However, this was not enough to solve, for instance, my name index issue.

la-pipelines.yaml.patch.txt

TIA

adam-collins commented 4 months ago

livingatlas/solr/conf/ is the source. Ideally there will not be copies elsewhere to maintain. Can you please list all of the missing fields so they can be investigated?

I am not familiar with the config yaml files but do note a surprisingly large number of differences. Some of the differences do not appear straightforward, and I do not expect them to be resolved quickly.

vjrj commented 4 months ago

Hi @adam-collins, thanks for the fast response.

> livingatlas/solr/conf/ is the source. Ideally there will not be copies elsewhere to maintain. Can you please list all of the missing fields so they can be investigated?

About solr config: Good to know that this is the source. I compared both schemas after _nest_parent_ was missing on my side, probably because I created the schema via ala-install some months ago. But as I found other differences between these schemas, and as the Helm part is quite up to date, I was not sure which schema and solr conf should be used.
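
To produce the list of missing fields requested above, something like the following could be used. It is only a minimal sketch: it assumes both schemas are available as XML text with `<field name="..."/>` elements, and the two inline schemas and the function names are made up for illustration (they are not the real livingatlas/solr/conf/ content).

```python
import xml.etree.ElementTree as ET


def field_names(schema_xml):
    """Return the set of names of all <field> elements in a Solr schema document."""
    root = ET.fromstring(schema_xml)
    return {f.get("name") for f in root.iter("field") if f.get("name")}


def missing_fields(reference_xml, deployed_xml):
    """Fields declared in the reference schema but absent from the deployed one."""
    return sorted(field_names(reference_xml) - field_names(deployed_xml))


# Hypothetical, heavily shortened schemas used only to demonstrate the comparison.
REFERENCE = """<schema name="reference" version="1.6">
  <field name="id" type="string"/>
  <field name="_nest_parent_" type="string"/>
</schema>"""

DEPLOYED = """<schema name="deployed" version="1.6">
  <field name="id" type="string"/>
</schema>"""

print(missing_fields(REFERENCE, DEPLOYED))  # ['_nest_parent_']
```

With real files, the two strings would be read from the reference schema and from the schema of the running collection. Note that this only compares plain `<field>` declarations, not `<dynamicField>` or `<copyField>` entries.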

About configs: I just compared livingatlas/configs/la-pipelines.yaml and livingatlas/configs/la-pipelines-emr.yaml, assuming that the emr one is more up to date, and tried to find a working patch. It would be great if only a single reference config were used, but the emr one probably has a different format in order to inject its variables. In any case, the two configurations are quite different.
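
For reference, the kind of comparison described here can be sketched as below. This is an illustration under assumptions, not the project's tooling: `load_yaml` relies on the third-party PyYAML package being installed, and the keys in the sample dictionaries are invented and do not come from the real la-pipelines files.

```python
def load_yaml(path):
    """Parse a YAML file into nested dicts (requires PyYAML, assumed installed)."""
    import yaml  # third-party; imported here so the rest works without it

    with open(path, encoding="utf-8") as fh:
        return yaml.safe_load(fh) or {}


def flatten(node, prefix=""):
    """Flatten nested dicts into {"a.b.c": value} pairs."""
    if isinstance(node, dict):
        flat = {}
        for key, value in node.items():
            child = f"{prefix}.{key}" if prefix else str(key)
            flat.update(flatten(value, child))
        return flat
    return {prefix: node}


def diff_configs(left, right):
    """Report keys present on one side only, and keys whose values differ."""
    a, b = flatten(left), flatten(right)
    return {
        "only_left": sorted(a.keys() - b.keys()),
        "only_right": sorted(b.keys() - a.keys()),
        "changed": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }


# Hypothetical stand-ins for the two parsed config files.
LEFT = {"section": {"shared": 1, "old_option": True}, "name": "a"}
RIGHT = {"section": {"shared": 2, "new_option": False}, "name": "a"}

print(diff_configs(LEFT, RIGHT))
```

On the real files, `diff_configs(load_yaml("la-pipelines.yaml"), load_yaml("la-pipelines-emr.yaml"))` would give a key-level summary that is easier to review than a raw text diff, although template placeholders injected in the emr variant would still show up as changed values.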