Open tloubrieu-jpl opened 1 month ago
@tloubrieu-jpl
Apart from some strange log messages, this is not a problem.
The synchronization takes place inside OpenSearch, not outside it. The various harvest processes will write the same fields and types in batches. The first write will succeed, while the others will get a message back that makes the count in "Updated N fields" smaller. There might also be a message saying the field is already there, but maybe not, depending on various factors. The point is that harvest will just press forward with its job.
@tloubrieu-jpl thoughts?
@al-niessner, are the JSON files not downloaded before their content is processed? If they are downloaded, is there a risk that two harvest processes download the LDD files to the same location, so that one process tries to write an LDD file while another reads it?
That might not be a problem but I want to confirm that before we can close this ticket.
Harvest downloads the mapping contents at some point. It works out which fields are new, then batches those back up to the mappings. It counts only the fields that succeed in the batch as uploaded. That is why it does not matter: with the OpenSearch batch mapping update, only one of multiple writes of the same field takes effect. We exercised this earlier, before the fix, when harvest was not reading the mappings correctly and kept sending the whole LDD but then reported 0 updates. Hence, we already know it is not a problem.
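To make the behaviour described above concrete, here is a minimal sketch of the compare-then-batch idea. It is not Harvest's code and does not call OpenSearch: `fields_to_add`, `FakeIndex`, and the field names are hypothetical stand-ins, and the in-memory dictionary only imitates a mapping where a field can be added once.

```python
def fields_to_add(existing_mapping: dict, ldd_fields: dict) -> dict:
    """Return only the LDD fields the index mapping does not contain yet."""
    return {name: ftype for name, ftype in ldd_fields.items()
            if name not in existing_mapping}


class FakeIndex:
    """In-memory stand-in for the registry index mapping (assumption, not OpenSearch)."""

    def __init__(self):
        self.mapping = {}

    def put_fields(self, batch: dict) -> int:
        """Apply a batch and count only the fields that were actually new."""
        added = 0
        for name, ftype in batch.items():
            if name not in self.mapping:
                self.mapping[name] = ftype
                added += 1
        return added


# Two harvest runs read the same (empty) mapping before either one writes.
index = FakeIndex()
ldd = {"example:field_a": "double", "example:field_b": "keyword"}  # hypothetical fields
batch_1 = fields_to_add(dict(index.mapping), ldd)
batch_2 = fields_to_add(dict(index.mapping), ldd)

print("Updated", index.put_fields(batch_1), "fields")  # first run adds both fields
print("Updated", index.put_fields(batch_2), "fields")  # second run adds none
```

Under these assumptions the second run simply reports a smaller count, which matches the smaller "Updated N fields" message mentioned earlier in the thread.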
The log message found when jobs run in parallel is:
[ERROR] Request failed: [resource_already_exists_exception] Update to the indices [geo-registry] failed due to either concurrent update or deletion of the indices
Hi @al-niessner ,
Harvest needs to support running in parallel. Can you investigate the error that Dan received (see previous comment) when he ran multiple harvest processes in parallel, and fix it?
Thanks,
Thomas
@tloubrieu-jpl
I need more of the log before that error. The error message above looks like the Java SDK V2 is throwing an exception, and more of the log might help me understand where. It clearly does not want to create something that is already there, but that something may not be the mapping.
Hi @scholes-ds, could you attach or paste a longer section of the logs for this case so that @al-niessner can understand where the message comes from?
Thanks
Hi @al-niessner ,
Actually, @scholes-ds gave me some surrounding log context earlier. Here it is:
[INFO] Updated 43 fields
[INFO] Processing product \\isilon-pri-data\pds-san\data\lunar\urn-nasa-pds-pioneer89cdd\calibration\p8_p9_cdd_calib_notebook.xml
[INFO] Updating LDDs.
[INFO] Updating 'pds' LDD. Schema location: http://pds.nasa.gov/pds4/pds/v1/PDS4_PDS_1700.xsd
[INFO] This LDD already loaded.
[INFO] Updating Elasticsearch schema.
[ERROR] Request failed: [resource_already_exists_exception] Update to the indices [geo-registry] failed due to either concurrent update or deletion of the indices
Let me know if you need more.
Thanks
@tloubrieu-jpl
This is enough. It looks like the error occurred during the LDD update, not during the document push (which is why there is an overwrite flag). I will see if I can reproduce it and ignore the error, assuming nobody deleted the index.
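A rough sketch of what "ignore the error assuming nobody deleted the index" could look like. Everything here is hypothetical: `RequestFailed`, `update_schema_tolerant`, and the two callables stand in for whatever the real client exposes, and only the error type string is taken from the log above.

```python
class RequestFailed(Exception):
    """Hypothetical exception carrying the error type reported by the server."""

    def __init__(self, error_type: str, message: str):
        super().__init__(f"[{error_type}] {message}")
        self.error_type = error_type


def update_schema_tolerant(update, index_exists) -> str:
    """Run the schema update; tolerate a concurrent-update clash if the index is still there.

    `update` and `index_exists` are caller-supplied callables (assumptions for this sketch).
    """
    try:
        update()
        return "updated"
    except RequestFailed as err:
        clash = err.error_type == "resource_already_exists_exception"
        if clash and index_exists():
            # Another process got there first; the index is intact, so carry on.
            return "skipped"
        # Any other failure, or a missing index, is still a real error.
        raise
```

The check on `index_exists` is what separates the harmless concurrent update from the deletion case that the same message also covers.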
@al-niessner is having difficulty reproducing this error.
Until we have a better understanding of the issue, we advise discipline nodes not to run harvest processes in parallel.
We could create a ticket with AWS to get a better understanding, @sjoshi-jpl?
💡 Description
I suspect a possible write conflict on the temporary LDD file created locally.
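If a local write conflict of this kind does turn out to be real, one common way to avoid it is to write each download to a uniquely named temporary file and rename it into place. The sketch below shows that general pattern only; `write_atomically` and the file names are hypothetical and this is not how Harvest currently handles its LDD files.

```python
import os
import tempfile


def write_atomically(target_path: str, data: bytes) -> None:
    """Write data to a unique temp file in the target directory, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(target_path))
    # Each caller gets its own temp file, so two writers never share a partial file.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".ldd-", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        # os.replace swaps the file in one step on the same filesystem,
        # so a reader sees either the old content or the new, never a mix.
        os.replace(tmp_path, target_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With this approach, two processes that download the same LDD would each finish their own copy, and whichever renames last wins with a complete file.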