Open tloubrieu-jpl opened 2 days ago
Created a clear requirement to tie to this as well:
@jordanpadams @tloubrieu-jpl
Yes, this is to be expected. As discussed previously, moved the node name to be identified through the registry name. This is done here: https://github.com/NASA-PDS/harvest/blob/ae58da7bc6af0dad6c3ed9d4551e3ec2387711ab/src/main/java/gov/nasa/pds/harvest/cfg/ConfigManager.java#L60-L64
It always returns a name. Do you prefer to throw an error when not found in this table? https://github.com/NASA-PDS/harvest/blob/ae58da7bc6af0dad6c3ed9d4551e3ec2387711ab/src/main/java/gov/nasa/pds/harvest/cfg/ConfigManager.java#L21-L34
As you can see from the function, it is mangling your registry name to get the node name since you are not following the convention you gave me. That convention was used to create the connections here: https://github.com/NASA-PDS/registry-common/tree/main/src/main/resources/connections/cognito
We can do a couple of thing if this is a set of integration test nodes (maybe for some users initial review) but not full deployment nodes (ones that follow the original naming rules):
I would suggest 2 and 4. We can and should add more connections, both cognito and direct, to support more registries. We can then add them to the table above as needed with whatever name is desired.
Need answering for outstanding questions
Hi @al-niessner ,
Sorry, after taking a bit more time to think of it, I believe you are right, that does not make sense to keep a hardcoded mapping in the code, because we would need to update the harvest code if we add a new discipline node. Although that might never happen that is an unnecessary hassle.
Besides, I am realizing we don't want to redundantly manage that harvesintg node in harvest since we know who the document has been harvested by from the index in which the document is. In the documents it shows as _index = geo-registry
(although that might not be searchable)
Unless we have cases where someone would like to ingest into an index, e.g. geo-registry but be authored as a different node, for example PDS_SBN or a more specific entity, for example PDS_GEO_MARS ... I don't believe we need that, but I am leaving that decision to @jordanpadams. If we actually need that, we would go for option 4.
If we confirm we don't need that flexibility of having different harvesting nodes in the same index, I would completely remove this updates from the client side, in harvest, and move it servers ide, in sweeper or by an opensearch configuration (I am not finding which though).
@al-niessner , @alexdunnjpl , @jordanpadams , let me know what are your thoughts.
Thanks,
"
Checked for duplicates
Yes - I've already checked
π Describe the bug
In the registry, we can find documents in the *-registry indexes with property ops:Harvest_Info/ops:node_name =
geo
oren
π΅οΈ Expected behavior
I expected to have values PDS_GEO, or PDS_ENG.
π To Reproduce
Harvest data as described in the manual https://docs.google.com/document/d/12W1DyaRYh4yYnw4p_-qFLQ0kmn_w14Sr/edit
π₯ Environment Info
No response
π Version of Software Used
Using harvest v4.0.1
π©Ί Test Data / Additional context
The node value expected are:
π¦ Related requirements
π¦ #187
βοΈ Engineering Details
It sounds related to the change in the code which made the node name in the harvest configuration obsolete.
π Integration & Test
No response