NASA-PDS / harvest

Standalone Harvest client application providing the functionality for capturing and indexing product metadata into the PDS Registry system (https://github.com/nasa-pds/registry).
https://nasa-pds.github.io/registry

--overwrite flag is not respected for <bundles> elements in harvest config #112

Closed alexdunnjpl closed 1 year ago

alexdunnjpl commented 1 year ago

🐛 Describe the bug

When harvesting with a config XML file, re-harvesting an already-registered bundle and its products with

<bundles>
    <bundle dir="/nomount/harvest/.idea/plawton_umd_test_data/pds4-epoxi_mri-v1.0" versions="all" />
</bundles>

does not succeed, even when --overwrite is provided. This contrasts with

<directories>
    <path>/nomount/harvest/.idea/plawton_umd_test_data/pds4-epoxi_mri-v1.0</path>
</directories>

which behaves as expected.

📜 To Reproduce

Steps to reproduce the behavior:

  1. Run harvest to register a bundle and its collections/products, using the <bundles> config element
  2. Repeat execution, using the --overwrite option
  3. Observe that the products are skipped, not overwritten

🕵️ Expected behavior

I expect the --overwrite option to be respected irrespective of the XML element used to target the bundle.

📚 Version of Software Used

v3.8.0-SNAPSHOT

🦄 Related requirements

See NASA-PDS/registry #118

⚙️ Engineering Details

alexdunnjpl commented 1 year ago

The way I'd expect harvest to work is that there would be

This way, different enumeration approaches (in this case, <bundles> and <directories>) would be decoupled from everything else and would share a common downstream processing path.

The current situation is that <bundles> is handled by HarvestCmd.processBundles and <directories> by HarvestCmd.processDirectories (files and collections have equivalents), and these execution paths diverge completely depending on the kind of source they derive from.
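
To illustrate the decoupling described above, here is a minimal Java sketch. Every name in it (`HarvestPipelineSketch`, `ProductSource`, `Registrar`, `bundleSource`, `directorySource`) is hypothetical and is not taken from the harvest codebase. Products are reduced to plain string ids, and the registry is an in-memory set. The point is only that each source decides which products to visit, while a single shared path decides whether to skip or overwrite.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: none of these types exist in harvest under these names.
public class HarvestPipelineSketch {

    /** An enumeration strategy only decides WHICH products get processed. */
    interface ProductSource {
        List<String> enumerate();
    }

    /** Stand-in for a <bundles> entry; a real one would walk the bundle's collections. */
    static ProductSource bundleSource(String... productIds) {
        return () -> List.of(productIds);
    }

    /** Stand-in for a <directories> entry; a real one would crawl the file system. */
    static ProductSource directorySource(String... productIds) {
        return () -> List.of(productIds);
    }

    /** Shared downstream path: the overwrite decision lives here and nowhere else. */
    static final class Registrar {
        private final Set<String> registered = new LinkedHashSet<>();

        /** Returns the number of products written (newly registered or overwritten). */
        int process(ProductSource source, boolean overwrite) {
            int written = 0;
            for (String id : source.enumerate()) {
                if (registered.contains(id) && !overwrite) {
                    continue; // already registered and overwrite not requested: skip
                }
                registered.add(id);
                written++;
            }
            return written;
        }
    }

    public static void main(String[] args) {
        Registrar registrar = new Registrar();
        ProductSource bundle = bundleSource("product-a", "product-b");

        System.out.println("first run wrote:  " + registrar.process(bundle, false));
        System.out.println("repeat run wrote: " + registrar.process(bundle, false));
        System.out.println("overwrite wrote:  " + registrar.process(bundle, true));
    }
}
```

With this shape, a flag such as --overwrite is handled once in the shared path, so it cannot be respected for one source type and ignored for another.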

alexdunnjpl commented 1 year ago

@tloubrieu-jpl @jordanpadams I've pushed a bandaid fix in issue-112-overwrite-bug. I feel this points to a larger architectural flaw that is worth addressing, but doing so risks dragging me into the weeds and taking a long time, given the scope of the changes and my relative unfamiliarity with the harvest codebase.

It's your call how to proceed. My suggestion is that I merge a PR for the bandaid and open an icebox issue for the larger rework, if you agree that it's necessary/desirable.