NASA-PDS / registry-mgr

Standalone Registry Manager application responsible for managing the PDS Registry (https://github.com/NASA-PDS/registry) schemas and indexes.
https://nasa-pds.github.io/registry

ref_lid_collection error when ingesting data sets #57

Closed jordanpadams closed 1 year ago

jordanpadams commented 1 year ago

๐Ÿ› Describe the bug

[Screenshot]

📜 To Reproduce

Steps to reproduce the behavior:

  1. Go to '...'
  2. Click on '....'
  3. Scroll down to '....'
  4. See error

๐Ÿ•ต๏ธ Expected behavior

Change archive status for the collection

📚 Version of Software Used

🩺 Test Data / Additional context

๐ŸžScreenshots

🖥 System Info


🦄 Related requirements

โš™๏ธ Engineering Details

My initial guess is that this error occurs because the first product ingested included only one collection reference, so Harvest/Registry Manager stored the field as a single string value instead of an array.
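If that guess is right, one writer-side remedy is to always emit collection reference fields as JSON arrays, even when a product lists only one collection. The Java sketch below assumes the document is assembled as a Map before serialization; the class name RefFieldNormalizer, the normalize() method and the MULTI_VALUED_FIELDS list are hypothetical and are not taken from Harvest's code.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class RefFieldNormalizer {
    // Hypothetical list of fields that may hold several values; not Harvest's actual configuration.
    static final Set<String> MULTI_VALUED_FIELDS =
            Set.of("ref_lid_collection", "ref_lidvid_collection");

    /** Returns a copy of doc in which every multi-valued field is a List, even with a single value. */
    static Map<String, Object> normalize(Map<String, Object> doc) {
        Map<String, Object> out = new LinkedHashMap<>(doc); // copy so the caller's map is not mutated
        for (String field : MULTI_VALUED_FIELDS) {
            Object value = out.get(field);
            if (value == null || value instanceof Collection) {
                continue; // field absent, or already array-like
            }
            List<Object> wrapped = new ArrayList<>();
            wrapped.add(value); // wrap the bare scalar in a one-element list
            out.put(field, wrapped);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("lidvid", "urn:nasa:pds:epoxi_mri::1.0");
        doc.put("ref_lid_collection", "urn:nasa:pds:epoxi_mri:hartley2_photometry");
        System.out.println(normalize(doc));
    }
}
```

Serialized with any JSON library, the wrapped value becomes a one-element array, so readers see the same shape whether a bundle has one collection or many.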

Duplicate of https://github.com/NASA-PDS/registry/issues/118, but keeping this open to ensure traceability.

alexdunnjpl commented 1 year ago

~Likely a red herring, but the function that builds the JSON for the update request doesn't appear to build valid JSON (unless the ES client supports multiple newline-separated JSON payloads attached to a single Request; I haven't been able to determine that yet).~

{ "update" : {"_id" : "urn:nasa:pds:epoxi_mri::1.0" } }
{ "doc" : {"ops:Tracking_Meta/ops:archive_status" : "archived"} }

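EDIT: TIL about NDJSON (newline-delimited JSON), which is the format the Elasticsearch bulk API expects, so the payload above is valid.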
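In the bulk format, each "update" action is an action line followed by a partial-document line, and the whole body must end with a newline. The sketch below builds the pair shown above by hand to make that layout explicit; updateAction() and jsonString() are hypothetical helpers, not registry-mgr code, and a real client would normally use a JSON library for serialization.

```java
class BulkUpdatePayload {
    /** Escapes the characters JSON requires to be escaped inside a string literal. */
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    // Other control characters must be \\u-escaped; everything else is copied as-is.
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    /** Builds one bulk "update" action: an action line plus a partial-doc line, each newline-terminated. */
    static String updateAction(String id, String field, String value) {
        return "{\"update\":{\"_id\":" + jsonString(id) + "}}\n"
             + "{\"doc\":{" + jsonString(field) + ":" + jsonString(value) + "}}\n";
    }

    public static void main(String[] args) {
        System.out.print(updateAction("urn:nasa:pds:epoxi_mri::1.0",
                "ops:Tracking_Meta/ops:archive_status", "archived"));
    }
}
```

Several actions can be concatenated into one request body; because every line ends with "\n", the final-newline requirement is satisfied automatically.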

alexdunnjpl commented 1 year ago

@jordanpadams

This may be fixed already.

General guess, or is there a specific release or related issue you're thinking of?

alexdunnjpl commented 1 year ago

The bug appears to be triggered when the relevant bundle contains only one collection.

The response returned by the API request during es.dao.ProductDao.getCollectionIds() is as follows:

{
  "_index": "registry",
  "_type": "_doc",
  "_id": "urn:nasa:pds:epoxi_mri::1.0",
  "_version": 2,
  "_seq_no": 27012,
  "_primary_term": 3,
  "found": true,
  "_source": {
    "ref_lid_collection": "urn:nasa:pds:epoxi_mri:hartley2_photometry"
  }
}

Here ref_lid_collection is parsed by es.dao.DaoUtils.parseSet(), which expects a JSON array rather than the bare string returned above.
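A reader-side workaround would be to accept either shape. The sketch below assumes the _source.ref_lid_collection value has already been decoded into a Java String or List by whichever JSON library is in use; toStringSet() is a hypothetical helper, not the actual es.dao.DaoUtils.parseSet().

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class LenientRefParser {
    /**
     * Converts a decoded JSON value into a set of strings.
     * Accepts a bare string (single-collection bundle), an array (multi-collection bundle), or null.
     */
    static Set<String> toStringSet(Object value) {
        Set<String> out = new LinkedHashSet<>(); // keeps the original order, drops duplicates
        if (value == null) {
            return out;
        }
        if (value instanceof String) {
            out.add((String) value);
        } else if (value instanceof Collection) {
            for (Object item : (Collection<?>) value) {
                if (item != null) out.add(item.toString());
            }
        } else {
            throw new IllegalArgumentException(
                    "Unexpected JSON value type: " + value.getClass().getName());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(toStringSet("urn:nasa:pds:epoxi_mri:hartley2_photometry"));
        System.out.println(toStringSet(List.of(
                "urn:nasa:pds:dart_teleobs:data_ldtcal",
                "urn:nasa:pds:dart_teleobs:data_ldtddp")));
    }
}
```

Tolerating both shapes on read only masks the inconsistency in the stored documents; it does not change what Harvest writes.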

@jordanpadams, this appears to be the fault of the API. Will follow up when I identify the relevant next steps.

alexdunnjpl commented 1 year ago

Confirmed that the same request for a bundle with multiple collections does not have this issue and correctly returns an array of strings.

{
  "_index": "registry",
  "_type": "_doc",
  "_id": "urn:nasa:pds:dart_teleobs::1.0",
  "_version": 1,
  "_seq_no": 24385,
  "_primary_term": 3,
  "found": true,
  "_source": {
    "ref_lidvid_collection": [
      "urn:nasa:pds:dart_teleobs:data_ldtcal::1.0",
      "urn:nasa:pds:dart_teleobs:data_ldtddp::1.0",
      "urn:nasa:pds:dart_teleobs:data_ldtraw::1.0",
      "urn:nasa:pds:dart_teleobs:document_ldt::1.0"
    ],
    "ref_lid_collection": [
      "urn:nasa:pds:dart_teleobs:data_ldtcal",
      "urn:nasa:pds:dart_teleobs:data_ldtddp",
      "urn:nasa:pds:dart_teleobs:data_ldtraw",
      "urn:nasa:pds:dart_teleobs:document_ldt"
    ]
  }
}

alexdunnjpl commented 1 year ago

@jordanpadams @al-niessner @jimmie @nutjob4life is it registry-loader that is responsible for constructing the docs to be loaded into the registry? (inb4 yeah bro, it's in the name)

jordanpadams commented 1 year ago

@alexdunnjpl Harvest is the tool for doing this. We discussed merging registry-mgr and Harvest into that registry-loader repo, but it has not happened.

alexdunnjpl commented 1 year ago

Thanks Jordan - should be able to continue with this now.

jordanpadams commented 1 year ago

Oops! Never mind. I deleted my previous comment because I was definitely wrong. Hopefully the Harvest fix will make it work.

alexdunnjpl commented 1 year ago

Closing as this is a registry-common issue.

A corresponding issue has been opened in registry-common.