NASA-PDS / registry-sweepers

Scripts that run regularly on the registry database, to clean and consolidate information
Apache License 2.0

Property values returned by the API are inconsistent, as list or single value. #86

Closed tloubrieu-jpl closed 6 months ago

tloubrieu-jpl commented 7 months ago

Checked for duplicates

No - I haven't checked

🐛 Describe the bug

When I did this request in production:

curl --get 'https://pds.nasa.gov/api/search/1/classes/collections' \
    --data-urlencode 'limit=10' \
    --data-urlencode 'q=(pds:Primary_Result_Summary.pds:processing_level eq "Raw")' \
    --data-urlencode 'fields=pds:Primary_Result_Summary.pds:processing_level' \
    --header 'Accept:text/csv'

The response is:

"[Raw, Calibrated]"
"[Raw, Partially Processed, Calibrated, Derived]"
"[Raw]"
"[Raw]"
"[Raw]"
"[Raw]"
"Raw"
"Raw"
"Raw"
"Raw"

The brackets inside the quotes should be fixed as soon as https://github.com/NASA-PDS/registry-api/issues/381 is deployed in production.

But the fact that some values don't have brackets is unexpected since we have registry-sweeper cleaning these data.

Note that the same happens for other fields, for example "pds:Time_Coordinates/pds:start_date_time", and I guess for a lot of others.
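A minimal sketch of how the mixed shapes could be counted for one field, assuming the records have already been parsed from a JSON response into Python dictionaries. The helper name `value_shapes` and the sample records are hypothetical and only mirror the output quoted above.

```python
from collections import Counter

def value_shapes(records, field):
    """Count how often `field` holds a list versus a bare scalar."""
    shapes = Counter()
    for record in records:
        if field not in record:
            shapes["missing"] += 1
        elif isinstance(record[field], list):
            shapes["list"] += 1
        else:
            shapes["scalar"] += 1
    return shapes

# Hypothetical sample mirroring the mixed output shown above.
FIELD = "pds:Primary_Result_Summary.pds:processing_level"
sample = [
    {FIELD: ["Raw", "Calibrated"]},
    {FIELD: ["Raw"]},
    {FIELD: "Raw"},
]

shapes = value_shapes(sample, FIELD)
print(dict(shapes))
```

A non-zero "scalar" count alongside a non-zero "list" count would indicate the inconsistency described here.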

🕵️ Expected behavior

I expected [...]

📜 To Reproduce

1. 2. 3. ...

🖥 Environment Info

📚 Version of Software Used

No response

🩺 Test Data / Additional context

No response

🦄 Related requirements

🦄 #xyz

⚙️ Engineering Details

No response

alexdunnjpl commented 7 months ago

@tloubrieu-jpl I'll re-test against the latest imminent version of the API software, as the underlying data is consistent.

curl --get 'https://pds.nasa.gov/api/search/1/classes/collections' --data-urlencode 'limit=10' --data-urlencode 'q=(pds:Primary_Result_Summary.pds:processing_level eq "Raw")' --data-urlencode 'fields=pds:Primary_Result_Summary.pds:processing_level' | jq .data | jq 'map(.properties)' | jq 'map(.lidvid[0], .["pds:Primary_Result_Summary.pds:processing_level"])'

[
  "urn:nasa:pds:mars2020_mastcamz_ops_raw:data::5.0",
  [
    "Raw",
    "Calibrated"
  ],
  "urn:nasa:pds:mars2020_mastcamz_ops_raw:browse::5.0",
  [
    "Raw",
    "Partially Processed",
    "Calibrated",
    "Derived"
  ],
  "urn:nasa:pds:msl_apxs_raw:extras::3.0",
  [
    "Raw"
  ],
  "urn:nasa:pds:lro_diviner_raw:data_raw1::1.0",
  [
    "Raw"
  ],
  "urn:nasa:pds:lro_lola_edr:data_raw::2.0",
  [
    "Raw"
  ],
  "urn:nasa:pds:lro_diviner_raw:data_raw2::3.0",
  [
    "Raw"
  ],
  "urn:nasa:pds:a17hfe_raw_arcsav:document::1.0",
  [
    "Raw"
  ],
  "urn:nasa:pds:a16lsm_raw_arcsav:document::1.0",
  [
    "Raw"
  ],
  "urn:nasa:pds:a12sws_raw_arcsav:document::1.0",
  [
    "Raw"
  ],
  "urn:nasa:pds:a15hfe_raw_arcsav:data::1.0",
  [
    "Raw"
  ]
]

Latest state of main okay, or should I be testing against some particular tag?

alexdunnjpl commented 6 months ago

Example failing product on geo:

urn:nasa:pds:a15side_ccig_raw_arcsav:document::1.0

Running sweepers locally now to check whether this may be due to disabled sweeper.

alexdunnjpl commented 6 months ago

There are products in the geo database having ops:Provenance/ops:registry_sweepers_repairkit_version: 1 but which are unrepaired. It is expected that a manual run will be sufficient to fix this.

Not sure why this would be the case with geo, but if it's necessary to ensure that this is not the case for other nodes, this can be fixed by incrementing the repairkit version to ensure repairkit is run on all products again. If it's okay to spot-fix if this is seen again, this ticket can be closed.

@tloubrieu-jpl @jordanpadams please advise.
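To illustrate why incrementing the repairkit version would cause every product to be processed again, here is a minimal sketch of version gating. It assumes documents are plain dictionaries carrying the `ops:Provenance/ops:registry_sweepers_repairkit_version` key mentioned above; the function name `needs_repair` and the comparison logic are assumptions, not the actual repairkit implementation.

```python
REPAIRKIT_VERSION_KEY = "ops:Provenance/ops:registry_sweepers_repairkit_version"

def needs_repair(doc, current_version):
    """Return True if the document was never repaired or was repaired by an older version."""
    raw = doc.get(REPAIRKIT_VERSION_KEY)
    # Assumption: the stored value may be a scalar or a single-element list.
    if isinstance(raw, list):
        raw = raw[0] if raw else None
    if raw is None:
        return True
    return int(raw) < current_version

# A product stamped with version 1 is skipped while the sweeper is at version 1 ...
stamped = {REPAIRKIT_VERSION_KEY: 1}
print(needs_repair(stamped, current_version=1))
# ... and selected again once the sweeper version is incremented to 2.
print(needs_repair(stamped, current_version=2))
```

Under this scheme, a product that is stamped but unrepaired is never revisited until the version is bumped, which matches the behaviour described for geo.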

alexdunnjpl commented 6 months ago

@jordanpadams @tloubrieu-jpl the issue is in the specification for repairkit behaviour.

Currently, array-enforcement repairs are made to properties matching either of the following regular expressions:

so it's expected that pds:Primary_Result_Summary.pds:processing_level would remain untouched. Can y'all provide a comprehensive list of targets for the array-enforcement repairs?
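As a rough sketch of pattern-scoped array enforcement: the actual repairkit expressions are not shown in this thread, so the two patterns below are placeholders chosen only for illustration, and `enforce_arrays` is a hypothetical helper rather than the sweeper's real code.

```python
import re

# Placeholder patterns: NOT the real repairkit expressions, which are not shown in this thread.
ARRAY_ENFORCED_PATTERNS = [
    re.compile(r"^ops:hypothetical_a/"),
    re.compile(r"^ops:hypothetical_b/"),
]

def enforce_arrays(doc, patterns=ARRAY_ENFORCED_PATTERNS):
    """Wrap scalar values in a list, but only for keys matching one of the patterns."""
    repaired = {}
    for key, value in doc.items():
        targeted = any(pattern.search(key) for pattern in patterns)
        if targeted and not isinstance(value, list):
            repaired[key] = [value]
        else:
            repaired[key] = value
    return repaired

doc = {
    "ops:hypothetical_a/example": "value",
    "pds:Primary_Result_Summary.pds:processing_level": "Raw",
}
repaired = enforce_arrays(doc)
print(repaired)
```

With patterns scoped this way, a `pds:` property passes through unchanged, which is the behaviour reported above.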

tloubrieu-jpl commented 6 months ago

@alexdunnjpl @jordanpadams for non-ops properties, we have a patch in the registry-api which transforms single values into lists, but it only works for the application/json format. See for example:

curl --get 'https://pds.nasa.gov/api/search/1/classes/collections' \
    --data-urlencode 'limit=10' \
    --data-urlencode 'q=(pds:Primary_Result_Summary.pds:processing_level eq "Raw")' \
    --data-urlencode 'fields=pds:Primary_Result_Summary.pds:processing_level' | json_pp

I developed that patch 2 years ago, and I missed the non-JSON serializations of the results in it, which is why it does not work in my examples.

I think I would transfer this ticket to registry-api and make the fix there, to be consistent with what has already been done for the application/json format.

We can discuss that at the breakout.
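For discussion, a minimal sketch of applying the same single-value-to-list coercion before CSV serialization. Everything here is an assumption for illustration: the helper names `as_list` and `to_csv`, the `|` separator, and the quoting style do not come from the registry-api code.

```python
import csv
import io

def as_list(value):
    """Coerce a missing value to [], a scalar to a one-element list, and keep lists as-is."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]

def to_csv(records, fields, separator="|"):
    """Serialize records to CSV, treating every property value as a list."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for record in records:
        row = [
            separator.join(str(item) for item in as_list(record.get(field)))
            for field in fields
        ]
        writer.writerow(row)
    return buffer.getvalue()

FIELD = "pds:Primary_Result_Summary.pds:processing_level"
records = [{FIELD: ["Raw", "Calibrated"]}, {FIELD: "Raw"}]
output = to_csv(records, [FIELD])
print(output)
```

Because every value goes through `as_list` first, a product stored as `"Raw"` and one stored as `["Raw"]` serialize identically, whichever cell format is ultimately chosen.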

alexdunnjpl commented 6 months ago

@tloubrieu-jpl roger that - sounds like there are some requirements to untangle

jordanpadams commented 6 months ago

@tloubrieu-jpl sounds good. I do remember you making that fix, so this all tracks with why we kept running into this even after the fact.