NASA-PDS / validate

Validates PDS4 product labels, data and PDS3 Volumes
https://nasa-pds.github.io/validate/
Apache License 2.0

Cutover to using new Registry API for generating context products json #675

Open jordanpadams opened 11 months ago

jordanpadams commented 11 months ago

💡 Description

Update the -u flag functionality to query the Registry API instead of the legacy registry.

al-niessner commented 10 months ago

@jordanpadams

I am not getting any traction here. I see that it runs this query against the old system, which I know nothing about:

product_class:Product_Context AND -data_class:Resource AND -data_class:PDS_Affiliate

I cannot find data_class in our OpenSearch documents. It also looks like it uses data_product_type for some fancy parsing of fields that may no longer exist either. If I had the mapping between the old fields and the new ones, I might be able to make progress.
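For readers unfamiliar with the legacy (Lucene-style) syntax: the query above keeps documents whose product_class is Product_Context and drops any whose data_class is Resource or PDS_Affiliate, because a leading `-` excludes matches. A minimal Python sketch of that filter over in-memory documents follows. It assumes each document is a plain dict with single-valued `product_class` and `data_class` keys; that document shape and the sample values are hypothetical.

```python
# Hypothetical document shape: {"product_class": str, "data_class": str}.
EXCLUDED_DATA_CLASSES = {"Resource", "PDS_Affiliate"}


def matches_legacy_query(doc: dict) -> bool:
    """Mimic: product_class:Product_Context AND -data_class:Resource
    AND -data_class:PDS_Affiliate (a leading '-' means NOT)."""
    return (
        doc.get("product_class") == "Product_Context"
        # A missing data_class is not excluded, as with a Lucene '-' clause.
        and doc.get("data_class") not in EXCLUDED_DATA_CLASSES
    )


# Hypothetical sample documents.
docs = [
    {"product_class": "Product_Context", "data_class": "Target"},
    {"product_class": "Product_Context", "data_class": "Resource"},
    {"product_class": "Product_Context", "data_class": "PDS_Affiliate"},
    {"product_class": "Product_Observational", "data_class": "Target"},
]
kept = [d for d in docs if matches_legacy_query(d)]
print(len(kept))  # 1
```

Any replacement query against the new API would need an equivalent exclusion, which is why the missing data_class field matters here.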

al-niessner commented 10 months ago

@jordanpadams @tloubrieu-jpl

This might be the field name mapping:

old                       new
===                       ===
target_name               pds:Target_Identification/pds:name
instrument_name           ??
instrument_host_name      ??
resource_name             ??
investigation_name        pds:Investigation_Area/pds:name
target_type               pds:Target_Identification/pds:type
instrument_type           ??
instrument_host_type      ??
resource_type             ??
investigation_type        pds:Investigation_Area/pds:type
facility_name             ??
facility_type             ??
airborne_name             ??
airborne_type             ??

Here, ?? means I could not find a compelling substitute in the new documents.

Oh, and data_product_type would then be target, instrument, investigation, airborne, etc. That would at least match the code.

jordanpadams commented 10 months ago

@al-niessner the top-level query is: https://pds.nasa.gov/api/search/1/products?q=product_class%20eq%20%22Product_Context%22

old                       new
===                       ===
target_name               pds:Target.pds:name
instrument_name           pds:Instrument.pds:name
instrument_host_name      pds:Instrument_Host.pds:name
resource_name             pds:Resource.pds:name
investigation_name        pds:Investigation.pds:name
target_type               pds:Target.pds:type
instrument_type           pds:Instrument.pds:type
instrument_host_type      pds:Instrument_Host.pds:type
resource_type             pds:Resource.pds:type
investigation_type        pds:Investigation.pds:type
facility_name             pds:Facility.pds:name
facility_type             pds:Facility.pds:type
airborne_name             pds:Airborne.pds:name
airborne_type             pds:Airborne.pds:type

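The mapping in this comment is regular: every `<kind>_<attr>` field becomes `pds:<Kind>.pds:<attr>`, with each underscore-separated part of the kind capitalized. The Python sketch below builds the top-level query URL from the comment and derives the new field name from an old one. The helper names `context_products_url` and `new_field` are hypothetical, and whether these property names actually exist in the API documents is exactly what the thread goes on to question.

```python
from urllib.parse import quote

# Base URL and query string taken from the comment above.
SEARCH_URL = "https://pds.nasa.gov/api/search/1/products"

# Kinds that appear in the old field names of the table above.
KINDS = ("target", "instrument", "instrument_host", "resource",
         "investigation", "facility", "airborne")


def context_products_url() -> str:
    """Build the Product_Context query URL (space -> %20, '"' -> %22)."""
    query = 'product_class eq "Product_Context"'
    return f"{SEARCH_URL}?q={quote(query)}"


def new_field(old: str) -> str:
    """Translate e.g. 'instrument_host_type' -> 'pds:Instrument_Host.pds:type'."""
    kind, _, attr = old.rpartition("_")
    if kind not in KINDS or attr not in ("name", "type"):
        raise KeyError(f"no mapping for {old!r}")
    class_name = "_".join(part.capitalize() for part in kind.split("_"))
    return f"pds:{class_name}.pds:{attr}"


print(context_products_url())
print(new_field("instrument_host_type"))  # pds:Instrument_Host.pds:type
```

Deriving the names from one rule reproduces all fourteen rows of the table. It also makes it cheap to swap in a different naming scheme if the real keys turn out to differ.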
al-niessner commented 10 months ago

@jordanpadams

Thanks. I have not seen a Product_Context yet.

jordanpadams commented 10 months ago

@al-niessner most of the labels are out here if you are interested.

al-niessner commented 10 months ago

@jordanpadams

What is the new data_product_type? The code uses it to look up {data_product_type}_name and then {data_product_type}_type. I presume it does this so that you do not beat yourself up with long lists of N/A. I grabbed a Product_Context, and it contains neither a data_product_type nor anything that looks like one.
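Since a Product_Context carries no explicit data_product_type, one option is to infer it from which class-specific name key is present. The Python sketch below does that and then reads the matching name and type, mirroring the {data_product_type}_name / {data_product_type}_type lookup described above. It assumes the product's properties arrive as a flat dict keyed by names of the form `pds:<Class>.pds:name`, as in the earlier mapping table. The function names and the sample values are hypothetical.

```python
from typing import Optional, Tuple

# Kinds taken from the old field names in the mapping table.
KINDS = ("target", "instrument", "instrument_host", "resource",
         "investigation", "facility", "airborne")


def class_name(kind: str) -> str:
    """'instrument_host' -> 'Instrument_Host'."""
    return "_".join(part.capitalize() for part in kind.split("_"))


def infer_data_product_type(properties: dict) -> Optional[str]:
    """Return the first kind whose pds:<Class>.pds:name key is present, else None."""
    for kind in KINDS:
        if f"pds:{class_name(kind)}.pds:name" in properties:
            return kind
    return None


def name_and_type(properties: dict) -> Optional[Tuple[str, object, object]]:
    """Return (kind, name, type) for the inferred kind, or None if nothing matches."""
    kind = infer_data_product_type(properties)
    if kind is None:
        return None  # Skip the product instead of emitting a row of N/A values.
    prefix = f"pds:{class_name(kind)}"
    return kind, properties.get(f"{prefix}.pds:name"), properties.get(f"{prefix}.pds:type")


# Hypothetical properties for a target context product.
sample = {"pds:Target.pds:name": ["Mars"], "pds:Target.pds:type": ["Planet"]}
print(name_and_type(sample))  # ('target', ['Mars'], ['Planet'])
```

Returning None when no kind matches keeps products without a recognizable class out of the output, which is the same goal the original code seems to pursue.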

tloubrieu-jpl commented 10 months ago

@al-niessner is having some issues with requests to the API:

al-niessner commented 10 months ago

@jordanpadams @tloubrieu-jpl

Okay, the code works, but our idea of what the names are does not. If I run it with the names in the table above, it finds nothing. I looked at one of the context products and found the name 'pds:Target_Identification.pds'; with that name, it populates the registered JSON file. So, are we sure about the names in the previously posted table?
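The table names returned nothing, while a name seen in an actual product worked. So a quick check is to list which classes and attributes a returned product really carries before hardcoding any names. The Python sketch below groups property keys by class. It assumes a flat properties dict whose keys follow the form `pds:<Class>.pds:<attribute>`. That form is extrapolated from the tables above, because the exact key quoted in this comment is truncated. The function name and the sample data are hypothetical.

```python
import re
from collections import defaultdict

# Assumed key form "pds:<Class>.pds:<attribute>"; other keys are ignored.
KEY_PATTERN = re.compile(r"^pds:(?P<cls>\w+)\.pds:(?P<attr>\w+)$")


def classes_in(properties: dict) -> dict:
    """Group keys by class, e.g. {'Target_Identification': ['name', 'type']}."""
    found = defaultdict(list)
    for key in properties:
        match = KEY_PATTERN.match(key)
        if match:
            found[match.group("cls")].append(match.group("attr"))
    return {cls: sorted(attrs) for cls, attrs in sorted(found.items())}


# Hypothetical properties of one downloaded context product.
sample = {
    "lidvid": "example",
    "pds:Target_Identification.pds:name": ["example"],
    "pds:Target_Identification.pds:type": ["example"],
}
print(classes_in(sample))  # {'Target_Identification': ['name', 'type']}
```

Running this over a handful of downloaded products would show whether names like pds:Target or pds:Target_Identification are the ones actually present.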

al-niessner commented 10 months ago

@jordanpadams @tloubrieu-jpl

I will create the pull request now. I have run it several times with limited downloads (1000 products), and it comes up empty with the table from this ticket. If I run curl once with just 100 products, it finds many matches. Both use the same API call and return identically formatted results. Until pagination is resolved, I am leaving it as is.
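Because the results seem to depend on how many products are requested, walking every page fully is one way to rule out paging problems. The Python sketch below shows offset-based pagination with a caller-supplied `fetch_page(start, limit)` function, so it makes no network calls. How `start` and `limit` map onto the Registry API's real paging parameters is an assumption left to the caller. The function and sample data are hypothetical.

```python
from typing import Callable, Iterator, List


def iter_all(fetch_page: Callable[[int, int], List[dict]],
             limit: int = 100) -> Iterator[dict]:
    """Yield every item by requesting successive pages until a short page arrives.

    fetch_page(start, limit) is caller-supplied; mapping it onto the API's
    actual paging parameters is an assumption, not shown here.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    start = 0
    while True:
        page = fetch_page(start, limit)
        yield from page
        if len(page) < limit:  # A short (or empty) page means no more results.
            return
        start += len(page)


# Hypothetical in-memory stand-in for the API.
data = [{"id": i} for i in range(250)]


def fake_fetch(start: int, limit: int) -> List[dict]:
    return data[start:start + limit]


items = list(iter_all(fake_fetch, limit=100))
print(len(items))  # 250
```

Stopping on a short page avoids needing a total-hits count. However, it assumes the server never returns fewer items than requested mid-stream.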

jordanpadams commented 4 months ago

Moving this back to the icebox since it is not super critical. We will work on this in B15.0 or B15.1.