NASA-PDS / operations

Tickets for the PDSEN Operations Team

Deployment of Registry Loader Tools and Initial Ingestion of Engineering Node Registry on Latest AWS Deployment #270

Closed tloubrieu-jpl closed 10 months ago

tloubrieu-jpl commented 2 years ago

💡 Description

List of products to ingest:

jordanpadams commented 1 year ago

@rchenatjpl just to verify, do you know how this information is then getting populated if not from those context products? https://pds.nasa.gov/ds-view/pds/viewInstrumentProfile.jsp?INSTRUMENT_ID=ISSW&INSTRUMENT_HOST_ID=VG1

rchenatjpl commented 1 year ago

Wait, you're probably right that that info is coming from the PDS3 context products, though there must be some special handling somewhere to suppress the LID from showing up. Less probable but possible is that that info comes directly from the PDS3 catalog files or the PDS3 database, but that database is so tenuous.

rchenatjpl commented 1 year ago

@jimmie I hope you're the right person, Jimmie. I harvested the files at /data/pds4/system_bundle/, but when I try to set the archive status to archived, I get this output:

[INFO] Setting product status. LIDVID = urn:nasa:pds:system_bundle::1.0, status = archived
[ERROR] [_doc][urn:nasa:pds:system_bundle:product_sip_deep_archive:gbo.ast.catalina.survey_v1.0_20220420_delta_20220908200751165540::1.0]: document missing

I can see that LID in the registry via https://search-en-prod-di7dor7quy7qwv3husi2wt5tde.us-west-2.es.amazonaws.com/registry/_search?q=lid:*gbo.ast.catalina.survey_v1.0_20220420_delta_20220908200751165540 But I must have something else crossed. Any ideas? Thanks
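
For reference, the wildcard lookup above can be sketched in Python. This is a minimal sketch: the host is a placeholder rather than the production endpoint, and `build_lid_search_url` is a hypothetical helper. Only the `/registry/_search?q=lid:*...` shape is taken from the URL quoted above. A leading `*` matches any LID that ends with the given suffix, whichever collection it belongs to, so a hit does not prove the full LIDVID is the expected one.

```python
from urllib.parse import quote


def build_lid_search_url(base_url: str, index: str, lid_suffix: str) -> str:
    """Build a URI search that matches any LID ending in lid_suffix.

    base_url and index are assumptions supplied by the caller; this helper
    is illustrative and not part of any PDS tool.
    """
    # Keep ':' and '*' literal so the query reads like the one in the thread.
    query = quote(f"lid:*{lid_suffix}", safe=":*")
    return f"{base_url.rstrip('/')}/{index}/_search?q={query}"


# Placeholder host, not a real endpoint.
url = build_lid_search_url("https://opensearch.example.org", "registry", "dactyl")
print(url)
```

Checking the `_id` of each returned hit against the exact LIDVID passed to the status command is a useful follow-up step.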

jimmie commented 1 year ago

@rchenatjpl - Are you pointing to production? That Opensearch URI you provided is production. The Opensearch for en-gamma is here. If I query that, it says the registry index doesn't exist. When you run registry-manager to create the registry schema and populate the data, please be sure your configuration is pointing to the proper Opensearch.

rchenatjpl commented 1 year ago

@jimmie Hi, Jimmie, yes, I am trying to put that stuff into production. I'm tasked with putting 1) the PDS4 context products, 2) the PDS3 context products, and 3) this stuff into production. I think I succeeded with 1. I haven't tried 2 yet because I'm not sure what will happen if there's no bundle.xml or collection*.xml for those. Do you know? And 3, that's this problem. Thanks

jimmie commented 1 year ago

OK thanks @rchenatjpl. Sorry for my misunderstanding of which Opensearch instance you were intending to use.

But looking at things more closely, I'm not sure your initial ingestion worked correctly - if you compare the id of the match that comes back from your Opensearch query with what you're looking for:

in Opensearch: urn:nasa:pds:system_bundle:product_aip:gbo.ast.catalina.survey_v1.0_20220420_delta_20220908200751165540::1.0

your id: urn:nasa:pds:system_bundle:product_sip_deep_archive:gbo.ast.catalina.survey_v1.0_20220420_delta_20220908200751165540::1.0
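
The comparison above can be done mechanically. The following is a minimal sketch, assuming a LIDVID has the form `<lid>::<vid>` with colon-separated LID segments, as in the two identifiers quoted above. `split_lidvid` and `diff_lidvids` are hypothetical helper names, not part of registry-manager.

```python
from itertools import zip_longest


def split_lidvid(lidvid: str):
    """Split 'urn:...:x::1.0' into (list of LID segments, version string)."""
    lid, _, vid = lidvid.partition("::")
    return lid.split(":"), vid


def diff_lidvids(a: str, b: str):
    """Return (position, left, right) tuples for every segment that differs."""
    a_parts, a_vid = split_lidvid(a)
    b_parts, b_vid = split_lidvid(b)
    diffs = [
        (i, left, right)
        for i, (left, right) in enumerate(zip_longest(a_parts, b_parts))
        if left != right
    ]
    if a_vid != b_vid:
        diffs.append(("vid", a_vid, b_vid))
    return diffs


suffix = "gbo.ast.catalina.survey_v1.0_20220420_delta_20220908200751165540::1.0"
in_opensearch = "urn:nasa:pds:system_bundle:product_aip:" + suffix
requested = "urn:nasa:pds:system_bundle:product_sip_deep_archive:" + suffix

print(diff_lidvids(in_opensearch, requested))
# [(4, 'product_aip', 'product_sip_deep_archive')]
```

Only the collection segment differs, which is why a suffix wildcard search still finds a match while the exact document lookup fails.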

rchenatjpl commented 1 year ago

@jimmie cr@p. thanks

rchenatjpl commented 1 year ago

@jordanpadams I briefly again doubted whether the context_pds3 files were ingested into the current registry, but now I'm more sure they are because 1) searching for "ISSW" returns some results with no LID (presumably the PDS3) and some with LID (i.e. migrated into a proper PDS4 context product), and 2) search returns data sets, which only exist in u:n:p:context_pds3:...

rchenatjpl commented 1 year ago

@jordanpadams @viviant100 OK, so the PDS3 context products, the PDS4 context products, and the EN-managed documents are ingested and approved. As proof, one example of each:

  1. https://search-en-prod-di7dor7quy7qwv3husi2wt5tde.us-west-2.es.amazonaws.com/registry/_search?q=lid:*context_pds3*instrument*issw.vg1 (Jordan's example)
  2. https://search-en-prod-di7dor7quy7qwv3husi2wt5tde.us-west-2.es.amazonaws.com/registry/_search?q=lid:*dactyl (a target that we didn't have in PDS3)
  3. https://search-en-prod-di7dor7quy7qwv3husi2wt5tde.us-west-2.es.amazonaws.com/registry/_search?q=lid:*dph_1.15.0 (the PDS4 DPH)

Sorry that took so long. Where can I write up the release procedures?

viviant100 commented 1 year ago

@rchenatjpl we have a section here for release procedures: https://wiki.jpl.nasa.gov/display/PDSEN/Data+Release+Procedures. Feel free to use the "Cloud" section underneath for your procedure.

jordanpadams commented 1 year ago

thanks @rchenatjpl. Regarding the PDS3 vs PDS4 context products, we probably need to scrub all of these and make sure we do not have duplicates between the PDS3 and PDS4 versions. I assume we have both right now because of the PDS3/PDS4 filter, but the distinction doesn't really make sense.

viviant100 commented 1 year ago

PDS3/PDS4 context products and EN-managed documents are ingested per @rchenatjpl's comment in Populate EN Registry #61.

@rchenatjpl please see Jordan's comment above about checking for duplicates between the PDS3 and PDS4 versions of the context products.

rchenatjpl commented 1 year ago

@jordanpadams I may be going about this wrong for what you want. Am I looking for context products that are exactly the same except for u:n:p:context vs u:n:p:context_pds3? There won't be many. Am I looking for all u:n:p:context and u:n:p:context_pds3 that point to the same physical thing? I'll guess so. If so, I think what I should give you is a mapping of the u:n:p:context_pds3 things to the u:n:p:context things. Or something else?

rchenatjpl commented 1 year ago
  1. Ignoring these with no equivalent in the other: u:n:p:context:airborne (1), u:n:p:context_pds3:data_set (2293), u:n:p:context:resource (3: ladee, maven, phoenix), u:n:p:context_pds3:resource (5629), u:n:p:context_pds3:subscription (1541), u:n:p:context_pds3:volume (5974), u:n:p:context_pds3:volumeset (2888)
  2. Ignoring u:n:p:context:attribute (1108) and :class (378)
  3. Ignoring u:n:p:context_pds3:attribute (2184) and :class (89)

That leaves 6769 under u:n:p:context_pds3: and 2710 under u:n:p:context: (the latter was 2963 before removing deprecated products). I'll start on the mapping, though feel free to stop me. Note: u:n:p:context:telescope and the facilities that go with them will be tough because in PDS3, each facility x telescope was a single insthost, I think.
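
A tally like the one above could be produced with a short script once the LIDs are exported from the registry. This is a minimal sketch, assuming `u:n:p` abbreviates `urn:nasa:pds` and that the segment after the bundle name is the product class. The sample LIDs are made up for illustration and `count_by_class` is a hypothetical helper.

```python
from collections import Counter

CONTEXT_BUNDLES = ("context", "context_pds3")


def count_by_class(lids):
    """Count LIDs per (bundle, class) for the two context bundles only."""
    counts = Counter()
    for lid in lids:
        parts = lid.split(":")
        # Expected shape: urn:nasa:pds:<bundle>:<class>:<name>
        if len(parts) >= 5 and parts[3] in CONTEXT_BUNDLES:
            counts[(parts[3], parts[4])] += 1
    return counts


# Made-up sample LIDs, for illustration only.
sample = [
    "urn:nasa:pds:context:instrument:example_a",
    "urn:nasa:pds:context_pds3:instrument:example_a",
    "urn:nasa:pds:context_pds3:data_set:example_b",
    "urn:nasa:pds:system_bundle:product_aip:example_c",
]

for (bundle, cls), n in sorted(count_by_class(sample).items()):
    print(f"{bundle}:{cls} ({n})")
```

Classes that appear under only one bundle in the output are the ones with no counterpart to map.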

jordanpadams commented 1 year ago

@rchenatjpl this is great! eventually I think we want to deprecate the "PDS3" version, and have only the PDS4 version available. having both seems like duplicate content and I feel like the system doesn't quite make sense for this. that being said, happy to hear other ideas if we think we should keep both separate. maybe @jshughes or @rsjoyner have thoughts

jordanpadams commented 10 months ago

Closing this as done for the time being until we identify what data is missing and the procedures that need to be updated