NASA-PDS / operations

Tickets for the PDSEN Operations Team

[nssdca-delivery] urn:nasa:pds:cassini_uvis_solarocc_beckerjarmak2023::1.0 #476

Open mace-space opened 6 months ago

mace-space commented 6 months ago

Discipline Node Information


Engineering Node Process

See the internal EN process at https://pds-engineering.jpl.nasa.gov/content/nssdca_interface_process

c-suh commented 5 months ago

@mace-space hello and thank you for your submission! Unfortunately, there are a number of errors that must be addressed before this can be posted for the NSSDCA. I am attaching the validation report for your review; please resubmit the updated delivery package after addressing the multiple instances of each of the 5 errors. Thank you!

Validation report: cassini_uvis_solarocc_beckerjarmak2023_v1.0_20231221-validate.txt

As an additional note, I've noticed that you're using an older version of Validate, and I highly recommend upgrading to the latest version, as it has the latest features and bug fixes. Thank you!

mace-space commented 4 months ago

Thanks, @c-suh

I updated Validate to the latest version, and I'm glad I did: it caught errors that the older version of Validate missed. I have re-processed the bundle to correct those table offset byte count errors.

However, after re-running pds-deep-registry-archive, the AIP and SIP remain invalid. This looks like issue #155.

The AIP and SIP labels reference an incomplete bundle LIDVID, cassini_uvis_solarocc_beckerjarmak2023::1.1 (missing the urn:nasa:pds: prefix), resulting in these errors:

   FAIL: file:/Volumes/pdsdata-admin/data_sandbox/deep_registry/test/cassini_uvis_solarocc_beckerjarmak2023_v1.1_20240201_aip_v1.0.xml
       ERROR  [error.label.schematron]   line 27, 25: The number of colons found in lidvid_reference: (2) is inconsistent with the number expected: (5:7).
       ...
       ...
   FAIL: file:/Volumes/pdsdata-admin/data_sandbox/deep_registry/test/cassini_uvis_solarocc_beckerjarmak2023_v1.1_20240201_sip_v1.0.xml
       ERROR  [error.label.schematron]   line 77, 25: The number of colons found in lidvid_reference: (2) is inconsistent with the number expected: (5:7).
   ...
   ...

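The schematron errors above come down to counting colons in each lidvid_reference. A minimal Python sketch of an equivalent check follows; it is not the schematron rule itself, the function names are hypothetical, and reading the expected range (5:7) as bundle (5), collection (6), and basic product (7) LIDVIDs is an assumption inferred from the LIDVIDs in this thread.

```python
from typing import Optional


def lidvid_colon_count(lidvid: str) -> int:
    """Count the colons in a LIDVID string."""
    return lidvid.count(":")


def check_lidvid_reference(lidvid: str, low: int = 5, high: int = 7) -> Optional[str]:
    """Return an error message mirroring the schematron text, or None if the count is in range."""
    found = lidvid_colon_count(lidvid)
    if low <= found <= high:
        return None
    return (
        f"The number of colons found in lidvid_reference: ({found}) "
        f"is inconsistent with the number expected: ({low}:{high})."
    )


# Truncated reference written into the AIP/SIP labels: only 2 colons, so it fails.
print(check_lidvid_reference("cassini_uvis_solarocc_beckerjarmak2023::1.1"))
# Full bundle LIDVID: urn, nasa, pds, then "::" gives 5 colons, so it passes (prints None).
print(check_lidvid_reference("urn:nasa:pds:cassini_uvis_solarocc_beckerjarmak2023::1.1"))
```

The truncated reference reproduces the exact message in the Validate output, which is consistent with the missing urn:nasa:pds: prefix being the whole problem.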
c-suh commented 4 months ago

@mace-space that is a great find on the deep archive issue! I concur and hope that the issue will be resolved soon. I will try to notify you here once it is. Thank you!

jordanpadams commented 4 months ago

@c-suh see updated package here: Archive.zip

c-suh commented 4 months ago

@jordanpadams and @mace-space this set has been posted for NSSDCA processing! From tomorrow, you can check the status at https://nssdc.gsfc.nasa.gov/psi/ReportPDS4.jsp using the SIP LID below:

SIP LID:

mace-space commented 4 months ago

Thanks, @c-suh! I checked the status, and SIP LIDVID urn:nasa:pds:system_bundle:product_sip_deep_archive:cassini_uvis_solarocc_beckerjarmak2023_v1.0_20240215::1.0 failed because:

Bundle located at https://pds-rings.seti.org/pds4/bundles/cassini_uvis_solarocc_beckerjarmak2023//bundle.xml does not match checksum in manifest

I think this is because I updated the bundle (to fix the issues detected by the updated version of Validate) while the pds-deep-registry-archive tool was being patched; as a result, the URL (https://pds-rings.seti.org/pds4/bundles/cassini_uvis_solarocc_beckerjarmak2023) now points to v1.1 of the bundle rather than v1.0.
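The failure mode described here can be illustrated with a short Python sketch: the manifest records a checksum of the v1.0 file, but the unversioned URL now serves the v1.1 file, so a recomputed checksum no longer matches. MD5 as the checksum algorithm, the helper names, and the label contents below are assumptions for illustration only; no network access is involved.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest of a file's bytes."""
    return hashlib.md5(data).hexdigest()


def matches_manifest(served: bytes, manifest_checksum: str) -> bool:
    """True if the bytes currently served at a URL match the checksum recorded in the manifest."""
    return md5_hex(served) == manifest_checksum.strip().lower()


# Hypothetical stand-ins for bundle.xml as it looked in v1.0 and in v1.1.
bundle_xml_v1_0 = b"<Product_Bundle><version_id>1.0</version_id></Product_Bundle>"
bundle_xml_v1_1 = b"<Product_Bundle><version_id>1.1</version_id></Product_Bundle>"

# The manifest holds the checksum of the v1.0 file.
recorded = md5_hex(bundle_xml_v1_0)

print(matches_manifest(bundle_xml_v1_0, recorded))  # True
print(matches_manifest(bundle_xml_v1_1, recorded))  # False: the URL now serves v1.1
```

Pointing the v1.0 products at a URL that still serves the v1.0 files is what brings the two sides back into agreement.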

Shall I try again using the updated URL for v1.0 (https://pds-rings.seti.org/pds4/bundles/cassini_uvis_solarocc_beckerjarmak2023_v1.0/), and then separately run pds-deep-registry-archive and the steps outlined in the PDS Delivery Checklist on v1.1 of the bundle? Is the process the same when registering another version of the bundle in the deep archive, or is there a different process for registering an updated bundle?

c-suh commented 4 months ago

Hi @mace-space! Please hold off on re-running the deep-registry-archive tool until a new, non-dev version is released (e.g., higher than v1.1.4).

I believe the process is the same when registering another version of the bundle. When creating this new bundle, however, be sure to increment the version in the VID wherever applicable.

To make sure I'm understanding correctly, would you confirm or correct the following bullet points? Thank you!

mace-space commented 4 months ago

Thanks, @c-suh! I will hold off on re-running pds-deep-registry-archive until there's a new non-dev version, and I will make sure to increment the version in the VID when it comes to registering v1.1.

I'll respond to your points above in bold inline here:

Thanks again for all your help

jordanpadams commented 4 months ago

@mace-space as long as the latest version of each bundle, with its latest path, is loaded into the next-gen registry, you should be able to run pds-deep-registry-archive with each applicable LIDVID and get the 2 accurate SIP packages:

$ pds-deep-registry-archive --site PDS_RNG urn:nasa:pds:cassini_uvis_solarocc_beckerjarmak2023::1.1

$ pds-deep-registry-archive --site PDS_RNG urn:nasa:pds:cassini_uvis_solarocc_beckerjarmak2023::1.0
jordanpadams commented 4 months ago

@mace-space also, you should be able to upgrade your Deep Archive software and continue delivering SIP packages. Let us know if you run into any additional issues.

mace-space commented 4 months ago

Thanks! Here's the delivery of urn:nasa:pds:cassini_uvis_solarocc_beckerjarmak2023 for both v1.0 and v1.1:

NOTE: There were invalid URLs in cassini_uvis_solarocc_beckerjarmak2023_v1.0_20240228_sip_v1.0.tab (https://pds-rings.seti.org/pds4/bundles/cassini_uvis_solarocc_beckerjarmak2023//), which I corrected to https://pds-rings.seti.org/pds4/bundles/cassini_uvis_solarocc_beckerjarmak2023_v1.0//

NOTE: As described previously, v1.0 fails validation because of table offset byte count errors that were only flagged by v3.4.1 of Validate (passed validation using older version of Validate). Would you still want v1.0 included, despite it failing validation with v3.4.1?

Let me know if you have any questions or concerns

c-suh commented 3 months ago

@jordanpadams and @smclaughlin7, passing Mia's question to you:

NOTE: As described previously, v1.0 fails validation because of table offset byte count errors that were only flagged by v3.4.1 of Validate (passed validation using older version of Validate). Would you still want v1.0 included, despite it failing validation with v3.4.1?

Here is the validation report, in case it is helpful.


In the meantime, @mace-space, the v1.1 set has been posted for NSSDCA processing! From tomorrow, you can check the status at https://nssdc.gsfc.nasa.gov/psi/ReportPDS4.jsp using the SIP LID below:

SIP LID:

jordanpadams commented 3 months ago

NOTE: As described previously, v1.0 fails validation because of table offset byte count errors that were only flagged by v3.4.1 of Validate (passed validation using older version of Validate). Would you still want v1.0 included, despite it failing validation with v3.4.1?

@mace-space I would say yes. If the data went online, then to ensure the provenance of the data in the archive, it should go to the NSSDCA, even if it had some issues.

c-suh commented 3 months ago

Note: since v1.0 is being posted to ensure provenance of the data, I am ignoring both the error found in the node's validation report (error.table.missing_LF) and the error found in the EN's validation report (error.label.filesize_mismatch).


@mace-space the v1.0 set has also been posted for NSSDCA processing! From tomorrow, you can check the status at https://nssdc.gsfc.nasa.gov/psi/ReportPDS4.jsp using the SIP LID below:

SIP LID:

mace-space commented 3 months ago

Thanks! v1.1 is in the Pre-Ingest stage (there are some remarks about Context_Area and context products, but it seems to be progressing OK).

However, v1.0 is still reporting an error:

SIP LIDVID: urn:nasa:pds:system_bundle:product_sip_deep_archive:cassini_uvis_solarocc_beckerjarmak2023_v1.0_20240228::1.0

Node: PDS_RNG

Received: 2024-03-09 Failed: 2024-03-09

Remarks: Manifest checksum calculated does not match manifest checksum in SIP.

I think I need to do what I did for the Vgr2 NSSDCA submission in #490: reload the data into the registry with the correct URL (https://pds-rings.seti.org/pds4/bundles/cassini_uvis_solarocc_beckerjarmak2023_v1.0 for v1.0 of this bundle), and then re-run the deep archive software?

When I run:

curl -u username 'https://search-rms-prod-etcetcetc.us-west-2.es.amazonaws.com/registry/_search?q={_id:"urn:nasa:pds:cassini_uvis_solarocc_beckerjarmak2023::1.0"}' | json_pp

the output lists ops:Label_File_Info/ops:file_ref and ops:Data_File_Info/ops:file_ref with the v1.1 URL (https://pds-rings.seti.org/pds4/bundles/cassini_uvis_solarocc_beckerjarmak2023) instead of the v1.0 URL (https://pds-rings.seti.org/pds4/bundles/cassini_uvis_solarocc_beckerjarmak2023_v1.0).
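The same check can be scripted against the JSON that curl returns. A minimal Python sketch, assuming the response has the usual OpenSearch hits.hits[]._source shape and that each file_ref field holds a string or a list of strings; the sample response and helper name are hypothetical and heavily trimmed.

```python
import json

# Field names as they appear in the registry output described above.
FILE_REF_FIELDS = (
    "ops:Label_File_Info/ops:file_ref",
    "ops:Data_File_Info/ops:file_ref",
)


def stale_file_refs(response_text: str, expected_prefix: str) -> list:
    """Return file_ref values that do not start with expected_prefix."""
    response = json.loads(response_text)
    stale = []
    for hit in response.get("hits", {}).get("hits", []):
        source = hit.get("_source", {})
        for field in FILE_REF_FIELDS:
            values = source.get(field, [])
            if isinstance(values, str):
                values = [values]
            stale.extend(v for v in values if not v.startswith(expected_prefix))
    return stale


# Trailing slash matters: without it, ".../beckerjarmak2023" would also match the _v1.0 path.
V1_0_PREFIX = "https://pds-rings.seti.org/pds4/bundles/cassini_uvis_solarocc_beckerjarmak2023_v1.0/"

# Hypothetical, trimmed-down response.
sample = json.dumps({
    "hits": {"hits": [{
        "_id": "urn:nasa:pds:cassini_uvis_solarocc_beckerjarmak2023::1.0",
        "_source": {
            "ops:Label_File_Info/ops:file_ref": [
                "https://pds-rings.seti.org/pds4/bundles/cassini_uvis_solarocc_beckerjarmak2023/bundle.xml"
            ],
            "ops:Data_File_Info/ops:file_ref": [
                "https://pds-rings.seti.org/pds4/bundles/cassini_uvis_solarocc_beckerjarmak2023/readme.txt"
            ],
        },
    }]}
})

for ref in stale_file_refs(sample, V1_0_PREFIX):
    print("stale:", ref)
```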

c-suh commented 3 months ago

@mace-space correct, as you so neatly recapped above and did for Vgr2 in #490. Thank you!

mace-space commented 3 months ago

Please find v1.0 with corrected URL: cassini_uvis_solarocc_beckerjarmak2023_v1.0_NSSDCA_20240313.tar.gz

matthewtiscareno commented 3 months ago

@jordanpadams and @c-suh: I wonder if there might be a larger issue here.

When we archive a bundle, the URL of the version that is current at the time is the bundle name with no version number appended (e.g., pds4/bundles/cooldata). However, when that version is superseded, the new current version takes that same URL, while the previous version moves to the same URL with its version number appended (e.g., pds4/bundles/cooldata_v1.0). This reflects how we have always managed versioning under PDS3.

Will this always require that we re-ingest any bundle at the time that it is superseded? If so, should we change our practice, so that this isn't required? Or could EN tools change so that this is no longer required? Do other nodes do things differently?

One solution might be for pds4/bundles/cooldata_v1.0 to exist even while it is the current version, with either that path or pds4/bundles/cooldata being an alias that points to the other. Please let us know what you think.
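The two layouts can be compared with a small Python sketch. `cooldata` is the example name from this comment; the function names are hypothetical. Under the current practice the path of v1.0 changes when v1.1 is released, which is why the registry's file references go stale; under the proposal it never changes.

```python
BASE = "pds4/bundles"  # relative path, as in the examples above


def current_practice_path(bundle: str, version: str, current_version: str) -> str:
    """Existing (PDS3-style) layout: only superseded versions get a _vX.Y suffix."""
    if version == current_version:
        return f"{BASE}/{bundle}"
    return f"{BASE}/{bundle}_v{version}"


def always_versioned_path(bundle: str, version: str) -> str:
    """Proposed layout: every version gets its versioned path from the start."""
    return f"{BASE}/{bundle}_v{version}"


# Where v1.0 of "cooldata" lives before and after v1.1 is released.
before = current_practice_path("cooldata", "1.0", current_version="1.0")
after = current_practice_path("cooldata", "1.0", current_version="1.1")
print(before)  # pds4/bundles/cooldata
print(after)   # pds4/bundles/cooldata_v1.0

# Under the proposal, v1.0's path never changes; pds4/bundles/cooldata could then be
# an alias (for example a symlink or redirect) for whichever version is current.
print(always_versioned_path("cooldata", "1.0"))  # pds4/bundles/cooldata_v1.0
```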

jordanpadams commented 3 months ago

@matthewtiscareno a few other nodes encounter this issue as well, and there is a new requirement for the registry to provide some sort of utility that allows a node to update the data path to a file, rather than requiring a reload of the products to get the correct paths: https://github.com/NASA-PDS/registry/issues/266. Either way, some operational intervention will be required to recognize that the file paths have changed and to update the paths in the registry.

From an efficiency perspective, it would be much easier to put the data online as pds4/bundles/cooldata_v1.0 and pds4/bundles/cooldata_v2.0 from the start, and then simply load the data as each new version comes online. This would require no manual movement of files and would decrease overhead over time. That being said, we understand that some nodes prefer "clean" archive directories that include only the latest versions of data products, so we will need to implement some sort of utility. We also hope to avoid the need for this down the road by providing a web app that uses the registry to drive "directory views" of pages, so we can hide old versions from users unless they want to see them.
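At its core, a path-update utility of the kind described here would rewrite a URL prefix in each product's file references. A minimal Python sketch, assuming the references are plain URL strings; the helper is hypothetical, is not the interface proposed in registry issue #266, and leaves out reading from and writing back to the registry.

```python
def rewrite_file_refs(file_refs, old_prefix, new_prefix):
    """Return file_refs with old_prefix replaced by new_prefix where it matches.

    Include the trailing slash in old_prefix so that, for example, ".../cooldata/"
    does not also match ".../cooldata_v1.0/".
    """
    updated = []
    for ref in file_refs:
        if ref.startswith(old_prefix):
            ref = new_prefix + ref[len(old_prefix):]
        updated.append(ref)
    return updated


refs = [
    "https://pds-rings.seti.org/pds4/bundles/cassini_uvis_solarocc_beckerjarmak2023/bundle.xml",
    "https://pds-rings.seti.org/pds4/bundles/cassini_uvis_solarocc_beckerjarmak2023/readme.txt",
]
for ref in rewrite_file_refs(
    refs,
    "https://pds-rings.seti.org/pds4/bundles/cassini_uvis_solarocc_beckerjarmak2023/",
    "https://pds-rings.seti.org/pds4/bundles/cassini_uvis_solarocc_beckerjarmak2023_v1.0/",
):
    print(ref)
```

The operational part Jordan mentions, knowing which bundle moved and to where, still has to come from the node.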

Happy to talk more about this or we can discuss at the SWG on Wednesday.

c-suh commented 3 months ago

@mace-space the corrected package from your comment has a Validate error. Please review this report for details. Thank you!

mace-space commented 3 months ago

Thanks, @c-suh, and sorry to have missed this. The Validate error may be due to extra slashes in the file paths (field 2) from record 3 onwards, which cause Validate to interpret the field as null. Do you know what might be causing the additional slash?

 urn:nasa:pds:cassini_uvis_solarocc_beckerjarmak2023::1.0    /bundle.xml
 urn:nasa:pds:cassini_uvis_solarocc_beckerjarmak2023::1.0    /readme.txt
 urn:nasa:pds:cassini_uvis_solarocc_beckerjarmak2023:data::2.0 //collection_data.csv
 urn:nasa:pds:cassini_uvis_solarocc_beckerjarmak2023:data::2.0 //collection_data.xml
 urn:nasa:pds:cassini_uvis_solarocc_beckerjarmak2023:data:uvis_euv_2006_257_solar_time_series_ingress::1.1 //uvis_euv_2006_257_solar_time_series_ingress.xml
  ...

(I had to delete a lot of whitespace between field 1 and 2 to get it to display here)

I ran the pds-deep-registry-archive tool in the same manner as for other bundles previously submitted to the NSSDCA, but I'm wondering whether I somehow introduced this error. I'm using v1.1.5 of pds-deep-registry-archive.

It also appears that the VIDs are wrong: 2.0 and 1.1 instead of 1.0.
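A short Python sketch for locating the affected records: it splits each record into its first two whitespace-separated fields and flags paths that contain a doubled slash. Treating a record as "LIDVID, whitespace, path" is an assumption based on the excerpt above, and the helper name is hypothetical. The sketch only locates the symptom; it does not explain where the extra slash comes from, and the collapsed path is shown for comparison, not as a recommended fix.

```python
import re

# The five records quoted above, with the padding between fields shortened.
RECORDS = [
    "urn:nasa:pds:cassini_uvis_solarocc_beckerjarmak2023::1.0    /bundle.xml",
    "urn:nasa:pds:cassini_uvis_solarocc_beckerjarmak2023::1.0    /readme.txt",
    "urn:nasa:pds:cassini_uvis_solarocc_beckerjarmak2023:data::2.0 //collection_data.csv",
    "urn:nasa:pds:cassini_uvis_solarocc_beckerjarmak2023:data::2.0 //collection_data.xml",
    "urn:nasa:pds:cassini_uvis_solarocc_beckerjarmak2023:data:uvis_euv_2006_257_solar_time_series_ingress::1.1 //uvis_euv_2006_257_solar_time_series_ingress.xml",
]


def records_with_extra_slashes(records):
    """Yield (record_number, lidvid, path, collapsed_path) for paths containing '//'."""
    for number, line in enumerate(records, start=1):
        lidvid, path = line.split(maxsplit=1)
        path = path.strip()
        if "//" in path:
            yield number, lidvid, path, re.sub(r"/{2,}", "/", path)


for number, lidvid, path, collapsed in records_with_extra_slashes(RECORDS):
    print(f"record {number}: {path} -> {collapsed}")
```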

jordanpadams commented 3 months ago

@mace-space apologies here; this is another bug in our software. We are investigating and will get back to you here.

jordanpadams commented 3 months ago
$ pds-deep-registry-archive -s PDS_RNG urn:nasa:pds:cassini_uvis_solarocc_beckerjarmak2023::1.1
mace-space commented 3 months ago

% pds-deep-registry-archive --site PDS_RNG urn:nasa:pds:cassini_uvis_solarocc_beckerjarmak2023::1.0

Thanks for looking into this.