NASA-PDS / deep-archive

PDS Open Archival Information System (OAIS) utilities, including Submission Information Package (SIP) and Archive Information Package (AIP) generators
https://nasa-pds.github.io/deep-archive/

Canonical path is not processed correctly ('/../' in path) #145

Closed c-suh closed 1 year ago

c-suh commented 1 year ago

Checked for duplicates

Yes - I've already checked

🐛 Describe the bug

When I ran Validate on the 2 submitted packages from https://github.com/NASA-PDS/operations/issues/373, it failed. Using the vo mawd package as an example, here is the error:

ERROR [error.table.field_value_data_type_mismatch] data object 1, record 11, field 2: Value does not match its data type 'ASCII_File_SpecificationName': The directory spec '/document/../catalog/' does not match the pattern '/?([A-Za-z0-9][A-Za-z0-9-][A-Za-z0-9]/?|[A-Za-z0-9][^-_]/?)'

Looking at the offending transfer manifest tab file, there are paths such as /document/../catalog/dataset.cat.

Looking at the sip tab file, this corresponds to https://pds-atmospheres.nmsu.edu/PDS/data/vo_3001/document/../catalog/dataset.cat.

🕵️ Expected behavior

I expected Deep Archive to massage these URLs into a format acceptable by PDS.
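To illustrate the kind of cleanup meant here, the following is a minimal sketch, not Deep Archive's actual code. It assumes the fix amounts to collapsing `..` segments with the standard library; the function names `normalize_url` and `normalize_manifest_path` are hypothetical.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def normalize_manifest_path(path: str) -> str:
    """Collapse '.' and '..' segments in a POSIX-style manifest path."""
    return posixpath.normpath(path)


def normalize_url(url: str) -> str:
    """Collapse '.' and '..' segments in the path component of a URL."""
    parts = urlsplit(url)
    path = parts.path
    if path:
        normalized = posixpath.normpath(path)
        # normpath drops a trailing slash; restore it so directory URLs survive
        if path.endswith("/") and not normalized.endswith("/"):
            normalized += "/"
        path = normalized
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(normalize_manifest_path("/document/../catalog/dataset.cat"))
    print(normalize_url(
        "https://pds-atmospheres.nmsu.edu/PDS/data/vo_3001/document/../catalog/dataset.cat"
    ))
```

With the two examples from this report, the manifest path would become /catalog/dataset.cat and the URL would point at .../vo_3001/catalog/dataset.cat.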

📜 To Reproduce

  1. Run Validate v3.3.1 on the NSSDCA Delivery Package from https://github.com/NASA-PDS/operations/issues/373
  2. Read the Validate report

🖥 Environment Info

?

📚 Version of Software Used

?

🩺 Test Data / Additional context

NSSDCA delivery package: https://github.com/NASA-PDS/operations/files/10960033/pdsatm_pack9_20230313.tar.gz

Validate reports:

🦄 Related requirements

No response

⚙️ Engineering Details

No response

nutjob4life commented 1 year ago

@c-suh @jordanpadams is it possible to get the original files from the Atmosphere node (as well as the command they ran) that produced the incorrect SIP? I'd like to first make sure I can reproduce this by running pds-deep-archive against those files and making sure I see /../ in the output.

jordanpadams commented 1 year ago

@nutjob4life if you mean the archive data itself, you can download that from here: https://pds-atmospheres.nmsu.edu/PDS/data/vo_3002/

nutjob4life commented 1 year ago

Thanks @jordanpadams.

FYI for my own reference, here's a command I came up with to mirror an HTTPD index listing and all its files:

wget \
    --cut-dirs=2 \
    --execute robots=off \
    --mirror \
    --no-check-certificate \
    --no-host-directories \
    --no-parent \
    --quiet \
    --recursive \
    --reject='index.html*' \
    --relative \
    --timestamping \
    https://pds-atmospheres.nmsu.edu/PDS/data/vo_3002/

nutjob4life commented 1 year ago

@c-suh @jordanpadams okay so in the source data as provided by the Atmospheres node in data/vo_3002/document/dataset.xml, there's this:

        <Document_Edition>
            <edition_name>ASCII Text</edition_name>
            <language>English</language>
            <files>1</files>
            <Document_File>
                <file_name>../catalog/dataset.cat</file_name> <!-- 👈 notice -->
                <local_identifier>vo_irtm_dataset_cat</local_identifier>
                <document_standard_id>7-Bit ASCII Text</document_standard_id>
            </Document_File>
        </Document_Edition>

Before I fix this, my first question is: is this legal in a PDS label?
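For anyone who wants to check other labels for the same pattern, here is a rough sketch that lists every `file_name` value containing a `..` segment. It is an illustration only, not part of pds-deep-archive; the helper name `find_parent_refs` is made up, and it matches elements by local name so it works whether or not the label declares the PDS4 namespace.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath
from typing import List


def find_parent_refs(label_xml: str) -> List[str]:
    """Return file_name values that contain a '..' path segment."""
    root = ET.fromstring(label_xml)
    hits = []
    for elem in root.iter():
        # Strip any '{namespace}' prefix to get the element's local name
        local_name = elem.tag.rsplit("}", 1)[-1]
        text = (elem.text or "").strip()
        if local_name == "file_name" and ".." in PurePosixPath(text).parts:
            hits.append(text)
    return hits


if __name__ == "__main__":
    sample = """
    <Document_File>
        <file_name>../catalog/dataset.cat</file_name>
        <local_identifier>vo_irtm_dataset_cat</local_identifier>
    </Document_File>
    """
    print(find_parent_refs(sample))
```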

jordanpadams commented 1 year ago

@nutjob4life checking with some folks but I am 95% certain that is invalid. Validate Tool does not catch it, but I don't think that means it should be allowed.

tloubrieu-jpl commented 1 year ago

@c-suh do you still have the package that originally raised this error? For I&T, @gxtchen would use it to validate that the bug is fixed. Thanks.

tloubrieu-jpl commented 1 year ago

@nutjob4life ☝️

nutjob4life commented 1 year ago

@tloubrieu-jpl @c-suh https://jpl.slack.com/archives/GCGR1R3A4/p1680820436039979

c-suh commented 1 year ago

@tloubrieu-jpl there's a link in the description under "Test Data/Additional Context" to the package which showed this error. However, just in case they've replaced it, here is the one I had saved locally: pdsatm_pack9_20230313.tar.gz.