NASA-PDS / deep-archive

PDS Open Archival Information System (OAIS) utilities, including Submission Information Package (SIP) and Archive Information Package (AIP) generators
https://nasa-pds.github.io/deep-archive/

Transfer manifest mismatch between `pds-deep-archive` and `pds-deep-registry-archive` #158

Closed jordanpadams closed 6 months ago

jordanpadams commented 6 months ago

Checked for duplicates

Yes - I've already checked

šŸ› Describe the bug

When I generate AIPs with both pds-deep-archive and pds-deep-registry-archive, the resulting transfer manifests are not the same: the line lengths differ (511 vs. 512 characters).

🕵️ Expected behavior

I expected the files to be the same.

📜 To Reproduce

$ pds-deep-archive --debug -b https://atmos.nmsu.edu/PDS/data/PDS4/LADEE/ -s PDS_ATM test/data/ladee_test/mission_bundle/LADEE_Bundle_1101.xml
...

$ awk '{print length}' ladee_mission_bundle_v1.0_20240208_transfer_manifest_v1.0.tab | uniq -c
  13 511
$ pds-deep-registry-archive -s PDS_GEO urn:nasa:pds:magellan_gxdr::1.0
...

$ awk '{print length}' magellan_gxdr_v1.0_20240208_transfer_manifest_v1.0.tab | uniq -c
 109 512
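
For anyone without awk at hand, the same check can be sketched in Python. This is only an illustration: `record_length_counts` is a hypothetical helper, not part of deep-archive, and it assumes the manifest is plain ASCII so that bytes and characters coincide. Like awk's `length`, it excludes the LF from each count but includes any CR. Unlike `uniq -c`, it tallies each length across the whole file rather than per consecutive run.

```python
from collections import Counter


def record_length_counts(data: bytes) -> Counter:
    """Count how many LF-delimited records have each length.

    The LF is excluded from each length; a CR, if present, is included.
    """
    records = data.split(b"\n")
    if records and records[-1] == b"":
        records.pop()  # a trailing LF does not start another record
    return Counter(len(record) for record in records)


# Synthetic stand-in for a manifest: 3 records of 510 bytes plus CR, then LF.
sample = (b"x" * 510 + b"\r\n") * 3
print(record_length_counts(sample))
```

For a real file, pass the result of `pathlib.Path(name).read_bytes()` to the function.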

🖥 Environment Info

MacOSx pds-deep-archive.git@ead418a5b2f7ec943cb821dfd2b948fa9db538e7

🩺 Test Data / Additional context

🦄 Related bugs

Tightly coupled with https://github.com/NASA-PDS/deep-archive/issues/155

⚙️ Engineering Details

When I update the magellan label from #155 to change record_length to 513 and the File Specification Name field length to 256, it validates successfully, which would indicate that the File Specification Name field may be 1 byte too long in the table.

I also tested removing an extra space, and validated successfully:

$ sed -i '' 's/ \r/\r/' magellan_gxdr_v1.0_20240208_transfer_manifest_v1.0.tab

Updated label to have <file_size unit="byte">55808</file_size>
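
The 55808 figure is consistent with 109 records of 512 bytes each. A rough Python sketch of the same fix and arithmetic follows. Both helper names are hypothetical, and the sample data is synthetic; it assumes each record ends in CR LF, as the `\r` in the sed expression suggests.

```python
def strip_space_before_cr(data: bytes) -> bytes:
    """Drop one space directly before the CR of each record (like the sed above)."""
    return b"\n".join(
        line.replace(b" \r", b"\r", 1) for line in data.split(b"\n")
    )


def expected_file_size(record_count: int, record_length: int) -> int:
    """Size in bytes of a fixed-width table: records times bytes per record."""
    return record_count * record_length


# Synthetic stand-in: 109 records of 513 bytes (510 chars, a space, CR, LF).
before = (b"x" * 510 + b" \r\n") * 109
after = strip_space_before_cr(before)
print(len(before), len(after), expected_file_size(109, 512))
```

Under these assumptions each record shrinks from 513 to 512 bytes, so the total matches the `file_size` value in the updated label.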

$ validate -t magellan_gxdr_v1.0_20240208_aip_v1.0.xml
...
Summary:

  1 product(s)
  0 error(s)
  0 warning(s)

  Product Validation Summary:
    1          product(s) passed
    0          product(s) failed
    0          product(s) skipped
    1          product(s) total

  Referential Integrity Check Summary:
    0          check(s) passed
    0          check(s) failed
    0          check(s) skipped
    0          check(s) total

nutjob4life commented 6 months ago

@jordanpadams hate to be a pill about this, but could you include the command-line invocations for both pds-deep-archive and pds-deep-registry-archive? Was hoping to see those under "To Reproduce" 🙏

jordanpadams commented 6 months ago

@nutjob4life updated

nutjob4life commented 6 months ago

@jordanpadams fantastic, thanks! Here I thought it was dependent on the --base-url (hence I wanted to see the invocations) but it turns out the issue is just the "column size". This is easy enough.

tloubrieu-jpl commented 5 months ago

@gxtchen , you can test by downloading the files in this directory https://atmos.nmsu.edu/PDS/data/PDS4/LADEE/mission_bundle/

gxtchen commented 4 months ago

@jordanpadams @tloubrieu-jpl @nutjob4life I tried with v1.1.5 and v1.2.0. I am still seeing the different numbers.

(pds-deeparchive) gxchen@RAYL-C01494 ~/pds/pds4test.build14.1/deep-archive/#158$ awk '{print length}' magellan_gxdr_v1.0_20240412_transfer_manifest_v1.0.tab | uniq -c
 109 511
(pds-deeparchive) gxchen@RAYL-C01494 ~/pds/pds4test.build14.1/deep-archive/#158$ awk '{print length}' ladee_mission_bundle_v1.0_20240412_transfer_manifest_v1.0.tab | uniq -c
  13 511

nutjob4life commented 4 months ago

In the output pasted into the comment above, I'm seeing 511 and 511, which is correct. Are some different numbers expected?

Here are my reproduction steps (let me know if I did something wrong 😬):

$ cd /tmp
$ wget \
    --quiet \
    --execute robots=off \
    --cut-dirs=2 \
    --reject='index.html*' \
    --no-host-directories \
    --mirror \
    --no-parent \
    --relative \
    --timestamping \
    --no-check-certificate \
    --recursive \
    https://atmos.nmsu.edu/PDS/data/PDS4/LADEE/mission_bundle/
$ python3.9 -m venv 1.1.5
$ cd 1.1.5
$ bin/pip install --quiet --upgrade pip pds.deeparchive==1.1.5
$ bin/pds-deep-archive --version
pds-deep-archive 1.1.5
$ bin/pds-deep-archive --quiet --bundle-base-url https://atmos.nmsu.edu/PDS/data/PDS4/LADEE/ --site PDS_ATM ../PDS4/LADEE/mission_bundle/LADEE_Bundle_1101.xml
$ awk '{print length}' ladee_mission_bundle_v1.0_*_transfer_manifest_v1.0.tab | uniq -c
  13 511
$ echo "511 columns is correct"
511 columns is correct 
$ bin/pds-deep-registry-archive --version
pds-deep-reigstry-archive 1.1.5
$ bin/pds-deep-registry-archive --quiet --site PDS_GEO urn:nasa:pds:magellan_gxdr::1.0
$ awk '{print length}' magellan_gxdr_v1.0_*_transfer_manifest_v1.0.tab | uniq -c
  109 511
$ echo "511 columns is correct"
511 columns is correct 
$ cd ..
$ git clone https://github.com/NASA-PDS/deep-archive.git 1.2.0
$ python3.9 -m venv 1.2.0
$ cd 1.2.0
$ bin/pip install --quiet --upgrade pip
$ bin/pip install --quiet --editable .
$ bin/pds-deep-archive --version
pds-deep-archive 1.2.0
$ bin/pds-deep-archive --quiet --bundle-base-url https://atmos.nmsu.edu/PDS/data/PDS4/LADEE/ --site PDS_ATM ../PDS4/LADEE/mission_bundle/LADEE_Bundle_1101.xml
$ awk '{print length}' ladee_mission_bundle_v1.0_*_transfer_manifest_v1.0.tab | uniq -c
  13 511
$ echo "511 columns is correct"
511 columns is correct 
$ bin/pds-deep-registry-archive --version
pds-deep-reigstry-archive 1.2.0
$ bin/pds-deep-registry-archive --quiet --site PDS_GEO urn:nasa:pds:magellan_gxdr::1.0
$ awk '{print length}' magellan_gxdr_v1.0_*_transfer_manifest_v1.0.tab | uniq -c
  109 511
$ echo "511 columns is correct"

gxtchen commented 4 months ago

@nutjob4life you are right, I misread Jordan's original post; he got 511 and 512. All good, thanks.

nutjob4life commented 4 months ago

@gxtchen whew! Thanks for confirming. Was worried I was losing my mind for a bit there 🤪