NASA-PDS / deep-archive

PDS Open Archival Information System (OAIS) utilities, including Submission Information Package (SIP) and Archive Information Package (AIP) generators
https://nasa-pds.github.io/deep-archive/

The tool shall include products in the manifests based upon the relationships described in the PDS4 bundle and collection metadata #50

Closed jordanpadams closed 4 years ago

jordanpadams commented 4 years ago

Requirement

The tool shall include products in the manifests based upon the following criteria (see Details below):

  1. Bundle (B) specified as input to the tool and any associated readme (R)
  2. Primary Collections (C1, C2, C3) associated with that Bundle (B)
  3. Primary products associated with each of those Collections (C1, C2, C3)

Assumptions

  1. Products include both the label (.xml) and the files referenced from it.
  2. Any file_name referenced in a label can be assumed to be in the same directory as the parent product.
  3. For any collection referenced by LID, only the latest version of that collection will be included in the SIP and AIP (a flag disables this and includes all versions, for backwards compatibility).
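
Assumption 3 could be sketched roughly as follows. This is only an illustration, not the tool's implementation: the function name `latest_lidvids` is hypothetical, and it assumes every version ID after `::` is a dotted pair of integers such as `1.0`.

```python
def latest_lidvids(lidvids):
    """Keep only the highest-versioned LIDVID for each LID.

    Hypothetical helper; assumes each item looks like "<lid>::<major>.<minor>".
    """
    latest = {}
    for lidvid in lidvids:
        lid, _, vid = lidvid.partition("::")
        # Compare versions numerically so that 1.10 sorts after 1.2.
        version = tuple(int(part) for part in vid.split("."))
        if lid not in latest or version > latest[lid][0]:
            latest[lid] = (version, lidvid)
    return sorted(entry[1] for entry in latest.values())
```

Comparing the version as a tuple of integers rather than as a string avoids treating `1.10` as older than `1.2`.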

Details

1. Bundle (B) specified as input to the tool and any associated readme (R)

  1. Bundle XML (included as input to the tool)
  2. Readmes referenced by the bundle (//File_Area_Text/File/file_name); see Assumption 2 above for where to look
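
A minimal sketch of the readme lookup, using Python's standard `xml.etree.ElementTree`. The function name `readme_paths` is hypothetical, and the sketch assumes the label uses the common PDS4 namespace `http://pds.nasa.gov/pds4/pds/v1`. Per Assumption 2, each file name is resolved against the bundle label's own directory.

```python
from pathlib import Path
import xml.etree.ElementTree as ET

# Assumed namespace for PDS4 labels; adjust if the labels differ.
PDS_NS = {"pds": "http://pds.nasa.gov/pds4/pds/v1"}


def readme_paths(bundle_xml, bundle_dir):
    """Return the paths of readme files referenced by a bundle label.

    Hypothetical helper: bundle_xml is the label text, bundle_dir is the
    directory that holds the label (Assumption 2).
    """
    root = ET.fromstring(bundle_xml)
    names = root.findall(".//pds:File_Area_Text/pds:File/pds:file_name", PDS_NS)
    return [Path(bundle_dir) / name.text.strip() for name in names if name.text]
```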

2. Primary Collections (C1, C2, C3) associated with that Bundle (B)

To identify the primary collections of a bundle get the LIDs/LIDVIDs per:

  1. All //Bundle_Member_Entry/lidvid_reference values where the same entry's //Bundle_Member_Entry/member_status is Primary
  2. All //Bundle_Member_Entry/lid_reference values where the same entry's //Bundle_Member_Entry/member_status is Primary (see Assumption 3 above)

To find those products, assume any collections referenced by the bundle will be in the same directory or in any sub-directory of the input bundle.
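
The two selection rules above could be sketched like this. It is an illustration under stated assumptions rather than the tool's actual code: the function name `primary_collection_refs` is hypothetical, and the namespace `http://pds.nasa.gov/pds4/pds/v1` is assumed for the bundle label.

```python
import xml.etree.ElementTree as ET

# Assumed namespace for PDS4 labels; adjust if the labels differ.
PDS_NS = {"pds": "http://pds.nasa.gov/pds4/pds/v1"}


def primary_collection_refs(bundle_xml):
    """Return (lidvids, lids) for the Primary member entries of a bundle label."""
    root = ET.fromstring(bundle_xml)
    lidvids, lids = [], []
    for entry in root.findall(".//pds:Bundle_Member_Entry", PDS_NS):
        status = entry.findtext("pds:member_status", default="", namespaces=PDS_NS)
        if status.strip() != "Primary":
            continue  # skip Secondary members
        lidvid = entry.findtext("pds:lidvid_reference", namespaces=PDS_NS)
        lid = entry.findtext("pds:lid_reference", namespaces=PDS_NS)
        if lidvid:
            lidvids.append(lidvid.strip())
        elif lid:
            # LID-only references need version resolution (Assumption 3).
            lids.append(lid.strip())
    return lidvids, lids
```

Returning the LID-only references separately keeps the "latest version" decision from Assumption 3 out of the parsing step.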

3. Primary products associated with each of those Collections (C1, C2, C3)

From collection labels C1, C2, C3, here is how we can get the product LID/LIDVIDs:

  1. Parse out //File_Area_Inventory/File/file_name/ and //Inventory/field_delimiter/ to prepare to parse the inventory.
  2. Based upon Assumption 2 and the information from step 1, find the inventory file and parse all primary products (lines that start with P, not S for secondary).
  3. Within the collection directory or any of its sub-directories, include all products matching the LIDVIDs found in step 2.
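
Step 2 might look something like the sketch below. The function name `primary_product_lidvids` and the `DELIMITERS` table are hypothetical. The sketch assumes the inventory is a delimited text table whose first field is the member status (`P` or `S`) and whose second field is the LID or LIDVID, and that `field_delimiter` holds a name such as `Comma` rather than the character itself.

```python
import csv

# Assumed mapping from field_delimiter names in the label to characters.
DELIMITERS = {
    "comma": ",",
    "horizontal tab": "\t",
    "semicolon": ";",
    "vertical bar": "|",
}


def primary_product_lidvids(inventory_text, field_delimiter="Comma"):
    """Return the identifiers on inventory lines whose status field is P."""
    delimiter = DELIMITERS[field_delimiter.strip().lower()]
    reader = csv.reader(inventory_text.splitlines(), delimiter=delimiter)
    primaries = []
    for row in reader:
        # Ignore blank or malformed lines and secondary (S) members.
        if len(row) >= 2 and row[0].strip() == "P":
            primaries.append(row[1].strip())
    return primaries
```

The returned identifiers can then be matched against the labels found under the collection directory, as described in step 3.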