This module creates an index file, analogous to the index.tab and related files created by the RMS team for PDS3 volumes. The index file contains one row per data product, with columns holding user-specified information extracted ("scraped") from each product's label.
The tool should be named pds4_create_data_index.py.
The specifications are as follows:
Input arguments should be documented in a comment (or docstring?) as follows:
a. the path to the directory containing bundle.xml (default: the current directory)
b. the value(s) of reference_type for which you want to create an index (default: bundle_has_data_collection)
c. a list of the XPath(s) that you want to turn into columns in your data index (for now, hard-wire this to ["pds:Target_Identification"], which matches your current functionality).
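A minimal sketch of how the three inputs might be declared with Python's standard argparse module. The positional name bundle_dir and the --reference-type flag are hypothetical choices, not part of the spec; input (c) is hard-wired, as requested above.

```python
import argparse


def parse_args(argv=None):
    """Parse arguments for pds4_create_data_index.py (sketch).

    The argument names are hypothetical; only their meanings and
    defaults come from the spec.
    """
    parser = argparse.ArgumentParser(
        description="Create data_index.csv for a PDS4 bundle.")
    # (a) Directory containing bundle.xml; defaults to the current directory.
    parser.add_argument("bundle_dir", nargs="?", default=".",
                        help="directory containing bundle.xml (default: .)")
    # (b) One or more reference_type values to index.
    parser.add_argument("--reference-type", nargs="+", dest="reference_types",
                        default=["bundle_has_data_collection"],
                        help="reference_type value(s) to index")
    args = parser.parse_args(argv)
    # (c) XPaths to turn into columns; hard-wired for now, per the spec.
    args.xpaths = ["pds:Target_Identification"]
    return args
```

For example, passing /path/to/bundle --reference-type bundle_has_data_collection bundle_has_document_collection would index both kinds of collection.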
In the directory given by input (a), open NOT bundle.xml but instead the bundle_member_index.csv in that same directory (created by the tool described in https://github.com/SETI/pds-data-projects/issues/20).
Find the entries whose reference_type matches input (b) and note their paths.
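The filtering step could look roughly like the sketch below. It assumes, hypothetically, that bundle_member_index.csv has a header row with reference_type and filepath columns, with filepath relative to the bundle directory; the real column names are whatever the tool from issue 20 writes.

```python
import csv
import os


def find_member_paths(bundle_dir, reference_types=("bundle_has_data_collection",)):
    """Return paths of bundle members whose reference_type is wanted.

    Hypothetical CSV layout: a header row with 'reference_type' and
    'filepath' columns, 'filepath' being relative to bundle_dir.
    """
    wanted = set(reference_types)
    index_path = os.path.join(bundle_dir, "bundle_member_index.csv")
    paths = []
    with open(index_path, newline="") as f:
        for row in csv.DictReader(f):
            if row["reference_type"].strip() in wanted:
                # Record the member's path, resolved against the bundle directory.
                paths.append(os.path.normpath(
                    os.path.join(bundle_dir, row["filepath"].strip())))
    return paths
```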
For each of those paths, open NOT the collection product but the collection_member_index.csv in the same directory (created by the tool described in https://github.com/SETI/pds-data-projects/issues/21).
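A sketch of this step, again under a hypothetical layout: each collection_member_index.csv sits beside its collection label and has a filepath column giving each product label's path relative to that directory. The column name is an assumption, not something issue 21 is known to define.

```python
import csv
import os


def find_product_labels(collection_paths):
    """List product label paths from each collection_member_index.csv.

    Hypothetical layout: the CSV lives in the same directory as the
    collection label and has a 'filepath' column relative to it.
    """
    labels = []
    for coll_path in collection_paths:
        # Open the CSV next to the collection product, not the product itself.
        coll_dir = os.path.dirname(coll_path)
        index_path = os.path.join(coll_dir, "collection_member_index.csv")
        with open(index_path, newline="") as f:
            for row in csv.DictReader(f):
                labels.append(os.path.normpath(
                    os.path.join(coll_dir, row["filepath"].strip())))
    return labels
```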
For each entry in that file, open the product label and scrape it as the current code does (a separate issue will ask for the currently hard-wired Target_Identification to be replaced with input (c)).
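The scraping could be sketched with the standard library's xml.etree.ElementTree, mapping the pds prefix to the PDS4 common namespace. What the current code actually extracts is not stated here, so this sketch assumes each matching element contributes the text of its pds:name child (or its own text if it has none), with multiple matches joined by "; ".

```python
import xml.etree.ElementTree as ET

# Prefix map for the PDS4 common namespace.
NS = {"pds": "http://pds.nasa.gov/pds4/pds/v1"}


def scrape_label(label_path, xpaths=("pds:Target_Identification",)):
    """Return {xpath: value} for one product label (sketch).

    Assumption: each match contributes its <pds:name> child's text if
    present, otherwise its own text; matches are joined with '; '.
    """
    root = ET.parse(label_path).getroot()
    row = {}
    for xpath in xpaths:
        values = []
        # './/' searches all descendants of the label's root element.
        for elem in root.findall(".//" + xpath, NS):
            name = elem.find("pds:name", NS)
            text = name.text if name is not None else elem.text
            if text and text.strip():
                values.append(text.strip())
        row[xpath] = "; ".join(values)
    return row
```

ElementTree supports only a subset of XPath; if input (c) eventually needs full XPath expressions, lxml's xpath() method would be the more capable choice.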
Write the results to a file named data_index.csv.
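Writing the output might look like this minimal sketch using csv.DictWriter. The leading filepath column is a hypothetical choice of key column; the remaining columns are the XPaths from input (c).

```python
import csv


def write_data_index(rows, xpaths, out_path="data_index.csv"):
    """Write one row per product to out_path (sketch).

    'rows' is a list of (label_path, {xpath: value}) pairs; the
    'filepath' key column is a hypothetical choice.
    """
    fieldnames = ["filepath"] + list(xpaths)
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for label_path, values in rows:
            # Missing XPath values are written as empty cells.
            writer.writerow({"filepath": label_path, **values})
```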