SETI / rms-data-projects

Apache License 2.0

A tool to produce a data index file #22

Closed esimpsons3ti closed 7 months ago

esimpsons3ti commented 1 year ago

This module creates an index file, analogous to the index.tab and related files created by the RMS team for PDS3 volumes. The index file includes a line for each data product and columns for user-specified information found in ("scraped from") the product label.

The tool should be called pds4_create_data_index.py

The specifications are as follows:

  1. Input arguments should be described in a comment (or docstring?) as:
     a. the path containing the bundle.xml (default is the current directory)
     b. the value(s) of reference_type for which you want to create an index (default is bundle_has_data_collection)
     c. a list of the XPath(s) that you want to turn into columns in your data index (for now, hardwire this to ["pds:Target_Identification"], which matches your current functionality).

  2. In the path designated by input (a), open NOT the bundle.xml but instead the bundle_member_index.csv that is in the same directory (this will have been created by the tool described in https://github.com/SETI/pds-data-projects/issues/20).

  3. Find the entries whose reference_type matches input (b) and note their paths.

  4. In each of those paths, open NOT the collection product but the collection_member_index.csv that is in the same directory (this will have been created by the tool described in https://github.com/SETI/pds-data-projects/issues/21).

  5. For each entry, open the label and scrape it as in the current code (a separate issue will ask for the current hardwired Target_Identification to be replaced with input (c)).

  6. Write the results to a file named data_index.csv.
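
The steps above could be sketched roughly as follows. This is only an illustration, not the existing code. The CSV column names ("Reference Type", "Path"), the convention that each path is relative to the directory holding its index file, the --reference-type flag name, and all function names are assumptions, since the formats produced by the tools in issues #20 and #21 are not spelled out here. The namespace URI is the standard PDS4 core namespace.

```python
import argparse
import csv
import xml.etree.ElementTree as ET
from pathlib import Path

PDS_NS = {"pds": "http://pds.nasa.gov/pds4/pds/v1"}
# Hardwired for now, per item 1c.
DEFAULT_XPATHS = ("pds:Target_Identification",)


def parse_args(argv=None):
    """Inputs (a) and (b); the flag name is a placeholder."""
    parser = argparse.ArgumentParser(
        description="Create data_index.csv for a PDS4 bundle.")
    parser.add_argument("bundle_dir", nargs="?", default=".",
                        help="path containing the bundle.xml")
    parser.add_argument("--reference-type", nargs="+",
                        default=["bundle_has_data_collection"],
                        help="reference_type value(s) to index")
    return parser.parse_args(argv)


def read_rows(csv_path):
    """Read a CSV file into a list of dicts keyed by the header row."""
    with open(csv_path, newline="") as f:
        return list(csv.DictReader(f))


def scrape_label(label_path, xpaths):
    """Return {xpath: text}, joining the text found under each match."""
    root = ET.parse(label_path).getroot()
    scraped = {}
    for xpath in xpaths:
        values = []
        for elem in root.iterfind(f".//{xpath}", PDS_NS):
            parts = [t.strip() for t in elem.itertext() if t.strip()]
            values.append(" ".join(parts))
        scraped[xpath] = "; ".join(values)
    return scraped


def create_data_index(bundle_dir, reference_types, xpaths=DEFAULT_XPATHS):
    bundle_dir = Path(bundle_dir)
    rows = []
    # Steps 2-3: read the bundle member index, keep matching entries.
    for member in read_rows(bundle_dir / "bundle_member_index.csv"):
        if member["Reference Type"] not in reference_types:  # assumed column
            continue
        # Step 4: assumed "Path" points at the collection product label,
        # so the collection member index sits in its parent directory.
        collection_dir = (bundle_dir / member["Path"]).parent
        index_path = collection_dir / "collection_member_index.csv"
        # Step 5: scrape each product label listed in the collection index.
        for product in read_rows(index_path):
            label_path = collection_dir / product["Path"]  # assumed column
            rel_path = label_path.relative_to(bundle_dir).as_posix()
            rows.append({"Path": rel_path, **scrape_label(label_path, xpaths)})
    # Step 6: one line per product, one column per XPath.
    out_path = bundle_dir / "data_index.csv"
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["Path", *xpaths])
        writer.writeheader()
        writer.writerows(rows)
    return out_path
```

A script entry point would then be something like `args = parse_args()` followed by `create_data_index(args.bundle_dir, args.reference_type)`. Joining all text under a matched element into one cell is just one possible way to flatten a class such as Target_Identification into a column; the real tool may prefer one column per leaf attribute.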

esimpsons3ti commented 7 months ago

This was finished with the creation of pds4_create_xml_index.py.