NASA-PDS / operations

Tickets for the PDSEN Operations Team

Sync ESA-PSA label files to a local directory #557

Closed nutjob4life closed 3 weeks ago

nutjob4life commented 1 month ago

🗒️ Summary

Merge this to satisfy the sync part of registry-legacy-solr#135.

⚙️ Test Data and/or Report

$ cat harvest.cfg
cat: harvest.cfg: No such file or directory
$ du -sh download
du: download: No such file or directory
$ .venv/bin/python3 bin/portal/pds-sync-api.py
INFO Writing harvest XML config to harvest.cfg
INFO Downloading labels from https://pds.mcp.nasa.gov/api/search/1/products to download
INFO Generating ESA-PSA products from https://pds.mcp.nasa.gov/api/search/1/products

(25 minutes later)

$ cat harvest.cfg
<?xml version='1.0' encoding='UTF-8'?>
<harvest xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="https://github.com/NASA-PDS/harvest/blob/main/src/main/resources/conf/configuration.xsd">
  <registry auth="/path/to/auth/file">app://localhost.xml</registry>
  <load>
    <directories>
      <path>/Users/kelly/Documents/Clients/JPL/PDS/Development/nasa-pds/operations/download</path>
    </directories>
  </load>
  <fileInfo processDataFiles="true" storeLabels="true">
    <fileRef replacePrefix="/Users/kelly/Documents/Clients/JPL/PDS/Development/nasa-pds/operations/download" with="https://url/to/archive"/>
  </fileInfo>
  <autoGenFields/>
</harvest>
$ du -sh download
6.0M    download
$ find download -type f | wc -l
    1449
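
For reference, a `harvest.cfg` with the shape shown above could be generated along these lines. This is only a minimal sketch using the standard library's `xml.etree.ElementTree`, not the code in `bin/portal/pds-sync-api.py`; the function name `build_harvest_config` and its parameters are hypothetical, and the placeholder values (`/path/to/auth/file`, `https://url/to/archive`, `app://localhost.xml`) are copied from the output above.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
SCHEMA = (
    "https://github.com/NASA-PDS/harvest/blob/main/"
    "src/main/resources/conf/configuration.xsd"
)


def build_harvest_config(
    download_dir,
    archive_url="https://url/to/archive",  # placeholder copied from the output above
    auth_file="/path/to/auth/file",        # placeholder copied from the output above
    registry="app://localhost.xml",
):
    """Return harvest XML (as bytes) pointing at download_dir.

    Hypothetical helper for illustration; the real script may differ.
    """
    # Make the serializer emit the "xsi" prefix instead of a generated one.
    ET.register_namespace("xsi", XSI)
    root = ET.Element("harvest", {f"{{{XSI}}}schemaLocation": SCHEMA})

    reg = ET.SubElement(root, "registry", {"auth": auth_file})
    reg.text = registry

    load = ET.SubElement(root, "load")
    directories = ET.SubElement(load, "directories")
    ET.SubElement(directories, "path").text = download_dir

    file_info = ET.SubElement(
        root, "fileInfo", {"processDataFiles": "true", "storeLabels": "true"}
    )
    ET.SubElement(
        file_info, "fileRef", {"replacePrefix": download_dir, "with": archive_url}
    )
    ET.SubElement(root, "autoGenFields")

    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)
```

Writing the returned bytes to `harvest.cfg` in binary mode keeps the declared encoding and the file contents in agreement.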

♻️ Related Issues

nutjob4life commented 1 month ago

@nutjob4life also, in terms of speed, do we think narrowing down the fields returned would speed this up at all? Or is most of the overhead in the label downloads?

It's definitely in the downloads. The API responds crisply and efficiently.
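
If anyone wants numbers behind that, one way to check is to time the API queries and the label downloads separately. The sketch below is a hypothetical helper, not something in this PR: `PhaseTimer` and the phase names `"api"` and `"download"` are made up for illustration, and the sync script would need its request and download calls wrapped in the corresponding `with` blocks.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseTimer:
    """Accumulate wall-clock time per named phase (hypothetical helper)."""

    def __init__(self, clock=time.perf_counter):
        # The clock is injectable so the timer can be exercised without sleeping.
        self._clock = clock
        self.totals = defaultdict(float)

    @contextmanager
    def phase(self, name):
        start = self._clock()
        try:
            yield
        finally:
            # Record elapsed time even if the wrapped call raises.
            self.totals[name] += self._clock() - start

    def shares(self):
        """Return each phase's fraction of the total recorded time."""
        grand_total = sum(self.totals.values())
        if grand_total == 0:
            return {name: 0.0 for name in self.totals}
        return {name: secs / grand_total for name, secs in self.totals.items()}
```

Usage would look like `with timer.phase("api"): ...` around each search request and `with timer.phase("download"): ...` around each label fetch, followed by `timer.shares()` at the end of the run.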

jordanpadams commented 3 weeks ago

Merging. Will create separate task to actually run this.