SETI / rms-data-projects

Apache License 2.0

Create HST Documentation bundle #88

Open matthewtiscareno opened 1 month ago

matthewtiscareno commented 1 month ago

@markshowalter said the following in a Dropbox message while sharing the files on 7/11/24:

I am sharing the complete collection of HST user handbooks via Dropbox. These are the inputs to the long-planned "HST documentation bundle", which needs to be cited by Dave's HST data bundles. For each instrument, there is an instrument handbook and a data handbook. In the specific case of FOS, there is a third document, which is a correction to the "final" data handbook. File names should be self-explanatory. I made them consistent, including versions and dates where available, so we know how to name future versions of the files as they are released by STScI. The files are all in PDF, not PDF/A. I have also included a file AAREADME.txt, which contains some of the details about the provenance of each file. I assume this info should be pasted into the XML labels under a description field of some sort. I leave the production of the labels and v1.0 of the bundle to be assigned by Matt. Maybe it will be a useful learning exercise for someone, because I don't think the level of effort is enormous. To the best of my knowledge, these files do not have DOIs, because STScI apparently doesn't believe in that sort of thing. As a fun challenge to the PDS4 standards, several documents have unknown authors and/or publication dates, as is noted in the AAREADME file. Once the bundle is complete, Dave will be able to incorporate into his pipeline the steps to add the relevant files and their labels to the documents/ collection of each data bundle. Let me know if there are any questions. --Mark

matthewtiscareno commented 1 month ago

Need to complete #87 first.

matthewtiscareno commented 1 month ago

Discussion from team meeting on 7/30/24:

  • Mark has gathered all of the current HST instrument handbooks into a Dropbox so that they can be archived in PDS4 format.
    • By the way, Mark also repaired many of the documents. There were bad scans, upside-down pages, missing chapters (for which he found a different online source for the same document), and more.
    • Matt: Are you capturing these things you did as part of the documentation? Mark: Yes, there is a provenance document.
  • Matt has questions:
    • Is there anything preventing us from simply going ahead and archiving this bundle? Or should it wait until we start to archive PDS4 data from Hubble? Yes, we can go ahead.
    • Should this be a stand-alone bundle? Or are we creating an over-arching HST bundle of which this might be a collection? Each HST observing program is its own bundle, so the documentation should be its own bundle. Matt: Some bundles have data from multiple instruments? Mark: Yes. Matt: So the bundleset needs to be only HST, not instrument-specific.
    • Do all of these handbooks conform to PDF/A? No, so they need to be converted.
    • One of the files (FOS-Data-Handbook-correction-2001.webarchive) is not a PDF. What shall we do with this? On a Mac, opening it takes you to the HST archive online. If you follow that link, you find all the documents.
    • BIGGEST QUESTION: Going forward, will we update this bundle every time there is a new handbook version? By the same token, should we create a bundle that starts with v1 of every handbook and successively archive all of the versions? This would leave all of the old handbook versions in old versions of the bundle. The alternatives include a) including all old versions somewhere in the present version of the bundle, or b) not including old versions of handbooks at all. Yes, we should update the bundle when there is a new handbook going forward. The current plan is not to preserve old versions. STScI never takes information out of the handbooks, only updates it, so there is no need for old versions.
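
For the PDF/A question above, a quick first pass could be to check which files already claim PDF/A conformance before converting anything. The sketch below is only a heuristic and rests on one assumption: a PDF/A file declares its conformance in its XMP metadata through the `pdfaid:part` property, and that metadata is stored uncompressed. The function name `claimed_pdfa_part` is made up for illustration. A claim is not proof of conformance, so a dedicated validator would still be needed before archiving.

```python
import re
from typing import Optional

# XMP metadata can state the PDF/A part either as an element
# (<pdfaid:part>1</pdfaid:part>) or as an attribute (pdfaid:part="1").
_PDFA_PART = re.compile(rb"pdfaid:part\s*(?:>|=\s*[\"'])\s*(\d)")


def claimed_pdfa_part(data: bytes) -> Optional[int]:
    """Return the PDF/A part number the file claims, or None if it claims none.

    Heuristic only: this reads the declaration and does not validate the file.
    """
    match = _PDFA_PART.search(data)
    return int(match.group(1)) if match else None
```

To survey a folder, one could call `claimed_pdfa_part(path.read_bytes())` for each `path` in `pathlib.Path(folder).glob("*.pdf")` and list the files that return `None` as candidates for conversion.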

Mitch: We should document the STScI version, which will be a different number from the PDS version. Also, have an AAREADME for the bundle that tracks, for each PDS version, what the STScI versions are, and also says "earlier versions may be available from STScI." Also put the info in the modification history for each document. Mark: I would really rather have the version number in the filename but not in the LID.
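
Mitch's and Mark's suggestions could be sketched roughly as follows. Everything specific here is an assumption for illustration: the `-v<version>` filename suffix, the `EXAMPLE-Instrument-Handbook` name, the version numbers, and the helper names are all hypothetical, and the actual filename convention is whatever Mark used in the Dropbox. The idea is to keep the STScI version in the filename, derive a version-free name for use in the LID, and render the AAREADME table from a single history record.

```python
import re
from pathlib import PurePath
from typing import Dict, List, Optional, Tuple

# Hypothetical convention: <Name>-v<STScI version>.<ext>
_VERSION_SUFFIX = re.compile(r"-v(?P<version>[0-9][0-9.]*)$")


def split_versioned_name(filename: str) -> Tuple[str, Optional[str]]:
    """Split a filename into (version-free name, STScI version or None)."""
    stem = PurePath(filename).stem
    match = _VERSION_SUFFIX.search(stem)
    if match is None:
        return stem, None
    return stem[: match.start()], match.group("version")


def lid_fragment(filename: str) -> str:
    """Version-free, lowercase name for the LID (LIDs are lowercase)."""
    name, _ = split_versioned_name(filename)
    return name.lower()


def render_aareadme(history: Dict[str, Dict[str, str]]) -> str:
    """List, for each PDS bundle version, the STScI version of each handbook."""
    lines: List[str] = []
    for pds_version in sorted(history):
        lines.append(f"PDS bundle version {pds_version}:")
        for handbook, stsci_version in sorted(history[pds_version].items()):
            lines.append(f"  {handbook}: STScI version {stsci_version}")
    lines.append("Earlier versions may be available from STScI.")
    return "\n".join(lines)


# Placeholder data, not real handbook versions.
EXAMPLE_HISTORY = {"1.0": {"EXAMPLE-Instrument-Handbook": "15.0"}}
```

With this layout, a new STScI release would change the filename and add an entry to the history record, while the LID fragment stays the same across releases.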