ImperialCollegeLondon / safedata_validator

Python tools to validate and publish datasets using the safedata metadata format.
https://safedata-validator.readthedocs.io/
MIT License

Issues with command line workflow #137

Closed by davidorme 2 months ago

davidorme commented 2 months ago

Some thoughts from running through the publication process when updating the description:

  1. The record ID number

    For shell scripting, we have the issue that all the paths involve a Zenodo JSON file with an unknown ID, so the user has to change a bunch of filenames.

    # Publish the dataset to Zenodo
    # 1) Create a new deposit, which will generate a deposit metadata file called
    #    something like zenodo_1143714.json
    safedata_zenodo create_deposit

    If we add an option to create_deposit to write that ID to stdout, the script can then do:

    RECID=$(safedata_zenodo -r config.cfg create_deposit --id-to-stdout)

    And then the rest of the script can use that:

    # 2) Upload the dataset file and external files named in the dataset
    #    summary. This uses the zenodo metadata file to confirm the upload destination.
    safedata_zenodo upload_file zenodo_${RECID}.json SAFE_dataset.xlsx
    safedata_zenodo upload_file zenodo_${RECID}.json Supplementary_files.zip
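
A minimal sketch of how the proposed `--id-to-stdout` behaviour could work on the Python side. The helper name `save_deposit_metadata` and its arguments are hypothetical, not existing safedata_validator code, and it assumes the deposit response from Zenodo is a dict carrying an `id` field. The main point is that only the bare ID goes to stdout, with any human-readable message sent to stderr, so that `RECID=$(...)` captures nothing else.

```python
import json
import sys
from pathlib import Path


def save_deposit_metadata(deposit: dict, out_dir: Path, id_to_stdout: bool = False) -> Path:
    """Write deposit metadata to zenodo_<id>.json (hypothetical helper)."""
    record_id = deposit["id"]  # assumption: the response carries an "id" field
    out_path = Path(out_dir) / f"zenodo_{record_id}.json"
    out_path.write_text(json.dumps(deposit, indent=2))

    if id_to_stdout:
        # Bare ID only on stdout, so shell command substitution gets just the ID.
        print(record_id)

    # Human-readable messages go to stderr and never end up in the captured value.
    print(f"Deposit metadata written to {out_path}", file=sys.stderr)
    return out_path
```

Sending the message to stderr means the user still sees where the metadata file went, even when the script is capturing stdout.
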
  2. Upload files

    Having to run upload_file once per file is annoying: the command could accept multiple files. On the other hand, the Zenodo API only accepts one file at a time, so the JSON response from Zenodo is easier to handle one file at a time.
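
One way to reconcile the two points, as a rough sketch: accept several files on the command line but still upload and record the responses one file at a time. `upload_files` and the `upload_one` callable are hypothetical names standing in for whatever actually talks to Zenodo; nothing here is existing safedata_validator code.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable


def upload_files(
    files: Iterable[Path], upload_one: Callable[[Path], dict]
) -> Dict[str, dict]:
    """Upload files one by one, keeping each per-file response (hypothetical)."""
    responses: Dict[str, dict] = {}
    for path in files:
        try:
            # One request per file, matching the one-file-at-a-time API.
            responses[str(path)] = upload_one(path)
        except Exception as err:
            # Stop at the first failure and say what has already gone up.
            done = ", ".join(responses) or "none"
            raise RuntimeError(
                f"Upload of {path} failed; already uploaded: {done}"
            ) from err
    return responses
```

Stopping at the first failure is a design choice for the sketch; carrying on and reporting all failures at the end would be an equally reasonable option.
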

  3. Ensemble script

    Having all those safedata_zenodo endpoints is all very well, but there is one basic use case: upload stuff and publish it. That is basically what our example scripts in the docs do. Should we just have a single command:

    safedata_zenodo create_and_publish TestFile.xlsx -e ExtraFile.zip -e ExtraFile2.zip

    That could (should) revalidate the file to check it passes and then just run everything. It's easier, and it also avoids the individual commands repeatedly loading and checking the configuration. There might be a cunning way of passing the configuration between repeated safedata_zenodo calls, but I'm not sure it is worth it.
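
A sketch of how the command line for the proposed `create_and_publish` subcommand might be parsed with `argparse`, using a repeatable `-e` flag. The long option `--external-file` and the helper names are assumptions for illustration; only the subcommand name and `-e` come from the proposal above.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Parser for the proposed create_and_publish subcommand (sketch only)."""
    parser = argparse.ArgumentParser(prog="safedata_zenodo")
    subparsers = parser.add_subparsers(dest="subcommand", required=True)

    cap = subparsers.add_parser(
        "create_and_publish",
        help="Validate, create a deposit, upload files and publish in one go",
    )
    cap.add_argument("dataset", help="Path to the dataset file")
    cap.add_argument(
        "-e",
        "--external-file",  # assumed long name; the proposal only gives -e
        dest="external_files",
        action="append",  # each -e adds one more file to the list
        default=[],
        help="External file to upload alongside the dataset (repeatable)",
    )
    return parser


def files_to_upload(args: argparse.Namespace) -> List[str]:
    """Dataset first, then the external files in the order given."""
    return [args.dataset, *args.external_files]
```

With this parser, the example command above would yield the dataset followed by the two external files, and the configuration would only need loading once for the whole run.
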

  4. Configuration file location

    If you are playing around with configurations (or want to switch between different projects?), it is a pain to have to set the --resources flag on every command. We could add an environment variable option: if SAFEDATA_VALIDATOR_CONFIG is set, then use that path.
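
A minimal sketch of the lookup order this implies. The function name `resolve_config_path` is hypothetical, and the precedence shown (an explicit `--resources` value wins over the environment variable) is an assumption rather than anything decided in this issue.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

ENV_VAR = "SAFEDATA_VALIDATOR_CONFIG"


def resolve_config_path(
    cli_value: Optional[str] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[Path]:
    """Pick the configuration path (hypothetical helper).

    Assumed precedence: --resources flag, then the environment variable.
    Returns None when neither is set, leaving the caller to fall back to
    whatever default location it already uses.
    """
    environ = os.environ if environ is None else environ

    if cli_value:
        return Path(cli_value)

    env_value = environ.get(ENV_VAR)
    if env_value:
        return Path(env_value)

    return None
```

In a shell session this would let a user run `export SAFEDATA_VALIDATOR_CONFIG=config.cfg` once and then drop the flag from each command.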