cedadev / cmip6-replication

CMIP6 replication procedures, status and requests at CEDA
BSD 2-Clause "Simplified" License

Overview of procedure and documentation #3

Open agstephens opened 3 years ago

agstephens commented 3 years ago

Overview

CEDA manages archives of CMIP6, CORDEX, Obs4MIPs etc. In most cases, we do not hold all the data for these projects. CEDA/JASMIN users can request that we obtain and archive additional datasets from ESGF.

Aim

We need to create, and document, a procedure for the above.

Workflow

The basic user request workflow is:

  1. User contacts support@ceda.ac.uk with a request.
  2. CEDA staff discuss the request and convert it into an appropriate format (the SELECTION_FILE).
  3. Synda is run (possibly as a user with read-only access to the DB) to:
    • identify all datasets that would be replicated (and store them as the DATA_REQUESTED file).
    • calculate the size of the request.
  4. If the request is large (>200GB), then CEDA uses an agreed process for deciding whether to action or reject the request.
  5. CEDA contacts the user to confirm the decision in (4).
  6. CEDA adds the SELECTION_FILE to the synda queue.
  7. At agreed follow-up time(s), CEDA checks whether the DATA_REQUESTED file has been satisfied (i.e. all datasets are replicated, archived and published). We will need to write scripts for this check.
  8. CEDA contacts the user to confirm the outcome.
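Steps 4 and 7 could be scripted along the following lines. This is only a minimal sketch, not an agreed implementation. It assumes that the DATA_REQUESTED file holds one dataset ID per line, that the total request size in bytes and the set of already published dataset IDs are obtained elsewhere (for example from Synda output and the archive), and that "GB" means decimal gigabytes. The function names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

GB = 10**9  # assumption: decimal gigabytes; the issue does not say GB vs GiB
REVIEW_THRESHOLD_GB = 200  # threshold quoted in step 4 of the workflow


def needs_review(total_bytes: int, threshold_gb: float = REVIEW_THRESHOLD_GB) -> bool:
    """Return True if the request is large enough to need a decision (step 4)."""
    return total_bytes > threshold_gb * GB


def read_dataset_ids(path: Path) -> set[str]:
    """Read dataset IDs, one per line, skipping blank lines and '#' comments.

    The one-ID-per-line layout is an assumed format for DATA_REQUESTED.
    """
    ids = set()
    for line in path.read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            ids.add(line)
    return ids


def outstanding(requested: set[str], published: set[str]) -> list[str]:
    """Return the requested dataset IDs that are not yet published (step 7)."""
    return sorted(requested - published)
```

A request would count as satisfied when `outstanding(...)` returns an empty list. How the set of published IDs is built is left open here.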

Information management and documentation

In order to support this service, we need to provide:

  1. Information on the Help Page about requesting data. Add to: https://help.ceda.ac.uk/article/4801-cmip6-data
  2. Links to that Help Page from the CMIP6 catalogue records.
  3. A public explanation of the replication priorities in a user-friendly format:
    • including ongoing requests/retrievals (e.g. those that will just pick up new models/exps when they are created)
    • including current requests (for specific users and/or projects)
    • including historical requests
  4. Wiki pages that explain the procedures and documentation.

Proposed information management system

HelpScout issues will be used to manage the user interactions:

Public/private sensitivities

Any discussion of a specific request that must be kept out of the public domain can take place via the CEDA Helpdesk query.

Content for the HelpScout Response Template

The Response Template can include:

Other issues

agstephens commented 3 years ago

Workflow as agreed by @agstephens @alaniwi @charliepascoe (16/11/2021):

  1. A query comes in. If it did not arrive via HelpScout (HS), we ask the user to send it to: support@ceda.ac.uk
  2. AI/AS responds using HS template response (unless the user has already provided a very clear requirement that does not need further discussion)
  3. User responds via HS
  4. AI/AS converts the request to selection file(s)
  5. AI/AS runs Synda (on Synda machine) using selection file(s) to get an estimate of the volume
  6. If large (>250GB): discuss with AS
  7. If CEDA says too big: tell user and STOP (or agree smaller request)
  8. If the volume is OK: continue
  9. Add the selection file(s) to the appropriate user directory in this repository using the naming convention
  10. Add, commit, and push to GitHub
  11. AI/AS tells user: replication initiated, will review in N days
  12. After N days: AI/AS to review request
  13. If completed: tell user
    • NOTE: we have a script to check whether queries have completed for a given selection file.
  14. If not yet completed: Go to 11 (unless AI/AS assesses that job will not complete)
  15. Move the selection_file(s) from the user_current to user_historical directory based on the agreed file-naming/directory-naming conventions (#6)
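The final step (moving selection files from user_current to user_historical) could be sketched as below. The directory layout `user_current/<user>/` and `user_historical/<user>/`, and the function name, are assumptions made for illustration only; the actual file-naming and directory-naming conventions are the ones agreed in #6.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def archive_selection_files(repo_root: Path, user: str, filenames: list[str]) -> list[Path]:
    """Move a user's selection files from user_current to user_historical.

    Assumed (hypothetical) layout:
        <repo_root>/user_current/<user>/<selection file>
        <repo_root>/user_historical/<user>/<selection file>
    """
    current_dir = repo_root / "user_current" / user
    historical_dir = repo_root / "user_historical" / user
    historical_dir.mkdir(parents=True, exist_ok=True)

    moved = []
    for name in filenames:
        src = current_dir / name
        dst = historical_dir / name
        if not src.is_file():
            raise FileNotFoundError(f"selection file not found: {src}")
        if dst.exists():
            # Refuse to overwrite an earlier historical request of the same name.
            raise FileExistsError(f"already archived: {dst}")
        shutil.move(str(src), str(dst))
        moved.append(dst)
    return moved
```

Because the selection files live in this repository, the move would still need to be added, committed and pushed afterwards, as in step 10.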