SciCatProject / scicat-filewriter-ingest

Python client that connects to a Kafka queue and creates new datasets when notified that a file has been written

Provide better help #76

Open nitrosx opened 2 months ago


Currently the following is the help output from the offline ingestor:

usage: scicat_offline_ingestor.py [-h] [--nexus-file NEXUS_FILE] [--done-writing-message-file DONE_WRITING_MESSAGE_FILE] -c CONFIG_FILE [--id ID] [--dataset.check-by-job-id] [--dataset.allow-dataset-pid] [--dataset.generate-dataset-pid]
                                  [--dataset.dataset-pid-prefix DATASET.DATASET_PID_PREFIX] [--dataset.default-instrument-id DATASET.DEFAULT_INSTRUMENT_ID] [--dataset.default-proposal-id DATASET.DEFAULT_PROPOSAL_ID]
                                  [--dataset.default-owner-group DATASET.DEFAULT_OWNER_GROUP] [--dataset.default-access-groups DATASET.DEFAULT_ACCESS_GROUPS [DATASET.DEFAULT_ACCESS_GROUPS ...]] [--ingestion.dry-run]
                                  [--ingestion.offline-ingestor-executable INGESTION.OFFLINE_INGESTOR_EXECUTABLE] [--ingestion.schemas-directory INGESTION.SCHEMAS_DIRECTORY] [--ingestion.file-handling.compute-file-stats]
                                  [--ingestion.file-handling.compute-file-hash] [--ingestion.file-handling.file-hash-algorithm INGESTION.FILE_HANDLING.FILE_HASH_ALGORITHM] [--ingestion.file-handling.save-file-hash]
                                  [--ingestion.file-handling.hash-file-extension INGESTION.FILE_HANDLING.HASH_FILE_EXTENSION] [--ingestion.file-handling.ingestor-files-directory INGESTION.FILE_HANDLING.INGESTOR_FILES_DIRECTORY]
                                  [--ingestion.file-handling.message-to-file] [--ingestion.file-handling.message-file-extension INGESTION.FILE_HANDLING.MESSAGE_FILE_EXTENSION] [--logging.verbose] [--logging.file-log]
                                  [--logging.file-log-base-name LOGGING.FILE_LOG_BASE_NAME] [--logging.file-log-timestamp] [--logging.logging-level LOGGING.LOGGING_LEVEL] [--logging.log-message-prefix LOGGING.LOG_MESSAGE_PREFIX]
                                  [--logging.system-log] [--logging.system-log-facility LOGGING.SYSTEM_LOG_FACILITY] [--logging.graylog] [--logging.graylog-host LOGGING.GRAYLOG_HOST] [--logging.graylog-port LOGGING.GRAYLOG_PORT]
                                  [--logging.graylog-facility LOGGING.GRAYLOG_FACILITY] [--scicat.host SCICAT.HOST] [--scicat.token SCICAT.TOKEN] [--scicat.timeout SCICAT.TIMEOUT] [--scicat.stream] [--scicat.verify]

options:
  -h, --help            show this help message and exit
  --nexus-file NEXUS_FILE
  --done-writing-message-file DONE_WRITING_MESSAGE_FILE
  -c CONFIG_FILE, --config-file CONFIG_FILE
  --id ID

Dataset:
  --dataset.check-by-job-id
  --dataset.allow-dataset-pid
  --dataset.generate-dataset-pid
  --dataset.dataset-pid-prefix DATASET.DATASET_PID_PREFIX
  --dataset.default-instrument-id DATASET.DEFAULT_INSTRUMENT_ID
  --dataset.default-proposal-id DATASET.DEFAULT_PROPOSAL_ID
  --dataset.default-owner-group DATASET.DEFAULT_OWNER_GROUP
  --dataset.default-access-groups DATASET.DEFAULT_ACCESS_GROUPS [DATASET.DEFAULT_ACCESS_GROUPS ...]

Ingestion:
  --ingestion.dry-run
  --ingestion.offline-ingestor-executable INGESTION.OFFLINE_INGESTOR_EXECUTABLE
  --ingestion.schemas-directory INGESTION.SCHEMAS_DIRECTORY

Ingestion File handling:
  --ingestion.file-handling.compute-file-stats
  --ingestion.file-handling.compute-file-hash
  --ingestion.file-handling.file-hash-algorithm INGESTION.FILE_HANDLING.FILE_HASH_ALGORITHM
  --ingestion.file-handling.save-file-hash
  --ingestion.file-handling.hash-file-extension INGESTION.FILE_HANDLING.HASH_FILE_EXTENSION
  --ingestion.file-handling.ingestor-files-directory INGESTION.FILE_HANDLING.INGESTOR_FILES_DIRECTORY
  --ingestion.file-handling.message-to-file
  --ingestion.file-handling.message-file-extension INGESTION.FILE_HANDLING.MESSAGE_FILE_EXTENSION

Logging:
  --logging.verbose
  --logging.file-log
  --logging.file-log-base-name LOGGING.FILE_LOG_BASE_NAME
  --logging.file-log-timestamp
  --logging.logging-level LOGGING.LOGGING_LEVEL
  --logging.log-message-prefix LOGGING.LOG_MESSAGE_PREFIX
  --logging.system-log
  --logging.system-log-facility LOGGING.SYSTEM_LOG_FACILITY
  --logging.graylog
  --logging.graylog-host LOGGING.GRAYLOG_HOST
  --logging.graylog-port LOGGING.GRAYLOG_PORT
  --logging.graylog-facility LOGGING.GRAYLOG_FACILITY

I would like to be able to add a better description to each key. For example:

...
Dataset:
  --dataset.check-by-job-id
     Enable checking if a dataset with scientific metadata job_id has already been created. If that is the case, it will not create another entry.
  --dataset.allow-dataset-pid
...

How can that be achieved?
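
One possible approach, sketched under assumptions: the help output above looks like it comes from Python's standard `argparse`, whose `add_argument` accepts a `help=` string that is printed under each option. If the options are generated from dataclasses, the description could live in each field's `metadata` and be passed through when the parser is built. The `DatasetOptions` class, the `"help"` metadata key, and `build_parser` below are hypothetical names for illustration, not the project's actual code, and the second description is only a placeholder.

```python
import argparse
from dataclasses import dataclass, field, fields


@dataclass
class DatasetOptions:
    """Hypothetical stand-in for the project's dataset configuration section."""

    check_by_job_id: bool = field(
        default=False,
        metadata={
            "help": "Enable checking if a dataset with scientific metadata "
            "job_id has already been created. If that is the case, "
            "it will not create another entry."
        },
    )
    dataset_pid_prefix: str = field(
        default="",
        # Placeholder text; the real description is up to the maintainers.
        metadata={"help": "Placeholder description for illustration."},
    )


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose options carry the help text from field metadata."""
    parser = argparse.ArgumentParser(prog="scicat_offline_ingestor.py")
    group = parser.add_argument_group("Dataset")
    for f in fields(DatasetOptions):
        flag = "--dataset." + f.name.replace("_", "-")
        help_text = f.metadata.get("help")  # None simply leaves the help empty
        if isinstance(f.default, bool):
            # Boolean fields become on/off switches.
            group.add_argument(flag, action="store_true", help=help_text)
        else:
            group.add_argument(
                flag, type=type(f.default), default=f.default, help=help_text
            )
    return parser


if __name__ == "__main__":
    build_parser().print_help()
```

Running the script prints each `--dataset.*` option followed by its description. Keeping the text next to the field definition means the description and the option cannot drift apart, but the same `help=` argument works equally well if the arguments are added by hand.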