co-analysis / a11ytables

R package: generate best-practice stats spreadsheets for publication
https://co-analysis.github.io/a11ytables/

YAML for table specification #65

Open mattkerlogue opened 2 years ago

mattkerlogue commented 2 years ago

Consider using YAML files to guide table construction.

title: The 'mtcars' Demo Dataset
description: >
  Aspects of automobile design and performance.
properties: >
  Suppressed values are replaced with the value ['c'].

  Blank cells in the 'Notes' column indicate the absence of a note.
contact: >
  The mtcars Team, telephone: 012 3456 789

tables:
  - name: Table 1
    title: Car Road Tests 1
    source: Motor Trend (1974)
    file: table1.csv
  - name: Table 2
    title: Car Road Tests 2
    source: Motor Trend (1974)
    file: table2.csv

notes:
  - number: 1
    description: US gallons
  - number: 2
    description: >
      Retained to enable comparisons with previous analyses.

Processing steps:

  1. The cover page can be generated from the top-level elements: title, description, properties and contact.
  2. The contents page can be generated from the tables list (using each table's name and title) and from whether a notes list exists.
  3. The notes list can be coerced into a table.
  4. The file property of each entry in the tables list gives the file that the table's data comes from. If preferred, a function could read these files and check that they conform, potentially allowing only specific formats (e.g. only CSV/RDS).
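
As a rough sketch of step 4, a reader could check the extension before reading the file. The helper name `read_table_file()` and its `allowed` argument are hypothetical and not part of a11ytables; the check that the result is a data.frame is an assumption about what "conform" might mean.

```r
# Hypothetical helper (not part of a11ytables): read a file named in the
# 'file' property, allowing only the listed formats.
read_table_file <- function(path, allowed = c("csv", "rds")) {
  ext <- tolower(tools::file_ext(path))
  if (!ext %in% allowed) {
    stop("Unsupported file type '", ext, "' for ", path, call. = FALSE)
  }
  if (!file.exists(path)) {
    stop("File not found: ", path, call. = FALSE)
  }
  out <- switch(
    ext,
    csv = utils::read.csv(path, stringsAsFactors = FALSE),
    rds = readRDS(path)
  )
  # Assumed conformance check: the file must hold a data.frame
  if (!is.data.frame(out)) {
    stop(path, " did not contain a data.frame", call. = FALSE)
  }
  out
}

# Usage with a temporary CSV standing in for table1.csv
tmp <- tempfile(fileext = ".csv")
utils::write.csv(head(mtcars), tmp, row.names = FALSE)
table1 <- read_table_file(tmp)
```

Rejecting unknown extensions before touching the file keeps the error message specific to the YAML entry that caused it.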

yaml::read_yaml() will read a YAML file and return an R list.
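
Steps 1 to 3 can be sketched from that list. This is a minimal illustration rather than a proposed implementation: the YAML is a cut-down copy of the example, inlined and parsed with `yaml::yaml.load()` so the sketch runs without a file on disk, and the object names (`cover`, `contents`, `notes`) and column names are assumptions.

```r
library(yaml)

# Cut-down copy of the example specification, inlined for the sketch
spec_text <- "
title: The 'mtcars' Demo Dataset
description: >
  Aspects of automobile design and performance.
contact: >
  The mtcars Team, telephone: 012 3456 789
tables:
  - name: Table 1
    title: Car Road Tests 1
    source: Motor Trend (1974)
    file: table1.csv
  - name: Table 2
    title: Car Road Tests 2
    source: Motor Trend (1974)
    file: table2.csv
notes:
  - number: 1
    description: US gallons
  - number: 2
    description: >
      Retained to enable comparisons with previous analyses.
"
spec <- yaml::yaml.load(spec_text)

# Step 1: cover page from whichever top-level elements are present
cover_fields <- intersect(
  c("title", "description", "properties", "contact"),
  names(spec)
)
cover <- data.frame(
  field = cover_fields,
  text = unname(trimws(vapply(spec[cover_fields], as.character, character(1))))
)

# Step 2: contents from the tables list, plus a Notes row if notes exist
contents <- data.frame(
  sheet = vapply(spec$tables, function(x) x$name, character(1)),
  title = vapply(spec$tables, function(x) x$title, character(1))
)
if (!is.null(spec$notes)) {
  contents <- rbind(data.frame(sheet = "Notes", title = "Notes"), contents)
}

# Step 3: notes list coerced into a table
notes <- data.frame(
  number = vapply(spec$notes, function(x) as.integer(x$number), integer(1)),
  description = trimws(
    vapply(spec$notes, function(x) x$description, character(1))
  )
)
```

Folded scalars (`>`) keep a trailing newline when parsed, hence the `trimws()` calls.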

matt-dray commented 12 months ago

I forgot this issue existed and I began toying with the idea of YAML input in a branch of matt-dray/a11ytables2.

Example YAML I was using for testing purposes:

cover:
  sheet_title: Widget production in England, season 2023/2024  # mandatory main sheet title in cell A1
  "About this publication":
    - This publication is about the quantity of widgets.  # arbitrary section in form 'section header: text content'
    - This is a second row of information.
  "Period covered":
    - The time period covered by this publication is quarter 3, 2023.   # arbitrary section
  Contact:
    - You can contact the team via email.
    - "[example@example.com](mailto:example@example.com)"   # arbitrary section, use Markdown to indicate a link
contents:
  sheet_title: Contents
  links: true  # whether to add a column with links to each tab
notes:
  sheet_title: Notes  # mandatory expected
  data: widget_notes  # mandatory expected
table_1:
  sheet_title: "Table 1: Widget quantity"  # mandatory expected
  data: data/widget_quantity.csv  # mandatory expected
  source: The UK Widget Survey.  # optional expected
  blanks: Blank cells indicate that data is missing.  # optional arbitrary
  coverage: The data are for the North and South of England  # optional arbitrary
table_2:
  sheet_title: "Tables 2a and 2b: Widget quantity by geography"
  source: The UK Widget Survey.
  table_2a:
    table_title: "Table 2a: Widget quantity produced in the North of England"
    data: data/widget_quantity_north.csv
  table_2b:
    table_title: "Table 2b: Widget quantity produced in the South of England"
    data: widget_quantity_south
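
One way a sheet entry with nested tables might be walked, shown as a sketch only: once parsed, nested mappings such as table_2a become lists, while sheet-level metadata stays as plain strings. Treating "is a list" as the marker of a sub-table is an assumption made for illustration, not confirmed behaviour of the a11ytables2 branch.

```r
library(yaml)

# Trimmed copy of the table_2 entry from the example
sheet_text <- '
table_2:
  sheet_title: "Tables 2a and 2b: Widget quantity by geography"
  source: The UK Widget Survey.
  table_2a:
    table_title: "Table 2a: Widget quantity produced in the North of England"
    data: data/widget_quantity_north.csv
  table_2b:
    table_title: "Table 2b: Widget quantity produced in the South of England"
    data: widget_quantity_south
'
sheet <- yaml::yaml.load(sheet_text)$table_2

# Assumption: nested mappings are sub-tables, scalars are sheet metadata
is_subtable <- vapply(sheet, is.list, logical(1))
metadata <- sheet[!is_subtable]
subtables <- sheet[is_subtable]

names(metadata)   # "sheet_title" "source"
names(subtables)  # "table_2a" "table_2b"
```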

I don't think there's anything surprising in this. Note that a data value is interpreted as a file path if it has an extension, and as the name of an object in the environment otherwise. YAML input will also make it easier to insert arbitrary pre-table metadata (#74) and to specify multiple tables per sheet (#3).
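
The file-or-object rule could look something like the following. The function name `resolve_data()` is hypothetical, and limiting file reading to CSV and RDS is an assumption made to keep the sketch short.

```r
# Hypothetical helper: treat a 'data' value as a file path when it has an
# extension, otherwise as the name of an object in the given environment.
resolve_data <- function(x, envir = parent.frame()) {
  ext <- tolower(tools::file_ext(x))
  if (nzchar(ext)) {
    switch(
      ext,
      csv = utils::read.csv(x, stringsAsFactors = FALSE),
      rds = readRDS(x),
      stop("Unsupported file type: ", ext, call. = FALSE)
    )
  } else {
    if (!exists(x, envir = envir)) {
      stop("No object named '", x, "' found", call. = FALSE)
    }
    get(x, envir = envir)
  }
}

# An in-memory object, referenced by name as in 'data: widget_notes'
widget_notes <- data.frame(number = 1L, description = "Example note")
from_object <- resolve_data("widget_notes")

# A file, referenced by path as in 'data: data/widget_quantity.csv'
csv_path <- tempfile(fileext = ".csv")
utils::write.csv(data.frame(quantity = c(10, 20)), csv_path, row.names = FALSE)
from_file <- resolve_data(csv_path)
```

One consequence of this rule is that an object whose name contains a dot (for example `widget.notes`) would be mistaken for a file, so object names would need to avoid dots or the rule would need an explicit override.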