Open jack-davison opened 1 year ago
Thanks for getting in touch, Jack. This is a great question and I like the idea.
We'd need to detect whether a given spreadsheet was created:
At simplest, we could detect the tabs containing tables—or provide an argument to users to specify them—and then extract the table(s) alone. however, it might be useful to output a list object where each element represents a tab and we have elements for the title, presence of notes, the table(s), etc.
Related: in case you haven't seen them before, our colleague Duncan (@nacnudus) has written some great packages, {tidyxl} and {unpivotr}, for general-purpose spreadsheet parsing and wrangling. There's an associated online book too.
Another approach: use Fran's package {odsTableReadr} to identify each of the tables in such a workbook and then extract meta-information from cell A1 to the row just above the start of a table.
Hi Matt,
Thanks very much for this package. I saw your talk at EARL2022 on
{a11ytables}
but didn't get the opportunity to ask the following.We often use UK government statistics in the format described and, while the format is accessible in a spreadsheet, it can be painful to read it into R for analysis (data doesn't always start on the same line, there can be multiple independent data tables per sheet, multiple sheets per file, etc.). Often a more code-friendly/tidy format (e.g., a simple csv) just isn't available, so we've had to DIY solutions to iterate over different sheets, detect where the data is when there are multiple tables per sheet, and so on.
Is it on the roadmap to write a
read_allytable()
function to do the inverse of the current package functionality, i.e., take an accessible spreadsheet saved locally and turn it back into a list of tidy tibbles in R?Cheers, Jack