co-cddo / open-standards

Collaboration space for discussing and exploring technical and data standards

Define .xlsx as an open standard. #69

Closed thomasforth closed 5 years ago

thomasforth commented 5 years ago

Title

.xlsx (the default file format in versions of Microsoft Excel since 2007) is an open standard (ECMA-376, ISO/IEC 29500) but is currently listed as a closed standard in The Open Government Standards. It should be listed as an open standard.

Category

Challenge Owner

I am @thomasforth, Head of Data at The Data City, imactivate, and ODILeeds.

Short Description

The current Open Government Standards recommend Open Document Format (ODF) as the default standard for editable documents. In the case of tabular data this will be spreadsheets in the ODS format.

Spreadsheets in the Office Open XML (.xlsx) standard are considered to be a closed format by the Open Government Standards, despite being registered as open formats with respected organisations (ECMA-376, ISO/IEC 29500), and being widely supported.

The Open Government Standards should be amended to include .xlsx format spreadsheets as an open standard. This should be either in addition to .ods format spreadsheets or as a replacement if simplicity is preferred.

User Need

A core principle of user-centred design is that it should avoid "forcing the users to change their behaviour to accommodate the product". Most users of spreadsheets currently work with files in the .xlsx format, either in Microsoft Excel or in spreadsheet software compatible with this format.

The imposition of the less widely-used format .ods forces users to change their behaviour, decreases their confidence in using and publishing government datasets, and makes their jobs harder. It is increasingly a barrier to wider publishing of open data in the public sector, particularly in local government where The Open Government Standards guidance applies via The Local Gov Digital standard. It is also a barrier to use of open data, as people are less confident about what to do with the file once downloaded, and discouraged in their analysis by unfamiliar warnings and feature limitations when working with it.

Users need data to be publishable and downloadable in a format that they are comfortable using. .xlsx is better than .ods at this. Both are open standards.

Expected Benefits

The two largest benefits will be to,

In both cases, widespread support for the .xlsx format means that it has more advantages over .ods than it presents disadvantages.

Functional Needs

I don't know what this means.

JamesBelchamber commented 5 years ago

XLSX is not an open standard. It's true that there is an open standard (OOXML) but the vast majority of XLSX files in circulation don't use that standard, and Excel does not by default save to the standard (indeed, it has compatibility issues with it).

I appreciate that most users don't care about things being open and just want it to work. Software developers are reliably and freely able to import ODF-compliant files into their applications, however they are not able to reliably and freely import all XLSX files.

In the instance you demonstrated in your tweet (linked for visibility), Microsoft could integrate ODF support into PowerBI. That they're using their position in the market to enforce the use of their standards is a good example of why (actual, real) open standards are important and necessary.

earfolds commented 5 years ago

.xlsx isn't a standard, and Microsoft Excel doesn't save ISO/IEC 29500 Strict by default either.

fweng322 commented 5 years ago

What users are "familiar with" is not xlsx format, but Microsoft Excel. I would suggest to distinguish the differences between format and software first.

thomasforth commented 5 years ago

> What users are "familiar with" is not xlsx format, but Microsoft Excel. I would suggest to distinguish the differences between format and software first.

I'm not convinced by this. Users may well open these files in Numbers, Tableau, and PowerBI (though Excel is almost certainly the most common choice). I think that it is the .xlsx extension that they are familiar with (although that is of course through its association with Microsoft Excel, just as .ods is associated with Libre/Open/Neo Office).

thomasforth commented 5 years ago

> I appreciate that most users don't care about things being open and just want it to work.

@JamesBelchamber I find your argument quite convincing -- but it is this sentence that sums up why I'm still struggling.

The top requirement for open standards in The Open Standards principles is that "1. Open standards must meet user needs". The user need is for it to just work. .xlsx is the open standard that just works. The ONS have found this repeatedly in their Digital Discovery and User Research.

There are lots of other great reasons why .ods is preferable to .xlsx -- but if user need is the top priority, I don't see how they outweigh that.

JamesBelchamber commented 5 years ago

XLSX is not an open standard. Therefore, nobody should define it as an open standard.

@thomasforth it seems that your argument in practice is whether the requirement for a standard to be open outweighs the requirement for a standard to best meet (some) user needs. A lot has been written about the value of open standards (and their importance to democratising and preserving open data), with a summary two paragraphs up on The Open Standards principles and the government's own Review of the Evidence.

You should consider open standards to be a much more fundamental user need. Most users need to be able to rely on open data to be accessible after a proprietary standard (XLSX, in this example) has been deprecated or abandoned by the developer, as one of many examples of how open standards apply in practice.

edent commented 5 years ago

I disagree that .xlsx is an open standard.

How do individuals or organisations contribute to the ongoing development of XLSX?

With ODS, the Document Foundation allows people from around the world to join and help improve the standard.

thomasforth commented 5 years ago

> How do individuals or organisations contribute to the ongoing development of XLSX?

Having looked more closely, I can see that .xlsx does not meet the UK Government's Open Standards principles because of this point. I will close the issue, speak with the users who feel .xlsx meets their needs better than .ods, and consider what to do next.

It remains my deep fear that we are alienating too many users and potential users currently. The ONS have consistently found the same in their digital alphas and user research, which is why they continue to use .xlsx as their primary file format for sharing open data. But clearly this is a bigger issue than just accepting .xlsx (the OOXML ISO/IEC 29500 standard) as this would not fit in with the current guidance.