NHSDigital / open-data-standards

This draft of open data CSV standards has been collated and reviewed by the open data cross-system task and finish group, an informal partnership of national bodies in health and care that produce open data and statistics. We are publishing on GitHub to gather feedback and any suggested alterations and additions.
MIT License

Initial reactions #3

Open connor1q opened 2 years ago

connor1q commented 2 years ago

Hi there, great to see this written down and circulated for comment.

It would be very helpful to include some example data, covering both data that conforms to these standards and data that breaks them. Even a couple of examples would make these standards easier to adopt.
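As a rough illustration of the kind of conforming and non-conforming examples that could help, here is a minimal Python sketch. It assumes (this is an interpretation, not the draft's wording) that "only include the data" means a CSV should be one rectangular table with a single header row and no title, blank or footnote rows. The function name `find_non_data_rows` and the sample data are hypothetical.

```python
import csv
import io
from collections import Counter

# Hypothetical examples: GOOD is a plain table; BAD adds a title row,
# a blank row and a footnote row around the same table.
GOOD = "\n".join([
    "ORG_CODE,PERIOD,COUNT",
    "ORG_A,2023-04,120",
    "ORG_B,2023-04,95",
])
BAD = "\n".join([
    "Table 1: Monthly counts",
    "ORG_CODE,PERIOD,COUNT",
    "ORG_A,2023-04,120",
    "ORG_B,2023-04,95",
    "",
    "Source: example footnote",
])


def find_non_data_rows(text: str) -> list[int]:
    """Return 1-based row positions whose field count differs from the most common width.

    Row positions match line numbers as long as no field contains an embedded newline.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return []
    # The table's width is taken to be the most common number of fields per row.
    width, _ = Counter(len(row) for row in rows).most_common(1)[0]
    return [i for i, row in enumerate(rows, start=1) if len(row) != width]


print(find_non_data_rows(GOOD))  # no problem rows
print(find_non_data_rows(BAD))   # title, blank and footnote rows
```

A width check like this only catches layout problems; rules such as using common reference data for coded values would need their own checks.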

It might also be beneficial to explicitly link to converging resources on open data standards. For example:

canwaf commented 2 years ago

Hi everyone! I just wanted to drop in and say I'm excited about the direction this is heading. The ONS Integrated Data Service is working on many interesting projects to improve data interoperability across government and to adopt open data standards (for both public and disclosure data).

One standard we have adopted in the Dissemination branch of IDS is CSV-W, and I think this might be worth investigating. From the proposed standards it looks like a very close fit.
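For anyone unfamiliar with CSV-W (W3C "CSV on the Web"), the core idea is a JSON metadata file that sits alongside an ordinary CSV and describes its columns. The sketch below builds a minimal table description using terms from the W3C Metadata Vocabulary for Tabular Data (`@context`, `url`, `tableSchema`, `columns`, `name`, `titles`, `datatype`). The file name, column names and the helper `csvw_metadata` are hypothetical, and this is not how IDS produces its metadata.

```python
import json


def csvw_metadata(csv_url: str, columns: list[tuple[str, str]]) -> dict:
    """Build a minimal CSVW table description for a single CSV file.

    Each column is a (name, datatype) pair; the name is also used as the
    expected header title.
    """
    return {
        "@context": "http://www.w3.org/ns/csvw",
        "url": csv_url,
        "tableSchema": {
            "columns": [
                {"name": name, "titles": name, "datatype": datatype}
                for name, datatype in columns
            ]
        },
    }


metadata = csvw_metadata(
    "monthly_counts.csv",  # hypothetical file name
    [("ORG_CODE", "string"), ("PERIOD", "gYearMonth"), ("COUNT", "integer")],
)
print(json.dumps(metadata, indent=2))
```

The CSV itself stays unchanged. Under the CSVW conventions, the metadata would typically be published next to it, for example as `monthly_counts.csv-metadata.json`, so that tools can discover the column types without any changes to the data file.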

Many of the points you've covered are excellent steps towards making data more accessible; some of my favourites are:

  1. Only include the data
  2. Ensure the data is relevant to the topic it represents
  3. Use common reference data for coded values

I suggest you go further in other respects. Point 12 (Indicate suppression using an additional column instead of a data marker in the value) is incredibly important for ease of use. Might I suggest standardising this, covering not only suppression status (or pseudo-suppression through rounding) but also statistical quality markers (e.g. the certainty of estimates)?
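To make point 12 concrete, here is a minimal sketch that converts a value column containing in-cell suppression markers into a clean value column plus a separate status column. The marker set (`[c]` and `*`), the `_SUPPRESSED` column-name suffix and the `Y`/`N` codes are all assumptions for illustration; a standard would need to fix these.

```python
import csv
import io

# Hypothetical marker set; the real markers would be whatever the standard settles on.
SUPPRESSION_MARKERS = {"[c]", "*"}


def split_suppression(text: str, value_col: str) -> str:
    """Move suppression markers out of value_col into a separate status column.

    Suppressed cells are left empty in value_col so that the column holds only
    numbers; a new '<value_col>_SUPPRESSED' column records 'Y' or 'N'.
    """
    reader = csv.DictReader(io.StringIO(text))
    status_col = f"{value_col}_SUPPRESSED"
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=[*reader.fieldnames, status_col], lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        suppressed = row[value_col].strip() in SUPPRESSION_MARKERS
        if suppressed:
            row[value_col] = ""
        row[status_col] = "Y" if suppressed else "N"
        writer.writerow(row)
    return out.getvalue()


marked = "ORG_CODE,PERIOD,COUNT\nORG_A,2023-04,120\nORG_B,2023-04,[c]\n"
print(split_suppression(marked, "COUNT"))
```

A statistical quality marker (for example, a confidence flag for estimates) could follow the same pattern, with one extra column per value column instead of annotations inside the value.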

I'd like to help out, just let me know how.

connor1q commented 2 years ago

Hi @canwaf, thanks for your input and suggestions. I've passed them along to the relevant people internally.

We'll be trying to apply some of this thinking in anger in the coming weeks and would be keen to have your input once we have something concrete.

connor1q commented 2 years ago

Hi @canwaf, I'm trying to find a way to get in touch with you or the Integrated Data Service, but I'm struggling. Could you please reach out to us at datascience@nhs.net? Thanks, Connor

foster999 commented 2 years ago

I think this has quite a lot of overlap with existing cross-government standards for publishing data, so it may be worth linking up with the team responsible for them 😄

On point 1, "Ensure data format is both human readable and machine readable":

It may be worth clarifying whether this needs to be met within a single file. Following parts of the accessibility guidance may reduce machine readability, and vice versa, so what is actually required are outputs tailored to user needs. I'd suggest that the ideal would be to produce both a formatted OpenDocument Spreadsheet for people to read and a CSV-W version for machines to read easily.

The same GSS team are looking to update the gptables Python package to produce suitable outputs, to reduce the burden on data producers in meeting both needs. It might be worth feeding any requirements from your standards into this work?

connor1q commented 2 years ago

Thanks @foster999,

That's really helpful - in particular the gptables package. My team will play around with it to see if it can replace some existing functionality.

The guidance for the open data CSVs is intended to supplement the formatted spreadsheets rather than replace them. It is worth emphasising the need to drive outputs from user needs. I would be very interested in any user research from other departments on their users' different needs, if you have any leads.

connor1q commented 2 years ago

@foster999, just a second thought about user-centred design: we (NHSD) could dedicate some time to user testing the gptables package and feed the outcomes back into your work. A number of teams are grappling with this topic at the moment, so you could get feedback from a range of users with different skill levels.

foster999 commented 2 years ago

@rowanhemsi and team maintain gptables now. I think they'd be really grateful for any feedback or support with user research!

rowanhemsi commented 2 years ago

Thanks for tagging me @foster999. We would definitely be grateful for user testing and feedback, @connor1q! We are currently working on a new major version of the package, available on the dev branch of our GitHub repository. This will bring the package in line with the latest GSS guidance on publishing statistics in spreadsheets, in particular the accessibility guidelines. As David mentioned, the accessibility and usability of spreadsheets can conflict with machine readability. Where this happens, we have gone with the human-readable option, so we would recommend publishing data in CSV format as well as in spreadsheets.