DEFRA / software-development-standards

Standards and guidance relating to software development in Defra
https://defra.github.io/software-development-standards/

Data standards #35

Open ben-sagar opened 4 years ago

ben-sagar commented 4 years ago

We should reference (or define) our data standards.

Cruikshanks commented 4 years ago

Think I'd need more detail on what this is actually referring to before I could vote. One to discuss in the next call perhaps.

ben-sagar commented 4 years ago

Could this be a reference to data standards that are elsewhere?

Cruikshanks commented 4 years ago

> Could this be a reference to data standards that are elsewhere?

Agree this would be useful so have updated on this basis

nigejohnson commented 4 years ago

Agree this is something where we could do with an "interface" into whatever data standards already exist within DEFRA, but isn't that a service the architecture team should provide to us, especially any application/solution architects working alongside any data architects? I.e. shouldn't our "interface" into the wonderful world of wider DEFRA standards be our "tame" (or not so tame) architects? Otherwise there's a danger of duplicating work or, even worse, diverging standards.

nigejohnson commented 4 years ago

Another thought: aside from data standards meaning such things as which address standards/formats we support, and which sets of ISO codes we use for this or that (which arguably are standards set by another part of DEFRA?), what about data protection? Should we have a standard explaining how we should implement things such as GDPR, i.e. what GDPR means for devs? Should data retention policy, currently a separate potential standard, be part and parcel of that more general Data Protection standard?

andrewhick commented 4 years ago

I'm currently drafting a page on the QA wiki which I'm using as a start point: (UPDATE: now moved to add-data-standards branch)

This has the potential to be a huge story so I'd like to start by agreeing what we need to know now, and what are 'enough' standards for now. We can then add more as needed.

nigejohnson commented 4 years ago

That wiki page looks like a very sensible start. Agree this could be vast. Ultimately everything we do is about data! Therefore we should descope to make this manageable and to avoid messy overlaps (so duplicated work and possible contradiction) with other areas. So I'd say:

References to the data protection principles we have to follow, and some specific guidance to devs and testers as to what this means in practice, e.g. how we should implement GDPR.

Yes also to some data integrity principles: some statements of principle around backup and recovery (data should be backed up in any situation where the impact of its loss would be greater than the impact of continuing to back it up; back-ups and restoring from back-ups should be tested), data retention (so this standard should subsume the other possible Data Retention standard we have listed), and ensuring that the programs we write and the databases/data stores we use don't corrupt data. That last point should stay high level, as there is no point here going into technical detail: for example, ensure that data is left in a consistent state and not corrupted when errors occur, where possible by using data storage mechanisms that support transactions and rollback; where possible, also use data storage mechanisms that preserve data integrity, for example by supporting referential integrity checks and constraints.

It might also be worth having a discussion (especially in these unusual times) about what data can be safely left on the cloud and, if so, with which providers and in which jurisdictions (what is the effect of the B word here?) or (if we want to take that a level up) what principles we should follow when assessing cloud storage options. What data should be kept (either instead of in a third party cloud or as well as, i.e. as back-up) in a govt bunker “data centre” or equivalent? And have other parts of DDTS or DEFRA got any views here?

Your suggestion about test data standards is very salient (e.g. by default we will use production data so our tests are realistic but, if so, we must protect it as if it were production data, including anonymising where that would be practical and effective).

Pure security issues such as encryption are down to other standards, but a quick reference from here to those standards might be sensible. Ditto, I'm increasingly doubtful that "data formats" (such as the always vexed issue of address formats) are really for this doc, but a reference to how to access such information (e.g. to standards elsewhere, or at least which DEFRA team or function to talk to) should be included.
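To make the data integrity point concrete without going into much detail, here is a minimal sketch of "transactions with rollback" plus "referential integrity constraints", using Python's built-in sqlite3 module. The table names (`site`, `sample`) and the helper functions are purely hypothetical and only there for illustration; they are not taken from any Defra service or from the draft standard.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory database with one parent table and one child table."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE site (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE sample ("
        " id INTEGER PRIMARY KEY,"
        " site_id INTEGER NOT NULL REFERENCES site(id),"  # referential integrity
        " reading REAL NOT NULL)"
    )
    with conn:
        # Hypothetical seed row so that site id 1 exists.
        conn.execute("INSERT INTO site (id, name) VALUES (1, 'Example site')")
    return conn


def record_samples(conn: sqlite3.Connection, rows: list[tuple[int, float]]) -> bool:
    """Insert every row or none of them; return True only if all were stored."""
    try:
        # Used as a context manager, the connection commits on success
        # and rolls back if an exception is raised inside the block.
        with conn:
            conn.executemany(
                "INSERT INTO sample (site_id, reading) VALUES (?, ?)", rows
            )
        return True
    except sqlite3.IntegrityError:
        # A constraint was violated, so the whole batch has been rolled back.
        return False


def sample_count(conn: sqlite3.Connection) -> int:
    """Return how many sample rows are currently stored."""
    return conn.execute("SELECT COUNT(*) FROM sample").fetchone()[0]


if __name__ == "__main__":
    store = open_store()
    print(record_samples(store, [(1, 7.2), (1, 6.9)]))   # both rows reference site 1
    print(record_samples(store, [(1, 5.0), (99, 4.1)]))  # site 99 does not exist
    print(sample_count(store))
```

In the second call the first row is valid, but the batch is rejected as a whole because the second row points at a site that does not exist, so the store is left exactly as it was before the call. That is the "consistent state when errors occur" behaviour described above, at the level of a single data store.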

nigejohnson commented 4 years ago

... and of course backed up data should have at least the same protections as the equivalent live data, i.e. it would be worth stating something about security of back-ups too.

andrewhick commented 3 years ago

It's been a long time, but seeing as I'm leaving Defra soon, I thought I'd leave my work on a branch so someone else can pick it up!

Draft data standards page (on add-data-standards)

Thanks for all the valuable feedback so far.