co-cddo / open-standards

Collaboration space for discussing and exploring technical and data standards

A metadata standard for regulatory documents #81

Open DidacFB-CDDO opened 2 months ago

DidacFB-CDDO commented 2 months ago

A metadata standard for regulatory documents

Note that this is a refreshed February 2024 version of the original challenge request published in July 2021, which has now been closed.

Category

Challenge Owner

Kevin Xu, Technical Architect | Smarter Regulation Executive | Department of Business and Trade (DBT)

Amilta Stephen Boyd, Policy Advisor | Smarter Regulation Executive | Department of Business and Trade (DBT)

Catherine Tabone, Data Scientist | The National Archives

Short Description

Short technical summary

Metadata

ORDS was developed by reviewing the following metadata standards:

Dublin Core - international metadata standard for describing digital or physical resources.

Functional Requirements for Bibliographic Records (FRBR) - entity relationship model developed by librarians.

Data Catalog Vocabulary (DCAT) - metadata standard recommended by CDDO for data interchange within government.

European Legislation Identifier (ELI) - EU standard for identifiers and metadata that European legislation publishers use to describe legal documents online.

We have chosen to base ORDS on selected values from the Dublin Core metadata standard, with a few additional regulation-specific properties. Dublin Core is an established and widely used standard. It aligns with best practice for UK government data publishing and provides properties to express most of the relevant values for regulatory documents. We expect ORDS to act as a specification for how Dublin Core properties should be applied to regulator data in a consistent and logical way.

SRD Data Standards framework

The larger aim of SRD Digital is to improve the machine readability of regulatory content. We have defined a framework / vision for how this would be done. The different levels reflect our recognition that regulators (and therefore regulatory content) are at different stages in their digital transformation journey and have different levels of resource to commit to content publishing. We want to encourage best practice and help regulators take steps in this journey, no matter where they might be.

Consistency: ORDS metadata standard; Open Document File Format

Meaning: XML / HTML structure; XML / HTML semantic mark-up; Rules as Code – when appropriate

With most regulators, our goal is to achieve consistency of metadata, i.e. the adoption of ORDS and the use of open document file formats by regulators.
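To illustrate the approach described in this summary, here is a minimal sketch of a regulatory document record keyed by Dublin Core term URIs. The `dcterms` properties used (`title`, `identifier`, `publisher`, `issued`, `modified`, `format`) are real DCMI Metadata Terms, but which of them ORDS selects is not stated here, so the selection is an assumption. The `https://example.org/ords#regulatoryTopic` property, the `build_record` helper and all example values are hypothetical.

```python
import json

# Namespace for DCMI Metadata Terms (Dublin Core).
DCTERMS = "http://purl.org/dc/terms/"


def build_record(title, identifier, publisher, issued, modified, media_type, extra=None):
    """Return a flat metadata record keyed by Dublin Core term URIs."""
    record = {
        DCTERMS + "title": title,
        DCTERMS + "identifier": identifier,
        DCTERMS + "publisher": publisher,
        DCTERMS + "issued": issued,      # ISO 8601 date string
        DCTERMS + "modified": modified,  # ISO 8601 date string
        DCTERMS + "format": media_type,  # e.g. an open document media type
    }
    # Regulation-specific properties are passed in separately; the ones
    # used in this example are invented for illustration only.
    record.update(extra or {})
    return record


record = build_record(
    title="Example guidance on widget safety",
    identifier="https://regulator.example/documents/widget-safety",
    publisher="Example Regulator",
    issued="2023-04-01",
    modified="2024-02-15",
    media_type="text/html",
    extra={"https://example.org/ords#regulatoryTopic": "product-safety"},
)

print(json.dumps(record, indent=2))
```

Keeping the regulation-specific properties in a separate namespace makes it obvious which values come from Dublin Core and which are extensions.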

User Needs

Strategic alignment: There is a cross-government drive to improve publishing processes, enable data discoverability and implement a common approach to recording metadata. ORDS supports these initiatives in the regulatory publishing space, benefitting 80+ regulators by making their data interoperable with one another.

Data providers: Regulators need to publish their regulatory documents in a way that makes them easy to find and access. They need to be able to manage their documents easily, see how their documents relate to others and ensure they are continuously updated. Providing such maintenance and analytical options requires consistent metadata.

Intermediary data consumers: Better structured data and improved discoverability enable the creation of services and software dealing with regulatory information (RegTech). This includes software developers, data scrapers, insight generators, and legal and regulatory advisors.

Regulatory consumers: Better services and software make it easier to identify and comply with regulation. This supports individuals and businesses.

Functional Needs

Process

Our first step was to engage with regulators and RegTech companies to validate the user need for a data standard for regulation documents.

Our second step was to consider pre-existing standards to determine whether they could be used directly in the regulatory context. We looked at Dublin Core, DCAT, Akoma Ntoso, and Crown XML. These standards are highly aligned with each other and include a large majority of the fields that seem appropriate in the regulatory context. However, none was suitable to use exactly in its current form.

Our third step was to create a version 0.1 of ORDS, applying pre-existing standards to the regulatory context. For the metadata fields, this involved cross-referencing between DCAT and Dublin Core to come up with a consolidated set of fields (some mandatory, some optional), with a small number of additional fields specific to the regulation context. We also developed an initial proposal for content mark-up.

Our fourth step was to form a working group comprising representatives from DBT, the regulators, the National Archives and the Data Standards Authority to review and provide feedback on ORDS v0.2. We have now presented ORDS to the Data Standards Authority Steering Board and Peer Review Group, which recognise its value.
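The idea of a consolidated field set with mandatory and optional fields can be sketched as a small validation check. This is only an illustration: the field names and the mandatory/optional split below are assumptions, not the actual ORDS field list, and `validate` is a hypothetical helper.

```python
# Assumed split for illustration only; the real ORDS lists are not given here.
MANDATORY = {"title", "identifier", "publisher", "issued"}
OPTIONAL = {"modified", "format", "subject", "language"}


def validate(record):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    # A field counts as present only if it has a non-empty value.
    present = {key for key, value in record.items() if value not in (None, "")}
    for field in sorted(MANDATORY - present):
        problems.append(f"missing mandatory field: {field}")
    # Flag anything outside the consolidated field set.
    for field in sorted(set(record) - MANDATORY - OPTIONAL):
        problems.append(f"unknown field: {field}")
    return problems


good = {
    "title": "Example guidance on widget safety",
    "identifier": "https://regulator.example/documents/widget-safety",
    "publisher": "Example Regulator",
    "issued": "2023-04-01",
    "language": "en",
}
bad = {"title": "Untitled draft", "issued": "", "topic": "product-safety"}

print(validate(good))
print(validate(bad))
```

Reporting every problem at once, rather than stopping at the first, suits publishers who need to fix a batch of documents in one pass.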

Questions

MattiSG commented 1 month ago

Thanks for sharing this in the open.

As explained three years ago in https://github.com/co-cddo/open-standards/issues/79#issuecomment-2092514723, I believe the ability to unambiguously reference regulatory documents is critical for interconnection, even before discoverability through metadata. This challenge description makes points about metadata but not about referencing, even though ELI is mentioned among the reviewed standards. From the perspective of tools such as @OpenFisca, establishing a URI scheme, preferably based on existing standards, is a priority for modeling.

Regarding both referencing and “content markup”, you might be interested in @verbman’s Parliamentary Love Letter format, which holds strong promise of simplicity, efficiency and reusability.
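As a rough illustration of what unambiguous referencing could look like, here is a minimal sketch of a reference that round-trips to and from a URI. Everything in it is hypothetical: the base URL, the path segments (`regulator`, `year`, `doc_type`, `local_id`) and the `DocumentRef` class are invented for this example. It is neither the ELI URI template nor a scheme proposed for ORDS.

```python
from dataclasses import dataclass

# Hypothetical base; a real scheme would be agreed by the publishers.
BASE = "https://example.org/id/regulation"


@dataclass(frozen=True)
class DocumentRef:
    """A structured reference to one regulatory document (illustrative only)."""
    regulator: str
    year: int
    doc_type: str
    local_id: str

    def to_uri(self):
        """Serialise the reference as a URI under the hypothetical base."""
        return f"{BASE}/{self.regulator}/{self.year}/{self.doc_type}/{self.local_id}"

    @classmethod
    def from_uri(cls, uri):
        """Parse a URI produced by to_uri, rejecting anything else."""
        prefix = BASE + "/"
        if not uri.startswith(prefix):
            raise ValueError(f"not a reference under {BASE}: {uri}")
        parts = uri[len(prefix):].split("/")
        if len(parts) != 4 or not all(parts):
            raise ValueError(f"expected four path segments: {uri}")
        regulator, year, doc_type, local_id = parts
        return cls(regulator, int(year), doc_type, local_id)


ref = DocumentRef("example-regulator", 2024, "guidance", "widget-safety")
uri = ref.to_uri()
print(uri)
print(DocumentRef.from_uri(uri) == ref)
```

The property worth having is that each document maps to exactly one URI and each URI parses back to the same reference, so tools can link to documents without depending on where they are hosted.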