Contains the explanation, schemas and example usage of ORPML (TO BE TRANSFERRED TO DBaT WHEN READY)
ORPML stands for Open Regulation Platform Markup Language. It describes regulatory content that is resident in the Open Regulation Platform (ORP). The ORP is owned by the Better Regulation Executive (BRE) - a division within the Department for Business and Trade (DBaT).
Data passed to the ORP for ingestion, and search results returned for queries on the ORP, will both use these schemas for accepting and formatting data.
The design goals for these schemas are:
ORPML describes documents that reside within the ORP. Specifically, it describes document-level metadata and also forms the markup of the document content. Regulators that contribute content to the ORP will provide their content in ORPML and send it to the ORP via an API. Users of the ORP (RegTech companies, citizens, etc.) can search the ORP and receive results in which each returned document is formatted in ORPML.
The DBaT GitHub organisation will contain the ORPML repo, which will hold the design decisions of the standard and each release of the standard as an XML schema. The repository is not currently live, but it is envisaged that it will be in mid-2023.
Currently, the standard is at v0.1. Major changes to both the nature and content of the standard should be expected. Initial consultation with both regulators and RegTech companies will use v0.1 as a discussion point.
Two major organisations and initiatives have been the initial starting points for consideration of ORPML:
The National Archives
The CDDO
When deciding on the nature of ORPML, the first set of standards considered was Crown Legislation Markup Language (CLML) from The National Archives. The National Archives is the preeminent organisation in the UK when it comes to legislation and has the most experience in providing government-owned data in a structured way to a variety of audiences. CLML describes both content-level metadata and the content itself, so it seemed like an obvious candidate for the ORP project. The initial question asked was “Should ORP just adopt CLML as its markup language?”. After evaluating CLML, we decided not to adopt it wholesale as the markup of the ORP, but to use the majority of its component parts and ideas. Where competing standards are encountered, the deciding factor has been to adopt the standard closest to CLML, as regulatory material is closest in format, intent and audience to legislative material. Specifically, the following parts of the way The National Archives uses markup and publishes material have been adopted by the ORP:
ORPML differs from CLML in its usage of DublinCore in 2 important ways:
1. ORPML accepts the full range of DublinCore terms elements, not just the core elements. However, only the core elements are mandatory.
2. ORPML only references the ‘terms’ schema (of which the core elements are now a subset).
The following parts of CLML have not been adopted into ORPML:
The use of CLML markup for describing the content of material. It was determined that CLML markup was too specific to legal material (it is based on the legal markup language Akoma Ntoso). Regulatory material is very varied and can range from guides and practice notes, which are similar in nature to simple ‘how to’ guides, to standards and rules that are highly structured. CLML is focused on redaction, amendment and enforcement of material and has specific abilities to deal with these important parts of legislation. Regulatory material is not so concerned with amendments below the document level, i.e. at the section, sub-section or clause level. In ORPML’s choice of content markup, we learned from some of the issues that The National Archives has faced with Akoma Ntoso. The National Archives provides all material in Akoma Ntoso, but also in more widespread formats, e.g. an HTML5 representation of Akoma Ntoso.
As part of the work to create the Government Data Catalogue and Data Marketplace, the Central Digital and Data Office (CDDO), which is part of the Cabinet Office, has created a DCAT schema for data interchange within government. It is desirable to be able to share all regulatory material as a discoverable and navigable dataset across government. Work has been done to compare DCAT and DublinCore (DCAT is itself based on DublinCore). Many of the mandatory fields in DCAT are direct ports from the mandatory fields in DublinCore. We evaluated the entirety of both DublinCore terms and DCAT and concluded that it is possible and desirable to support both initiatives with a common set of metadata. It is proposed that the entirety of DCAT be supported.
Akoma Ntoso - this was rejected for the reasons stated above. In addition, Akoma Ntoso is typically created using legal publishing tools, and it was considered burdensome to make regulators use these tools for more loosely structured content.
Schema.org - ORP would have to create a schema for its material. Whilst there is a guide already, this would not meet the needs of all regulatory material, specifically more formally structured content such as specifications and rules.
Document-level metadata in ORPML consists of:
Mandatory elements:
DublinCore ‘core’ elements
DCAT mandatory elements
ORP-specific mandatory elements
This list is in preference order, so if a metadata field is mandatory in DublinCore but optional in DCAT, then the field will be mandatory in ORPML. Likewise, if the field is unique to DCAT and mandatory there, then it will be mandatory in ORPML. Where an element occurs in both DublinCore and DCAT, the DublinCore element will be used.
DCAT element names contain punctuation and whitespace. This was considered undesirable since all other elements will be in camelCase. Therefore the names of all DCAT elements have been converted into camelCase, e.g. Time Period Coverage - Start Date is now timePeriodCoverageStartDate.
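The renaming rule can be illustrated with a short Python sketch. It assumes that a DCAT display name is split on any run of whitespace or punctuation. The function name `to_camel_case` is hypothetical and is not part of ORPML or DCAT.

```python
import re


def to_camel_case(dcat_name: str) -> str:
    """Convert a DCAT display name to camelCase (illustrative sketch only)."""
    # Split on runs of non-alphanumeric characters and drop empty fragments.
    words = [w for w in re.split(r"[^A-Za-z0-9]+", dcat_name) if w]
    if not words:
        return ""
    # The first word starts lower-case; every later word starts upper-case.
    first = words[0][0].lower() + words[0][1:]
    rest = [w[0].upper() + w[1:] for w in words[1:]]
    return first + "".join(rest)


print(to_camel_case("Time Period Coverage - Start Date"))  # timePeriodCoverageStartDate
```

Only the first letter of each word is changed, so any capitals inside a word are preserved.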
Optional elements:
DublinCore terms optional elements
DCAT optional elements and DCAT ‘Mandatory if Applicable’ elements
In deciding upon the elements from DCAT for inclusion into ORPML, the following documents were used as references:
Record information about data sets you share with others - GOV.UK
The CDDO’s document “CROSS-GOVERNMENT DATA SHARING Metadata Requirements Specification”
Specifically, the gov.uk guide to record sharing asks you to consider the following fields:
creator - present in DublinCore core so used in ORPML
dateCreated - present in DublinCore as created so used in ORPML
name - not used in ORPML as title is used instead
description - present in DublinCore core so used in ORPML
identifier - present in DublinCore core so used in ORPML
encoding format - present in DublinCore core as format so used in ORPML
supersededBy - not used as DublinCore has isReplacedBy
supersedes - not used as DublinCore has replaces
expires - present in DCAT as timePeriodCoverage-EndDate so used in ORPML
temporal coverage - not used as this is ‘data collection date’ rather than enforcement date
conforms to - not used but would be hardcoded to ORPML
license - present in DublinCore so used in ORPML
hasDigitalDocumentPermission - covered by a combination of dc:license and dcat:securityClassification
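The field-by-field decisions above can be captured as a lookup table. The following Python sketch is illustrative only: the names `GOVUK_TO_ORPML` and `translate` are hypothetical, namespace prefixes are omitted, and the ORPML name for `expires` assumes the camelCase conversion described earlier for DCAT elements.

```python
from typing import Dict, List

# Keys are the GOV.UK guide field names; values are the ORPML element(s)
# that carry the same information. An empty list means "not used in ORPML".
GOVUK_TO_ORPML: Dict[str, List[str]] = {
    "creator": ["creator"],
    "dateCreated": ["created"],
    "name": ["title"],
    "description": ["description"],
    "identifier": ["identifier"],
    "encoding format": ["format"],
    "supersededBy": ["isReplacedBy"],
    "supersedes": ["replaces"],
    # Assumption: the DCAT name is camelCased like other DCAT elements.
    "expires": ["timePeriodCoverageEndDate"],
    "temporal coverage": [],
    "conforms to": [],
    "license": ["license"],
    # Covered by a combination of two elements, so not a simple rename.
    "hasDigitalDocumentPermission": ["license", "securityClassification"],
}


def translate(record: Dict[str, str]) -> Dict[str, str]:
    """Rename GOV.UK-style keys to ORPML element names, dropping unmapped fields."""
    result: Dict[str, str] = {}
    for key, value in record.items():
        targets = GOVUK_TO_ORPML.get(key, [])
        # Only one-to-one mappings can be renamed mechanically.
        if len(targets) == 1:
            result[targets[0]] = value
    return result


print(translate({"name": "Example guide", "supersedes": "doc-1", "temporal coverage": "2020"}))
```

Fields that map to more than one element, such as hasDigitalDocumentPermission, are left out of the mechanical rename because splitting their value needs a human decision.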
These elements are specific to ORPML content and not part of external schemas. We have tried to minimise the number of these elements as much as possible.
It is proposed that all content provided in ORPML be HTML. Using HTML as the content markup has several benefits:
Editorial tools can be used to convert from existing publishing formats such as PDF or DOCX into HTML. There is an established industry that regulators could use to convert from their existing publishing formats into HTML.
ORP can validate content to make sure it is valid HTML, as HTML validators are readily available as open source software.
Rendering HTML in the ORP portal, or by RegTech companies, is simple and straightforward.
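As a rough illustration of automated checking, the Python sketch below uses the standard library `html.parser` module to verify that tags are strictly nested. This is an assumption about how a first-pass check might look, not the ORP's validation approach. It is stricter than HTML itself, which allows some end tags to be omitted, and it is far less thorough than a real HTML validator. The names `NestingChecker` and `nesting_errors` are hypothetical.

```python
from html.parser import HTMLParser
from typing import List

# Elements that never take a closing tag in HTML.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class NestingChecker(HTMLParser):
    """Track open elements on a stack and record mismatched closing tags."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: List[str] = []
        self.errors: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # HTML ignores the trailing slash, so <a .../> still opens an element.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS and self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.errors.append(f"unexpected closing tag </{tag}>")


def nesting_errors(html: str) -> List[str]:
    """Return a list of nesting problems; an empty list means none were found."""
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    return checker.errors + [f"unclosed <{tag}>" for tag in checker.stack]


print(nesting_errors("<section><p>Some text<br>more text</p></section>"))
print(nesting_errors('<section><a href="x"/>Some text</section>'))
```

The second call reports problems because the self-closing anchor is treated as an opening tag that is never closed.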
One decision point that is still to be made is “Should we create a superset of HTML with Regulatory specific elements or use element attributes to describe regulatory concepts?” Our working assumption is that we will only accept standard HTML and that regulatory concepts will be described by element attributes. Here are some examples to describe how this could look in ORPML:
Sections
<article aria-label="my article name">
<a href="https://github.com/UKGovernmentBEIS/orpml/blob/main/this/is/the/link/to/the/article" aria-describedby="article link information">
<h1>Article name</h1>
</a>
<section aria-label="important section">
<a href="https://github.com/UKGovernmentBEIS/orpml/blob/main/this/is/the/link/to/the/section"></a>
This text is really important and it deserves to be in its own section.<br><em>All</em> standard HTML markup will be supported
</section>
</article>
Sub Sections
<section data-orp-type="subSection" aria-label="sub section name">
<a href="https://github.com/UKGovernmentBEIS/orpml/blob/main/this/is/the/link/to/the/subSection" aria-describedby="article link information"></a>
This text is not as important so it gets to be in a sub section.<br><em>All</em> standard HTML markup will be supported
</section>
Clause
<section data-orp-type="clause" aria-label="clause name">
<a href="https://github.com/UKGovernmentBEIS/orpml/blob/main/this/is/the/link/to/the/clause" aria-describedby="clause link information"></a>
This text is really important and it deserves to be in its own section.<br><em>All</em> standard HTML markup will be supported
</section>
Entities
<section data-orp-type="clause" aria-label="section name">
<a href="https://github.com/UKGovernmentBEIS/orpml/blob/main/this/is/the/link/to/the/clause" aria-describedby="section link information"></a>
We will be talking about new regulations from <div data-orp-entity="The Bank of England" aria-label="entity">The Bank of England</div>.
</section>
Citations
<section data-orp-type="clause" aria-label="section name">
<a href="https://github.com/UKGovernmentBEIS/orpml/blob/main/this/is/the/link/to/the/clause" aria-describedby="section link information"></a>
This regulation derives from the <cite><a href="https://www.legislation.gov.uk/ukpga/2018/12/contents">Data Protection Act 2018</a></cite>
</section>
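To show how a consumer such as a RegTech company might read the regulatory attributes from ORPML content, here is a minimal Python sketch using the standard library `html.parser` module. It assumes the `data-orp-` attribute prefix used in the examples above, which is still a working proposal. The class name `OrpAttributeCollector` is hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class OrpAttributeCollector(HTMLParser):
    """Collect (tag, attribute, value) triples for attributes starting with 'data-orp-'."""

    def __init__(self) -> None:
        super().__init__()
        self.found: List[Tuple[str, str, str]] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name.startswith("data-orp-"):
                # Attributes without a value are reported as None by the parser.
                self.found.append((tag, name, value or ""))


sample = (
    '<section data-orp-type="clause" aria-label="section name">'
    'We will be talking about new regulations from '
    '<div data-orp-entity="The Bank of England" aria-label="entity">The Bank of England</div>.'
    '</section>'
)

collector = OrpAttributeCollector()
collector.feed(sample)
collector.close()
for tag, name, value in collector.found:
    print(tag, name, value)
```

Because the regulatory concepts live in ordinary attributes, any standard HTML parser can extract them without ORPML-specific tooling.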