GSS-Cogs / sprint-planning


Data constraints #31

Open ajtucker opened 3 years ago


One issue with the web of data is that it is hard to ensure referential integrity: we often only discover that links are broken, or lead to resources with no further data, when we or an end user finally come to look at the data on a page.

Part of the problem is that the schemas we use are quite open, and different parts of the overall schema are filled in by different parties in different places.

We have been using SPARQL queries to discover where constraints are broken; these help ensure that end users don't see broken links or pages full of raw URLs. We need to build up a larger set of these queries to constrain more of our data.
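As a rough illustration of what one of these checks does, here is a minimal Python sketch that finds "dangling" links: triples whose object IRI is never described anywhere else in the data. The `Triple` type, the `dangling_links` and `is_iri` functions, and the example IRIs are all hypothetical; in practice the equivalent check would run as a SPARQL query (shown in the docstring) against the triple store rather than over an in-memory list.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def is_iri(term: str) -> bool:
    # Simplifying assumption for this sketch: IRIs are http(s) URLs and
    # every other term is treated as a literal.
    return term.startswith(("http://", "https://"))


def dangling_links(triples: list[Triple]) -> list[Triple]:
    """Return triples whose object IRI never appears as a subject.

    Roughly equivalent to the SPARQL query:
        SELECT ?s ?p ?o WHERE {
          ?s ?p ?o .
          FILTER(isIRI(?o))
          FILTER NOT EXISTS { ?o ?anyP ?anyO }
        }
    """
    described = {t.subject for t in triples}
    return [t for t in triples if is_iri(t.obj) and t.obj not in described]


# Hypothetical example data: the period IRI is linked to but never described.
data = [
    Triple("http://example.org/obs/1", "http://example.org/def/area",
           "http://example.org/area/E92000001"),
    Triple("http://example.org/area/E92000001",
           "http://www.w3.org/2000/01/rdf-schema#label", "England"),
    Triple("http://example.org/obs/1", "http://example.org/def/period",
           "http://example.org/period/2020"),
]

for t in dangling_links(data):
    print(t.obj)  # prints http://example.org/period/2020
```

This naive version would also flag IRIs of external vocabulary terms (for example classes used with `rdf:type`), so a real query would need to exclude those.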

We have also been trying to move the tests closer to the developer by encoding them as CSV-W validity constraints on our CSV files. This can make the process more efficient and shorten the turnaround.
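To show the idea of checking CSV against declared constraints before anything is converted to RDF, here is a minimal Python sketch. The `SCHEMA` dictionary is a hypothetical, heavily simplified stand-in loosely inspired by CSV-W table metadata (column names, required cells, value patterns, a datatype); it is not the CSV-W format itself, and `validate_csv` and the example area-code pattern are assumptions for illustration only.

```python
import csv
import io
import re
from decimal import Decimal, InvalidOperation
from typing import Optional

# Hypothetical, simplified stand-in for CSV-W table metadata. Real CSV-W
# metadata is JSON-LD with much richer datatype and key semantics.
SCHEMA = {
    "columns": [
        {"name": "area", "required": True, "pattern": r"[EWSN]\d{8}"},
        {"name": "period", "required": True, "pattern": r"\d{4}"},
        {"name": "value", "required": True, "datatype": "decimal"},
    ],
}


def check_cell(column: dict, cell: str) -> Optional[str]:
    """Return an error message for one cell, or None if it is valid."""
    if cell == "":
        return "missing required value" if column.get("required") else None
    pattern = column.get("pattern")
    if pattern and re.fullmatch(pattern, cell) is None:
        return f"{cell!r} does not match {pattern}"
    if column.get("datatype") == "decimal":
        try:
            Decimal(cell)
        except InvalidOperation:
            return f"{cell!r} is not a decimal"
    return None


def validate_csv(text: str, schema: dict) -> list[str]:
    """Check a CSV document against the hypothetical schema above."""
    reader = csv.DictReader(io.StringIO(text))
    expected = [c["name"] for c in schema["columns"]]
    if reader.fieldnames != expected:
        return [f"header {reader.fieldnames} does not match {expected}"]
    errors: list[str] = []
    for row_num, row in enumerate(reader, start=1):  # counts data rows only
        for column in schema["columns"]:
            # DictReader fills short rows with None, so treat that as empty.
            cell = row[column["name"]] or ""
            problem = check_cell(column, cell)
            if problem:
                errors.append(f"row {row_num}, {column['name']}: {problem}")
    return errors


sample = "area,period,value\nE92000001,2020,12.5\nE9200001,2020,abc\n"
for error in validate_csv(sample, SCHEMA):
    print(error)
```

Because checks like these run on the CSV itself, a developer can catch a bad code or value locally instead of finding it later as a broken link on a published page.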

We can go further and encode validations and editor hints into our libraries, e.g. using closed/final classes, annotations, or other schema languages such as JSON Schema, to help developers get things right at design time.
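As one possible illustration of pushing constraints into a library, the Python sketch below uses an `Enum` for a closed set of values, `typing.final` to mark a class that should not be subclassed (enforced by static type checkers rather than at runtime), a frozen dataclass so instances cannot be mutated after validation, and `__post_init__` for rules the type system cannot express. The `Observation` and `MeasureType` names, the area-code pattern and the percentage rule are hypothetical examples, not part of any existing library of ours.

```python
import re
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum
from typing import final


class MeasureType(Enum):
    """A closed set of allowed values: editors and type checkers can flag
    anything outside it, and MeasureType("...") rejects unknown values."""
    COUNT = "count"
    PERCENTAGE = "percentage"


@final  # static type checkers report any attempt to subclass this
@dataclass(frozen=True)  # fields cannot be reassigned after construction
class Observation:
    area: str
    period: str
    measure_type: MeasureType
    value: Decimal

    def __post_init__(self) -> None:
        # Runtime checks for constraints the type system cannot express.
        if re.fullmatch(r"[EWSN]\d{8}", self.area) is None:
            raise ValueError(f"invalid area code: {self.area!r}")
        if re.fullmatch(r"\d{4}", self.period) is None:
            raise ValueError(f"invalid period: {self.period!r}")
        if self.measure_type is MeasureType.PERCENTAGE and not (
            0 <= self.value <= 100
        ):
            raise ValueError(f"percentage out of range: {self.value}")


ok = Observation("E92000001", "2020", MeasureType.PERCENTAGE, Decimal("12.5"))
print(ok)
```

Constructing `Observation("E92000001", "2020", MeasureType.PERCENTAGE, Decimal("150"))` or `MeasureType("ratio")` raises `ValueError`, so the mistake surfaces where the data is created rather than when someone views the published page.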

This epic is a placeholder for issues concerned with constraining our data, so that we get it right from the start and work more efficiently as a result.