airr-knowledge / issues

Issues and project management for the AKC

metrics and compliance dashboard #15

Open schristley opened 8 months ago

schristley commented 8 months ago

We will run automated queries regularly to monitor for new studies and/or data to be processed. Source data does not always adhere to standards, and metadata quality can be poor or missing in public archives. Thus, we will implement a dashboard that automatically reports compliance metrics generated during curation and validation (Aims 1.4, 2.4), including metadata completeness and accuracy, ontology and standards usage, and data formatting. Similar to the OBO Dashboard (http://dashboard.obofoundry.org/dashboard/index.html), the dashboard will flag issues for human review, provide reports and instructions for how to address the issues, and provide guidance for community engagement by pointing to common issues.

schristley commented 8 months ago

Hi @jamesaoverton, I'm assigning this to you initially, as we mainly want to plan out what the dashboard does and how before we get into any implementation. In the proposal, we mostly describe this in relation to curation: data from public archives is dirty, so we want to flag issues. However, we should also think about metrics and how this could apply to the various processes of the component repositories and the AK.

schristley commented 8 months ago

Here's one rough idea. The MiAIRR standard gives a list of elements, which are also tagged in the JSON schema with different requirement levels. MiAIRR allows many fields to be left null, but it would be better to have data for them. A completeness metric could automatically measure, for each Repertoire and/or Study in the ADC, the number of fields that are populated versus left null.

That's specifically for the MiAIRR standard. The AIRR Standards encompass more, so there could be an additional completeness metric for the larger standard too.
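
To make the completeness idea concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the field list `EXAMPLE_FIELDS` is a hypothetical hand-picked subset (a real version would presumably derive the list and requirement levels from the AIRR JSON schema), the record is made-up data shaped loosely like a Repertoire, and the function names are invented for this example. A field inside an array (for example, one value per sample) is counted as complete only if every element has a value.

```python
from __future__ import annotations

from typing import Any, Iterable

# Values treated as "left null". This choice is an assumption for the sketch.
EMPTY_VALUES = (None, "", [], {})


def _values_at(node: Any, parts: list[str]) -> list[Any]:
    """Collect every value reachable at a dotted path, fanning out over lists."""
    if not parts:
        return [node]
    if isinstance(node, list):
        values: list[Any] = []
        for item in node:
            values.extend(_values_at(item, parts))
        return values
    if isinstance(node, dict):
        return _values_at(node.get(parts[0]), parts[1:])
    return [None]  # the path cannot be followed any further


def is_complete(record: dict, path: str) -> bool:
    """True if the field has a non-empty value everywhere it occurs."""
    values = _values_at(record, path.split("."))
    return bool(values) and all(v not in EMPTY_VALUES for v in values)


def completeness_report(record: dict, fields: Iterable[str]) -> dict:
    """Count populated fields and list the ones left null."""
    fields = list(fields)
    if not fields:
        raise ValueError("at least one field is required")
    missing = [f for f in fields if not is_complete(record, f)]
    complete = len(fields) - len(missing)
    return {
        "complete": complete,
        "total": len(fields),
        "fraction": complete / len(fields),
        "missing": missing,
    }


# Hypothetical subset of fields, chosen only for illustration.
EXAMPLE_FIELDS = [
    "study.study_id",
    "subject.subject_id",
    "subject.sex",
    "sample.tissue",
]

# Made-up record; not taken from any real repository.
example_repertoire = {
    "study": {"study_id": "STUDY-1"},
    "subject": {"subject_id": "S1", "sex": None},
    "sample": [{"tissue": "blood"}, {"tissue": None}],
}

report = completeness_report(example_repertoire, EXAMPLE_FIELDS)
print(f"{report['complete']}/{report['total']} fields complete")
print("missing:", ", ".join(report["missing"]))
```

A per-Study metric could then be an aggregate (for example, the mean fraction) over the Repertoires in that study, and the `missing` list is the kind of detail a dashboard could surface for human review.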