alan-turing-institute / data-classification-app

Classification webapp for the Turing Data Safe Haven
MIT License

Kick-off meeting with PI, stakeholders and dev team 2021-10-13 #337

Closed DavidBeavan closed 2 years ago

DavidBeavan commented 2 years ago

See the collaborative notes on HackMD.

DavidBeavan commented 2 years ago

Thanks, everyone, it was a bit intense and I'm sure we'll be back for clarification. Here's a snapshot of the doc:

Data Safe Haven Classification App kick-off meeting with PI/stakeholders/dev team

tags: DSH DSHClassApp DataSafeHaven

Infodump

Unanswered questions (subsequently answered on Slack)

Thoughts/questions for PI / stakeholders on 2021-10-13?

MVP product backlog

(from hut23 issue) Classify as: Essential, Important, Nice to have

Meta-aim: Make the classification app easier to use by removing friction when: signing-up, logging-in, intuitively using the app & allowing multiple views. Ideally make it easier to do than to argue why not to.

  • Essential: authentication changes (decouple from the current Azure AD solution: eg. allow users from other Azure ADs to log in - rather than requiring @turingsafehaven.ac.uk accounts or self sign-up)
  • Important: a way to record datasets separately from projects so that we can see where the same dataset is used in multiple places (possibly even deploying an instance of https://github.com/amundsen-io/amundsen for tracking this and plugging into it?). Reports might be connected to this.
  • Nice to have: add ability to export reports etc. to show projects and datasets and their tiers. Turing DOES NOT currently have unique IDs for datasets.
  • Nice to have: clarification/update of the flowchart questions (with legal) to account for (a) what some of them actually mean and (b) how to apply them to data egress
  • Important: Remove the documented requirement for the PI to spin up a safe haven to look at the data before classifying. This is currently a big hold-up for Turing projects.
  • Nice to have: a way to mark classification as provisional - this could allow the data provider and PI to go through the classification process for a tier-0/tier-1 project without needing to spin up a safe haven for them to look at the data. Aside: is a yes/no/don't know answer possible? "Don't know" would be treated as the most conservative answer, and the end classification would prompt the user about which questions to resolve for a fuller classification. If "Don't know" is offered, then suggest the questions the PI needs to be able to answer in order to continue with the classification.
  • Important: better project views so that programme leads don’t have to see all projects but can restrict by eg. topic area
  • Important: better documentation/guided workflows built into the app rather than held as a separate PDF. This could be a sandbox or a practice instance.
  • Nice to have: ways to bulk import/export/backup the data in the app
  • MVP would be to be able to create separate ingress or egress work packages, with associated questions.
  • Nice to have: Notification system - so the PM gets an auto email when the classification is completed, and then maybe the ability for the PM to send a certificate to IT to confirm tier levels. This can be used for both ingress - which SH would be required; and for egress - so that outputs can safely be removed.
  • Nice to have: Develop an egress-related workflow for output datasets after a project. Use case: egress to the challenge owner after a data study group (which may still need to be in a safe haven), but also egress of reports (which do not need to be in the DSH). Egress needs different wording. Work is needed to capture derived data and the project it is associated with.
  • Nice to have: Not require a data set to be used for every work package; give the option to add something else, e.g. a report. Reports (e.g. .PDF) and code (e.g. .git) should be uploadable, nameable and describable through the app.
  • Nice to have: Not need us to add the details of the data sets; rather, the Cos can add them to the app directly, removing the need for a 'data sets' form.
  • Important: The ability to edit work packages, users and so on. At the moment, once it's saved you can't edit or remove it (resulting in lots of confusing duplicates).
  • Important: Sort out multiple duplicated names.
  • Nice to have: version workflows, and let a super admin make them, so classifications are kept against the questions answered at the time.
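
To make the provisional / "don't know" idea above a bit more concrete, here is a minimal Python sketch. It is not the app's actual logic: the questions, keys and tier numbers are made up for illustration, and the real flowchart is a decision tree rather than a simple maximum. The only point shown is that an unanswered or "don't know" question takes whichever answer would give the higher tier, and that the result lists the questions still to be resolved.

```python
from enum import Enum
from typing import NamedTuple


class Answer(Enum):
    YES = "yes"
    NO = "no"
    DONT_KNOW = "don't know"


class Question(NamedTuple):
    key: str
    text: str
    tier_if_yes: int
    tier_if_no: int


# Hypothetical questions and tiers, for illustration only;
# these are NOT the real flowchart questions.
QUESTIONS = [
    Question("personal", "Does the data contain personal data?", 2, 0),
    Question("public", "Is the data already publicly available?", 0, 1),
    Question("commercial", "Is the data commercially sensitive?", 3, 0),
]


def classify(answers):
    """Return (tier, provisional, unresolved question texts)."""
    tier = 0
    unresolved = []
    for q in QUESTIONS:
        # A missing answer is treated the same as "don't know".
        answer = answers.get(q.key, Answer.DONT_KNOW)
        if answer is Answer.YES:
            contribution = q.tier_if_yes
        elif answer is Answer.NO:
            contribution = q.tier_if_no
        else:
            # Most conservative choice: assume the answer with the higher tier.
            contribution = max(q.tier_if_yes, q.tier_if_no)
            unresolved.append(q.text)
        tier = max(tier, contribution)
    # The classification is provisional while any question is unresolved.
    return tier, bool(unresolved), unresolved
```

With this shape, a PI could get a provisional tier straight away, and the `unresolved` list doubles as the prompt for which questions to go back and answer.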

Important if UCL involved:

  • should we include additional questions that would allow us to score a project against other classification systems (eg. NHS or UK Government classifications)?
  • should we separate the questions from the business logic (eg. in a clean YAML format) in a way that means we can eg. generate a flow-chart from the questions
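
As a rough illustration of the last point, the sketch below keeps the questions as plain data and derives both a classification and a flow-chart from them. Everything here is an assumption for discussion: the dict stands in for what a YAML loader (for example PyYAML's `yaml.safe_load`) would return, the question keys, texts and tiers are invented, and Mermaid is just one possible flow-chart format.

```python
# Stand-in for questions loaded from a YAML file.
# All keys, texts and outcomes are hypothetical.
QUESTIONS = {
    "q_public": {
        "text": "Is the data publicly available?",
        "yes": "tier0",
        "no": "q_personal",
    },
    "q_personal": {
        "text": "Does the data contain personal data?",
        "yes": "tier2",
        "no": "tier1",
    },
}
OUTCOMES = {"tier0": "Tier 0", "tier1": "Tier 1", "tier2": "Tier 2"}


def classify(questions, outcomes, answers, start):
    """Follow yes/no answers from the start question to an outcome label."""
    node = start
    while node in questions:
        node = questions[node][answers[node]]
    return outcomes[node]


def to_mermaid(questions, outcomes):
    """Render the same question data as Mermaid flowchart text."""
    lines = ["flowchart TD"]
    for key, q in questions.items():
        # Curly braces give a diamond (decision) node in Mermaid.
        lines.append(f'    {key}{{"{q["text"]}"}}')
    for key, label in outcomes.items():
        lines.append(f'    {key}["{label}"]')
    for key, q in questions.items():
        for answer in ("yes", "no"):
            lines.append(f"    {key} -->|{answer}| {q[answer]}")
    return "\n".join(lines)
```

Because the business logic only walks the data, swapping in a different question set (e.g. one aligned with another classification system) would not need code changes, and the flow-chart would always match the questions actually asked.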