AI-SDC / ACRO

Tools for the Automatic Checking of Research Outputs. These are tools for researchers to use as drop-in replacements for commands that produce outputs in Stata, Python and R.
MIT License

Simple GUI tool to let people create an acro session and add outputs without directly using acro code #218

Open: jim-smith opened this issue 4 months ago

jim-smith commented 4 months ago

This thread/issue is about getting involvement from the community in specifying what the capabilities of such a tool should be, so that we can estimate the resources it would require to create.

There are two immediate options:

  1. Develop from scratch
  2. Clone and adapt the viewer tool used by output checkers

Hopefully, specifying the capabilities will let us find out which is the better option.

List of Intended Benefits

  1. Encourage more researchers to present outputs in a format that output checkers can open in the SACRO viewer:
    • consistency of practice
    • keeping all communications in one place, so the output checker has what they need easily to hand
    • outputs can be wrapped up for export in a TRE-preferred format (e.g. via RO-Crates, ...)
  2. Enable researchers to access help documentation about risks and mitigations
  3. Make it easier for researchers to open and amend acro sessions produced when they used SACRO
  4. Work towards supporting checking for secondary disclosure (differencing)
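As an illustration of the RO-Crate option in the first benefit, here is a minimal sketch of the `ro-crate-metadata.json` file a tool could write next to the outputs. It assumes the RO-Crate 1.1 layout (a metadata descriptor, a root `Dataset` and one `File` entity per output) and fills in only a few properties; `build_ro_crate_metadata` and `write_ro_crate_metadata` are hypothetical helpers, not part of acro.

```python
import json
from datetime import date
from pathlib import Path


def build_ro_crate_metadata(name: str, description: str, files: dict[str, str]) -> dict:
    """Return a minimal RO-Crate 1.1 metadata document (hypothetical helper).

    `files` maps each output's path, relative to the crate root,
    to a short human-readable description.
    """
    file_entities = [
        {"@id": path, "@type": "File", "description": desc}
        for path, desc in sorted(files.items())
    ]
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            # Metadata descriptor: states which entity this JSON file describes.
            {
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            # Root data entity: the folder that would go to the output checkers.
            {
                "@id": "./",
                "@type": "Dataset",
                "name": name,
                "description": description,
                "datePublished": date.today().isoformat(),
                "hasPart": [{"@id": e["@id"]} for e in file_entities],
            },
            # One data entity per output file.
            *file_entities,
        ],
    }


def write_ro_crate_metadata(folder: Path, metadata: dict) -> Path:
    """Write the metadata file into `folder` and return its path."""
    out = folder / "ro-crate-metadata.json"
    out.write_text(json.dumps(metadata, indent=2))
    return out
```

A TRE with its own profile would probably require extra properties (e.g. a licence or contact details), which is the kind of thing the capabilities list could pin down.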

jim-smith commented 2 months ago

List of capabilities

(please add suggestions via the comments below so they can be edited into this list)

1. Should allow a user who has done their analysis without SACRO to:

  1. Open a GUI that automatically starts a new acro session
  2. Use a drag-and-drop interface to add the following to the session (via a hidden call to acro.add_unsupported_output):
    • Their desired output files (tables, regressions, plots, ...)
    • The code they used to produce the outputs (for the sake of open science)
    • Any other output files (such as draft papers)
    • The risk appetite file they were given (if dataset-specific)
  3. View the list of files in the session
  4. Delete files (if they have added the wrong ones)
  5. Click on a file to load it into a sub-window in order to:
    • preview its contents (to make sure they have added the right files)
    • add comments/explanations for the output checkers
    • add requests for exceptions to be made (e.g., "these are structural zeros"...)
    • rename the file to have a more useful name
    • possibly add a description of the output type from a drop-down menu, with possible links through to embedded copies of the statbarns descriptors and/or SDAP manual contents, to let them understand and self-evaluate disclosure risks
    • NB: we would need to flag to the output checkers that these type labels are not as reliable as when acro creates them directly.
  6. Edit a list of which features/variables/attributes are included in which outputs
    (this may help the process of checking for secondary disclosure)
  7. Press a single button that:
    • prompts them for a name for the session,
    • optionally gets answers to a set of generic and/or TRE-specified questions to assist the TRE in triaging output requests
    • then wraps everything up in a folder to send to the output checkers

Most of the above can be achieved via calls to existing acro functionality, except the optional part of 1.6.
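To give a feel for the state such a GUI would need to track, here is a minimal Python sketch of a session behind the list-1 actions: add, delete, rename, comment, exception request, a user-chosen type label (flagged as such) and a single wrap-up step. `GuiSession`, `SessionOutput` and all their methods are hypothetical and deliberately do not call acro; in a real tool, `add` would wrap the hidden acro call and the wrap-up would defer to acro's own finalise step.

```python
from __future__ import annotations

import json
import shutil
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class SessionOutput:
    """One file the researcher has dragged into the GUI (hypothetical model)."""
    path: str                                  # original location on disk
    comments: list[str] = field(default_factory=list)
    exception: str = ""                        # e.g. "these are structural zeros"
    output_type: str = "unsupported"
    type_from_user: bool = False               # True if the label came from the user, not acro


class GuiSession:
    """Hypothetical in-memory session behind the GUI; not part of acro."""

    def __init__(self) -> None:
        self.outputs: dict[str, SessionOutput] = {}

    def add(self, path: str, comment: str = "") -> str:
        # A real tool would make the hidden acro call here.
        name = Path(path).name
        if name in self.outputs:
            raise ValueError(f"an output called {name!r} already exists")
        self.outputs[name] = SessionOutput(path, [comment] if comment else [])
        return name

    def remove(self, name: str) -> None:
        del self.outputs[name]

    def rename(self, old: str, new: str) -> None:
        if new in self.outputs:
            raise ValueError(f"an output called {new!r} already exists")
        self.outputs[new] = self.outputs.pop(old)

    def comment(self, name: str, text: str) -> None:
        self.outputs[name].comments.append(text)

    def request_exception(self, name: str, reason: str) -> None:
        self.outputs[name].exception = reason

    def set_type(self, name: str, output_type: str) -> None:
        out = self.outputs[name]
        out.output_type = output_type
        out.type_from_user = True  # lets output checkers see acro did not assign it

    def finalise(self, folder: str, session_name: str, answers: dict | None = None) -> Path:
        """Copy every file into <folder>/<session_name> and write a JSON manifest."""
        target = Path(folder) / session_name
        target.mkdir(parents=True, exist_ok=False)  # refuse to overwrite an existing session
        manifest = {"session": session_name, "triage_answers": answers or {}, "outputs": {}}
        for name, out in self.outputs.items():
            shutil.copy2(out.path, target / name)  # file is saved under its (possibly new) name
            manifest["outputs"][name] = asdict(out)
        (target / "session.json").write_text(json.dumps(manifest, indent=2))
        return target
```

Keeping `type_from_user` as an explicit field is one way to carry the NB in 1.5 through to the output checkers rather than relying on convention.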

2. Should allow a researcher who has used SACRO for their analysis to:

  1. Click on a folder created by a call to acro.finalise() and open it in the viewer
  2. Preview their files and associated disclosure risk (similar to current output viewer tool)
  3. Add files via drag-and-drop
  4. Edit names, comments and exception requests for new and existing outputs
  5. Edit the 'output description/type' for:
    • new outputs
    • other outputs previously added as 'unsupported types'
    • but not those that have been risk-assessed via acro code
    • NB: we would need to flag to the output checkers that these type labels are not as reliable as when acro creates them directly.
  6. Delete files from a session
  7. Follow links to view disclosure risks and mitigation strategies associated with different types of outputs
  8. Edit a list of what features/variables/attributes are included in which outputs
    • may help process of checking for secondary disclosure
  9. Press a single button that writes the amended session back to the filesystem
    • Question: should this be a new folder with a name based on the original?
    • Question: should the original folder be emptied to prevent confusion, or zipped with a suitable name?
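Items 1.6 and 2.8 could feed a simple check like the one below: given a mapping from each output to the variables it uses, list the pairs of outputs that share variables, so an output checker knows which ones to compare for differencing. This is a minimal sketch assumed for illustration (`overlapping_outputs` is hypothetical); sharing variables does not by itself imply a disclosure risk, it only narrows down where to look.

```python
from itertools import combinations


def overlapping_outputs(
    variables_by_output: dict[str, set[str]], min_shared: int = 1
) -> list[tuple[str, str, set[str]]]:
    """Return (output_a, output_b, shared_variables) for every pair of outputs
    that share at least `min_shared` variables, in a stable sorted order."""
    pairs = []
    for a, b in combinations(sorted(variables_by_output), 2):
        shared = variables_by_output[a] & variables_by_output[b]
        if len(shared) >= min_shared:
            pairs.append((a, b, shared))
    return pairs


outputs = {
    "table1.csv": {"region", "age_band", "income"},
    "table2.csv": {"region", "income"},
    "histogram.png": {"age_band"},
}
for a, b, shared in overlapping_outputs(outputs, min_shared=2):
    print(a, b, sorted(shared))  # prints: table1.csv table2.csv ['income', 'region']
```

The number of pairs grows quadratically with the number of outputs, which is fine for a single session but is one reason a researcher-maintained variable list would help.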

3. Other use cases?

  1. A researcher has the viewer open simultaneously with their Python/R/Stata analysis using acro
    • this would require a lot of extra code to synchronise the "live" session with what was in the viewer
    • it might be hard to make this robust
    • but it would facilitate access to HTML-based help pages on understanding and mitigating risk
    • it would possibly be better for viewing lengthy outputs/disclosure risks
    • it would definitely be more user-friendly for reviewing which outputs to keep/delete than the current JSON dump provided by acro.print_outputs()
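One simple (and admittedly fragile) way to approach the synchronisation problem, assuming the analysis writes its outputs to a folder the viewer can read, is for the viewer to poll that folder and diff successive snapshots. The function names below are hypothetical and acro does not provide them; this is a sketch of the idea, not a proposed design.

```python
import time
from collections.abc import Iterator
from pathlib import Path


def snapshot(folder: Path) -> dict[str, float]:
    """Map each file name in `folder` to its last-modified time."""
    return {p.name: p.stat().st_mtime for p in folder.iterdir() if p.is_file()}


def diff_snapshots(old: dict[str, float], new: dict[str, float]) -> dict[str, list[str]]:
    """Classify files as added, removed or modified between two snapshots."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "modified": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


def watch(folder: Path, interval: float = 1.0, rounds: int = 10) -> Iterator[dict[str, list[str]]]:
    """Poll `folder` `rounds` times and yield each non-empty set of changes."""
    previous = snapshot(folder)
    for _ in range(rounds):
        time.sleep(interval)
        current = snapshot(folder)
        changes = diff_snapshots(previous, current)
        if any(changes.values()):
            yield changes
        previous = current
```

Comparing modification times can miss a file that is rewritten within the filesystem's timestamp resolution; hashing file contents would be more reliable but slower, which illustrates why keeping a live session and the viewer in step is likely to be costly.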