Closed by edwardchalstrey1 1 year ago
Data Safe Haven is cloud-deployable, open-source software that enables researchers to work with sensitive data in a secure computing environment.
Data Safe Haven is an open source piece of software for working with sensitive data in a secure computing environment hosted on the Microsoft Azure cloud. Anyone can deploy their own Safe Haven, and each institution that does so creates a new one for each research project. The development team at Turing also manages Safe Havens for various projects at the institute.
The Data Safe Haven (DSH) project involves the development and maintenance of an open source piece of software for working with sensitive data. This software provides a Secure Research Environment (SRE) for computational research and analysis of data too sensitive to be downloaded onto a personal computer. Anyone can deploy their own Safe Haven, and each institution that does so sets up a single Safe Haven Management environment (SHM) and a new SRE instance for each research project. Both of these components are created on the Microsoft Azure cloud computing platform, by following the Safe Haven documentation and running the deployment scripts. The DSH team at Turing also manage SRE deployments for various projects at the institute, as well as "Data Study Groups", which are hackathon-style challenges hosted at the institute where (typically) PhD students work with data from an external provider.
I didn't even think to mention MFA or governance methods, thanks GPT!
Hmm, ChatGPT's effort feels quite generic and some bits are either not really stressed in our publicly facing materials (e.g. advanced encryption) or incorrect (the detailed audit trail of all data access and sharing activities). What happens if you ask the same question about the James Joyce Institute's "Safe Haven"?
Data Safe Haven is a free and open-source, infrastructure-as-code defined environment for research using sensitive and personal data.
Data Safe Haven is an open, community-driven project to enable research using sensitive and personal data. A major component of the project is a free and open-source, infrastructure-as-code defined trusted research environment. The project also outlines processes designed to maintain security and address organisations' information governance requirements.
The Data Safe Haven project aims to collaborate with others to produce an open-source, publicly-available, infrastructure-as-code trusted research environment (TRE) that can be used to reproducibly deploy configurable secure environments for productive research on sensitive data.
The Data Safe Haven project brings together users and providers of sensitive data in order to develop a productive trusted research environment (TRE) that is sufficiently secure and usable to meet both their needs. The main output is a publicly-available, open-source codebase that can be used to reproducibly deploy configurable TREs onto Microsoft's Azure cloud platform. Information governance is also a key concern, so we have developed a series of security tiers that can be used to determine the appropriate security controls for a given project.
The Data Safe Haven project aims to remove barriers that prevent data analysis from going ahead on sensitive datasets. This means that we work closely with users and data providers to create trusted research environments (TREs) that find the right balance between security controls around data handling and the usability of the environments. Our major output is a publicly-available, open-source infrastructure-as-code repository that can be used to reproducibly deploy TREs onto Microsoft's Azure cloud platform. We have also developed an information governance classification that groups projects into sensitivity categories, so that the same security controls can be applied to all projects in a category. It is important that our work has national (and international) impact, and as such we aim to be a key player in discussions around designing and using TREs, developing standards and ensuring that our work meets these standards.
Each team member to contribute their own suggestions separately, then we can pull them together