This project tracks general documentation, standard operating procedures (SOPs) and helper scripts for XNAT.
Contact Dika for access to XNAT.
XNAT is a platform for storing and managing medical images and associated data. Within Guy’s and St Thomas’ NHS Foundation Trust (GSTT), it forms part of the local secure enclave used for federated learning in artificial intelligence projects. Data is ingested from PACS into XNAT, where it is anonymised and sorted into the relevant projects. This ensures data is only visible to those who need it and allows data to be deleted upon project completion. The following describes the process of data collection, anonymisation (or more accurately, de-identification) and data storage in XNAT, as well as how compliance with DICOM Standards Supplement 142 is achieved.
Medical imaging data is typically stored in a DICOM format. DICOM stands for Digital Imaging and Communications in Medicine and is an international standard format for medical image storage, retrieval, processing and transfer. DICOM images consist of the actual acquired image as a set of pixels and a DICOM header. Data coded within the DICOM header are a series of attributes describing the scan and patient. Each attribute is tagged with a unique DICOM tag which consists of a group and element number, and each tag has a name to identify the type of information (or attribute) contained within the tag. This principle of data tagging allows DICOMs to be compared, transferred, stored and queried.
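As a small illustration of the group and element structure, the sketch below parses a tag written in the usual (gggg,eeee) hexadecimal notation into its two numbers. The names `DicomTag` and `parse_tag` are illustrative only and are not part of XNAT or of any DICOM library; the two example tags are the comment fields referred to later in this document.

```python
from typing import NamedTuple


class DicomTag(NamedTuple):
    """A DICOM tag as a (group, element) pair of integers."""
    group: int
    element: int

    def __str__(self) -> str:
        # Render back in the conventional four-digit hexadecimal form.
        return f"({self.group:04X},{self.element:04X})"


def parse_tag(text: str) -> DicomTag:
    """Parse a tag written as '(gggg,eeee)' in hexadecimal."""
    group_hex, element_hex = text.strip().strip("()").split(",")
    return DicomTag(int(group_hex, 16), int(element_hex, 16))


patient_comments = parse_tag("(0010,4000)")
study_comments = parse_tag("(0032,4000)")
print(patient_comments.group == 0x0010)  # True
print(str(study_comments))               # (0032,4000)
```

Storing the tag as two integers rather than a string makes it straightforward to compare, sort or group tags, which is what lets DICOM headers be queried consistently.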
Before any medical data can be used in research or for training artificial intelligence (AI) algorithms, it must first be anonymised/de-identified so that none of the data used can be traced back to an individual. To do this, the DICOM tags need to be altered, deleted or manipulated in such a way that the image no longer describes the individual. However, a DICOM header contains many tags, and deciding what is and is not identifiable information is not always straightforward, so DICOM Standards Supplement 142 was created. It outlines best de-identification practices for clinical trials, and we have adopted the same standard for our de-identification approach.
De-identification in XNAT is done at two levels: first, when data arrives in XNAT from PACS (site-wide de-identification), and second, when data is moved from the Pre-archive into the assigned project (project-level de-identification).
Data can be ingested into XNAT via two routes:
Teleradiology is used for sending individual DICOM objects to XNAT. The sender must have access to Sectra PACS (managed by the PACS team) and the necessary permissions to send from it to the GSTT_XNAT destination.
Q/R is used for importing batches of data. The data required can be manually searched by accession number, patient name, patient ID or the date range. Alternatively, the data can be requested by uploading a CSV file. The instructions on how to format the CSV file can be found here.
Before you start, make yourself a cup of tea or get a snack, and put on some soothing music or a podcast in the background. XNAT is slow and you will need to be patient with it, or else you will keep cancelling your own commands. Pop-ups can be slow to appear and moving large amounts of data takes a long time. I recommend starting at the end of the day rather than at the beginning and leaving it to run overnight, to account for higher bandwidth demands on PACS during office hours.
You can search by accession number, patient ID or date range, using * as a wildcard. Dates should be entered as YYYYMMDD to match the date formatting in PACS. After following either of the above two data retrieval methods, you will need to:
The following image represents the data flow from within an NHS trust to their research environment, including the steps taken to remove all identifiable information from medical images to ensure no personal information ever leaves the NHS trust.
The two methods of de-identification are outlined below.
Teleradiology is used to send individual DICOM objects from PACS to XNAT. As the data leaves PACS, the following text is automatically appended to two DICOM tags, i.e. the Patient Comments field (0010,4000) and the Study Comments field (0032,4000):
Project:Unassigned Subject:subj001 Session:subj001_sess001
This ensures that the patient name and accession number do not reach XNAT’s Pre-archive. Instead, the subject ID is set to subj001 and the session ID to subj001_sess001. This means that all data sent via teleradiology from PACS arrives in XNAT with the same subject and session ID. The project owner then changes these manually when moving the data from the Pre-archive to the Archive. Data in XNAT can be matched by its arrival timestamp to the timestamp in Sectra PACS, so the project owner can identify which DICOM object belongs to which study subject.
Only individual scans should be sent via this route – for batches larger than 3 to 5 scans, please use Q/R.
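To make the structure of the appended text concrete, the following sketch splits it into its three fields. It is only an illustration of the Key:Value layout shown above, not a description of how XNAT itself reads the comment fields; the function name `parse_routing_comment` is hypothetical, and the sketch assumes the string contains nothing but the appended text.

```python
def parse_routing_comment(comment: str) -> dict[str, str]:
    """Split 'Key:Value Key:Value ...' into a dictionary of fields."""
    fields = {}
    for token in comment.split():
        # partition() splits on the first colon only, so values may contain colons.
        key, _, value = token.partition(":")
        fields[key] = value
    return fields


comment = "Project:Unassigned Subject:subj001 Session:subj001_sess001"
routing = parse_routing_comment(comment)
print(routing["Project"])  # Unassigned
print(routing["Subject"])  # subj001
print(routing["Session"])  # subj001_sess001
```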
When data is imported using the Q/R functionality of XNAT, a new subject ID and session ID can be assigned to each scan imported. It is recommended that a CSV file is used for the DQR upload, in which case the subject and session IDs can be specified directly in the CSV file.
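As a rough sketch of preparing such a file, the snippet below assigns sequential subject and session IDs to a list of accession numbers and writes them out as CSV text. The column names (`accession_number`, `subject_id`, `session_id`), the function name `build_request_csv` and the accession numbers are all assumptions made for illustration. The header XNAT actually expects is defined in the CSV formatting instructions and should be used instead. The ID pattern simply mirrors the subj001 / subj001_sess001 convention described above.

```python
import csv
import io

# Hypothetical column names: replace with the header from the CSV formatting instructions.
COLUMNS = ["accession_number", "subject_id", "session_id"]


def build_request_csv(accession_numbers: list[str], prefix: str = "subj") -> str:
    """Assign sequential subject/session IDs to each accession number and return CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(COLUMNS)
    for index, accession in enumerate(accession_numbers, start=1):
        subject_id = f"{prefix}{index:03d}"
        session_id = f"{subject_id}_sess001"
        writer.writerow([accession, subject_id, session_id])
    return buffer.getvalue()


# Placeholder accession numbers, not real identifiers.
print(build_request_csv(["ACC0001", "ACC0002"]))
```

Generating the IDs in the file, rather than typing them by hand, keeps the mapping between accession number and study subject in one place that the project owner can retain securely.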
In both of the above methods, all other data in the DICOM headers is removed, changed or replaced as detailed in the de-identification scripts, such as:
The manipulation of the DICOM data for the purpose of de-identification follows DICOM Standards Supplement 142, which specifies which tags should be removed, replaced or manipulated to ensure that the shared data cannot be traced back to an individual.
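The idea of a per-tag action (remove, replace or keep) can be sketched in a few lines of Python. The rule table below is purely illustrative: it is neither the project's de-identification scripts nor the full Supplement 142 profile, the header values are made up, and the function name `deidentify` is hypothetical. Tags without a rule are kept here for simplicity, whereas a real profile may treat unlisted tags differently.

```python
from typing import Optional

Rule = tuple[str, Optional[str]]  # (action, replacement value if any)

# Illustrative rules only: not the site's scripts or the complete standard profile.
RULES: dict[str, Rule] = {
    "(0010,0010)": ("replace", "ANONYMOUS"),  # Patient's Name
    "(0010,0020)": ("replace", "subj001"),    # Patient ID
    "(0010,0030)": ("remove", None),          # Patient's Birth Date
}


def deidentify(header: dict[str, str], rules: dict[str, Rule]) -> dict[str, str]:
    """Return a copy of the header with each rule's action applied."""
    cleaned = {}
    for tag, value in header.items():
        action, replacement = rules.get(tag, ("keep", None))
        if action == "remove":
            continue  # drop the attribute entirely
        cleaned[tag] = replacement if action == "replace" else value
    return cleaned


# Made-up header values for demonstration.
header = {
    "(0010,0010)": "DOE^JANE",
    "(0010,0020)": "1234567",
    "(0010,0030)": "19700101",
    "(0008,0060)": "MR",  # Modality, left unchanged
}
print(deidentify(header, RULES))
```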
The project owner has read, write, update and delete rights over all data they own. They can grant other users either view-only access or permission to modify the existing data. Data can also be shared between projects as read-only.
The data will be stored on on-premises physical storage managed and controlled by GSTT.
See the open issues for a list of proposed features (and known issues).