ShahinSHH / COVID-CT-MD

A COVID-19 CT Scan Dataset Applicable in Machine Learning and Deep Learning

COVID-CT-MD

The COVID-CT-MD dataset contains volumetric chest CT scans (DICOM files) of 169 patients positive for COVID-19 infection, 60 patients with Community Acquired Pneumonia (CAP), and 76 normal patients. The diagnosis of COVID-19 infection is based on positive real-time Reverse Transcription Polymerase Chain Reaction (rRT-PCR) test results, clinical parameters, and CT scan manifestations identified by three experienced thoracic radiologists. Diagnoses of CAP and normal cases were confirmed using clinical laboratory tests and CT scans. A subset of 54 COVID-19 and 25 CAP cases was analyzed by the radiologists to identify and label slices with evidence of infection. This labeled subset contains 4,957 slices demonstrating infection and 18,392 slices without evidence of infection.
We are working closely with our collaborators in medical centers to provide more CT scans and introduce a larger Multi-Centre COVID-19 dataset that can support a broader range of research. This dataset will be made publicly available in the near future.
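
The slice-level counts quoted above imply a noticeable class imbalance, which may matter when training a slice classifier. The snippet below is a minimal sketch, not part of the dataset package: it computes the fraction of labeled slices that show infection and a set of inverse-frequency class weights. The helper name `inverse_frequency_weights` and the label names are hypothetical, and inverse-frequency weighting is only one of several possible choices.

```python
# Slice counts quoted in the dataset description (labeled subset only).
SLICES_WITH_INFECTION = 4957
SLICES_WITHOUT_INFECTION = 18392


def inverse_frequency_weights(counts):
    """Return one weight per class: total / (number_of_classes * class_count).

    Hypothetical helper for illustration; rarer classes get larger weights.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


# Label names are illustrative, not taken from the label files.
counts = {
    "infection": SLICES_WITH_INFECTION,
    "no_infection": SLICES_WITHOUT_INFECTION,
}
total_slices = sum(counts.values())
infection_fraction = counts["infection"] / total_slices
weights = inverse_frequency_weights(counts)

print(f"labeled slices: {total_slices}")
print(f"fraction with infection: {infection_fraction:.3f}")
print(f"class weights: {weights}")
```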

Links

The COVID-CT-MD dataset is accessible through Figshare. The associated clinical data and the labels from all three radiologists are available at the same location.

The detailed description of the dataset is available at https://www.nature.com/articles/s41597-021-00900-3

UPDATE 1 (Sep 8, 2021)

After further review of two cases (P001 and P006), our team has decided to update the labels associated with them. Updated labels can be accessed through the following files:

Although the updated files contain more accurate lobe-level and slice-level labels for these two cases, DL models developed with the original labels (Slice-level-labels, Lobe-level-labels) and with the updated ones show no significant difference, because the changes are minor.

Supplementary Information

| Category | Cases | Sex | Age (years) |
|----------|-------|-----|-------------|
| COVID-19 | 169 | 108 M / 61 F | 51.96 ± 14.39 |
| CAP | 60 | 35 M / 25 F | 57.7 ± 21.7 |
| Normal | 76 | 40 M / 36 F | 43.4 ± 14.1 |

Data Structure and Sample

A small sample of the dataset is available in the "Sample data" folder including DICOM files of two patients in each category to provide a quick insight into the dataset.

The COVID-CT-MD dataset shared through Figshare is organized hierarchically: COVID-19, CAP, and Normal subjects are placed in separate folders; within each, every patient has a separate folder containing that patient's CT scan slices in DICOM format.
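
The category / patient / slice layout described above can be indexed with a short directory walk. This is a minimal sketch: the function name `index_dataset` is hypothetical, and it makes no assumption about the actual folder or file names, treating every sub-folder of the root as a category and every sub-folder of a category as a patient.

```python
from pathlib import Path


def index_dataset(root):
    """Map category name -> patient name -> list of slice file paths.

    Hypothetical helper: every directory directly under `root` is treated
    as a category, and every directory inside a category as one patient.
    """
    index = {}
    for category_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        patients = {}
        for patient_dir in sorted(p for p in category_dir.iterdir() if p.is_dir()):
            # Files are listed in file-name order, which is NOT necessarily
            # the order of the slices within the scan.
            patients[patient_dir.name] = sorted(
                f for f in patient_dir.iterdir() if f.is_file()
            )
        index[category_dir.name] = patients
    return index
```

A call such as `index_dataset("path/to/dataset")` (the path is a placeholder) returns a nested dictionary that can be used to count patients per category or to iterate over one patient's slices.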

NOTE: The correct order of slices in a CT scan doesn't necessarily follow the order of the Slice-IDs. You need to sort slices based on the "slice location" parameter provided in the DICOM files when you are reading the data. The “Slice Location” value is stored in DICOM files and is accessible through the following DICOM tag:
(0020,1041) - DS - Slice Location
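
As a minimal sketch of this sorting step, the snippet below reads one patient's slices with pydicom and orders them by Slice Location, which pydicom exposes as the `SliceLocation` attribute for tag (0020,1041). The function names, the `*.dcm` file pattern, and the ascending sort direction are assumptions made for illustration; check them against the downloaded files and the label files before relying on them.

```python
from pathlib import Path


def sort_by_slice_location(datasets):
    """Return the datasets ordered by their Slice Location value (ascending).

    Works on any objects exposing a `SliceLocation` attribute, such as the
    datasets returned by pydicom.dcmread.
    """
    return sorted(datasets, key=lambda ds: float(ds.SliceLocation))


def load_sorted_series(patient_dir, pattern="*.dcm"):
    """Read all DICOM files in one patient folder and sort them.

    `pattern` is an assumption about the file naming; adjust it if the
    files use a different extension.
    """
    import pydicom  # third-party; imported lazily so the sorter works without it

    datasets = [pydicom.dcmread(path) for path in Path(patient_dir).glob(pattern)]
    return sort_by_slice_location(datasets)
```

Sorting on the numeric value rather than on the file name keeps each image aligned with its slice-level label.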

Labels

IMPORTANT:

While reading DICOM files, note that the correct order of slices in a CT scan doesn’t necessarily follow the order of the Slice-IDs. It’s recommended to use the slice location value to sort the slices. Otherwise, the labels will not match correctly to the images. The “Slice Location” value is stored in DICOM files and is accessible through the following DICOM tag:
(0020,1041) - DS - Slice Location

## Statistical Analysis

"statistical_analysis.py" is the code to reproduce the statistical analysis provided in the data description.
Please note that your Python working directory should be set to the folder where you store the downloaded package.
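
One way to set the working directory from Python is sketched below. The helper name `enter_package_dir` is hypothetical, and the folder path is whatever location you downloaded the package to.

```python
import os
from pathlib import Path


def enter_package_dir(package_dir):
    """Change the current working directory to the downloaded package folder.

    Hypothetical helper: raises NotADirectoryError if the folder is missing,
    so a wrong path fails early instead of producing confusing file errors.
    """
    path = Path(package_dir).expanduser().resolve()
    if not path.is_dir():
        raise NotADirectoryError(f"Package folder not found: {path}")
    os.chdir(path)
    return path
```

After calling `enter_package_dir(...)` with your own folder, relative paths resolve against that folder. Changing into the folder in a terminal before running the script has the same effect.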

Requirements:

* pydicom
* pandas
* seaborn
* tempfile
* os
* numpy
* matplotlib

## Citation

If you found this dataset and the related data description useful in your research, please consider citing:

```
@article{Afshar2021,
  author = {Afshar, Parnian and Heidarian, Shahin and Enshaei, Nastaran and Naderkhani, Farnoosh and Rafiee, Moezedin Javad and Oikonomou, Anastasia and Fard, Faranak Babaki and Samimi, Kaveh and Plataniotis, Konstantinos N and Mohammadi, Arash},
  doi = {10.1038/s41597-021-00900-3},
  issn = {2052-4463},
  journal = {Scientific Data},
  number = {1},
  pages = {121},
  title = {{COVID-CT-MD, COVID-19 computed tomography scan dataset applicable in machine learning and deep learning}},
  url = {https://doi.org/10.1038/s41597-021-00900-3},
  volume = {8},
  year = {2021}
}
```