Closed: sumedhasingla closed this issue 6 years ago
- Number of subjects with both image and text: 4456/7870
- Of those, number of subjects with a positive disease label (at least 1 of the 14 diseases): 3890/4456
- Of those, number of subjects with no disease label: 566/4456
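As a rough illustration of how such a split could be computed, here is a hedged pandas sketch on a synthetic frame. Only the column names (`Finding Labels`, `Image_File_Name`) follow the CSV described below; the rows are toy values, not the real 4456/7870 data:

```python
import pandas as pd

# Toy sketch of the subject split. The frame is synthetic; only the
# column names ("Finding Labels", "Image_File_Name") follow the real CSV.
df = pd.DataFrame({
    "Report_ID": [1, 2, 3, 4],
    "Image_File_Name": ["a.png", "b.png", None, "d.png"],
    "Finding Labels": ["Cardiomegaly|Edema", "No Finding", "Atelectasis", "No Finding"],
})

# Subjects with both an image and a text report
paired = df[df["Image_File_Name"].notna()]

# Positive: at least one of the 14 disease terms affirmed
positive = paired[paired["Finding Labels"] != "No Finding"]
no_finding = paired[paired["Finding Labels"] == "No Finding"]

print(len(paired), len(positive), len(no_finding))  # 3 1 2
```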
The image files are saved at location: '/pghbio/dbmi/batmanlab/singla/Image_Text_Project/Data_Image_Text'
The CSV file containing the label information: '/pghbio/dbmi/batmanlab/singla/Image_Text_Project/RAD-ALL-Findings-Impressions_ChestXLabels.csv'
Columns:
- Report_ID
- Report: the text report with the findings + impressions sections
- Negation: output of PyContext
- NOBLETool: output of the NOBLE Tool; it has a list of phrase--context--semantic_type entries
- Tags: list of the final tags that helped determine the disease label for the 14 terms
- Finding Labels: the labels for CheXNet
- Image_File_Name: name of the image file for this report
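A small sanity check against this schema might look like the sketch below. The real file lives at the /pghbio path above; an in-memory stand-in with the same header (and a made-up row) is used here so the snippet runs anywhere:

```python
import io
import pandas as pd

# Hedged sanity check on the label CSV's schema. The single data row is
# invented; only the header matches the columns described in this thread.
EXPECTED_COLS = [
    "Report_ID", "Report", "Negation", "NOBLETool",
    "Tags", "Finding Labels", "Image_File_Name",
]

sample = io.StringIO(
    "Report_ID,Report,Negation,NOBLETool,Tags,Finding Labels,Image_File_Name\n"
    "1,chest xray report,none,phrase--context--type,effusion,Effusion,img1.png\n"
)

df = pd.read_csv(sample)
missing = [c for c in EXPECTED_COLS if c not in df.columns]
assert not missing, f"missing columns: {missing}"
print("all expected columns present")
```

To check the real file, replace `sample` with the CSV path given above.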
@pyadolla Can you please verify the data and let me know if there are any issues?
The paths to the raw images corresponding to the reports are saved in the CSV file: '/pghbio/dbmi/batmanlab/Data/radiologyTextDataset2/singla/RAD-ALL-List-ExamPath.csv'
The source directory for images: '/pghbio/dbmi/batmanlab/Data/UPMC_Lung_Images/ftp.box.com'
GitHub link: https://github.com/sumedhasingla/MultiModalImageText.git
Thanks @sumedhasingla for the update.
[x] Convert the 3D volumetric images to 2D images compatible with CheXNet
[x] Identify the reports containing the 14 disease terms. If a term is present, verify whether the disease is affirmed or negated. Create labels for the images compatible with CheXNet.
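The two checklist items could be sketched roughly as follows. This is an illustrative stand-in, not the actual pipeline: the middle-slice choice and the naive negation regex are assumptions (the real pipeline uses PyContext and the NOBLE Tool for negation/context):

```python
import re
import numpy as np

def volume_to_2d(volume: np.ndarray) -> np.ndarray:
    """Step 1 (assumed approach): reduce a 3D volume to a 2D grayscale image
    by taking the middle slice along axis 0 and min-max scaling to uint8."""
    mid = volume[volume.shape[0] // 2]
    lo, hi = float(mid.min()), float(mid.max())
    return ((mid - lo) / (hi - lo + 1e-8) * 255.0).astype(np.uint8)

# Subset of the 14 CheXNet disease terms, for illustration only
DISEASE_TERMS = ["atelectasis", "cardiomegaly", "effusion", "pneumonia"]

def label_report(report: str) -> list:
    """Step 2 (naive stand-in for PyContext/NOBLE Tool): keep a term only
    if it appears and is not preceded by a simple negation cue."""
    text = report.lower()
    labels = []
    for term in DISEASE_TERMS:
        negated = re.search(
            rf"\b(no|without|negative for)\s+(\w+\s+){{0,2}}{term}", text
        )
        if term in text and not negated:
            labels.append(term)
    return labels

img = volume_to_2d(np.random.rand(64, 224, 224).astype(np.float32))
print(img.shape, img.dtype)                                    # (224, 224) uint8
print(label_report("Mild cardiomegaly. No pleural effusion."))  # ['cardiomegaly']
```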