Closed: sumedhasingla closed this issue 6 years ago
- Number of subjects with both image and text: 4456/7870
- Of those, number of subjects with a positive disease label (at least 1 of the 14 diseases): 3890/4456
- Of those, number of subjects with no disease label: 566/4456
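As a rough illustration of how such a split could be computed, here is a hedged pandas sketch on a synthetic frame. Only the column names (`Finding Labels`, `Image_File_Name`) follow the CSV described below; the rows are toy values, not the real 4456/7870 data:

```python
import pandas as pd

# Toy sketch of the subject split. The frame is synthetic; only the
# column names ("Finding Labels", "Image_File_Name") follow the real CSV.
df = pd.DataFrame({
    "Report_ID": [1, 2, 3, 4],
    "Image_File_Name": ["a.png", "b.png", None, "d.png"],
    "Finding Labels": ["Cardiomegaly|Edema", "No Finding", "Atelectasis", "No Finding"],
})

# Subjects with both an image and a text report
paired = df[df["Image_File_Name"].notna()]

# Positive: at least one of the 14 disease terms affirmed
positive = paired[paired["Finding Labels"] != "No Finding"]
no_finding = paired[paired["Finding Labels"] == "No Finding"]

print(len(paired), len(positive), len(no_finding))  # 3 1 2
```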
The image files are saved at location: '/pghbio/dbmi/batmanlab/singla/Image_Text_Project/Data_Image_Text'
The CSV file containing the label information: '/pghbio/dbmi/batmanlab/singla/Image_Text_Project/RAD-ALL-Findings-Impressions_ChestXLabels.csv'
Columns:
- Report_ID
- Report: the text report with the findings + impressions sections
- Negation: output of PyContext
- NOBLETool: output of the NOBLE Tool; it has a list of phrase--context--semantic_type entries
- Tags: list of the final tags that helped determine the disease label for the 14 terms
- Finding Labels: the labels for CheXNet
- Image_File_Name: name of the image file for this report
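A small sanity check against this schema might look like the sketch below. The real file lives at the /pghbio path above; an in-memory stand-in with the same header (and a made-up row) is used here so the snippet runs anywhere:

```python
import io
import pandas as pd

# Hedged sanity check on the label CSV's schema. The single data row is
# invented; only the header matches the columns described in this thread.
EXPECTED_COLS = [
    "Report_ID", "Report", "Negation", "NOBLETool",
    "Tags", "Finding Labels", "Image_File_Name",
]

sample = io.StringIO(
    "Report_ID,Report,Negation,NOBLETool,Tags,Finding Labels,Image_File_Name\n"
    "1,chest xray report,none,phrase--context--type,effusion,Effusion,img1.png\n"
)

df = pd.read_csv(sample)
missing = [c for c in EXPECTED_COLS if c not in df.columns]
assert not missing, f"missing columns: {missing}"
print("all expected columns present")
```

To check the real file, replace `sample` with the CSV path given above.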
@pyadolla Can you please verify the data and let me know if there are any issues?
The paths to the raw images corresponding to the reports are saved in the CSV file: '/pghbio/dbmi/batmanlab/Data/radiologyTextDataset2/singla/RAD-ALL-List-ExamPath.csv'
The source directory for images: '/pghbio/dbmi/batmanlab/Data/UPMC_Lung_Images/ftp.box.com'
GitHub link: https://github.com/sumedhasingla/MultiModalImageText.git
Thanks @sumedhasingla for the update.
[x] Convert the 3D volumetric images to 2D images compatible with CheXNet
[x] Identify the reports containing the 14 disease terms. If a term is present, verify whether the disease is affirmed or negated. Create labels for the images compatible with CheXNet.
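The two checklist items could be sketched roughly as follows. This is an illustrative stand-in, not the actual pipeline: the middle-slice choice and the naive negation regex are assumptions (the real pipeline uses PyContext and the NOBLE Tool for negation/context):

```python
import re
import numpy as np

def volume_to_2d(volume: np.ndarray) -> np.ndarray:
    """Step 1 (assumed approach): reduce a 3D volume to a 2D grayscale image
    by taking the middle slice along axis 0 and min-max scaling to uint8."""
    mid = volume[volume.shape[0] // 2]
    lo, hi = float(mid.min()), float(mid.max())
    return ((mid - lo) / (hi - lo + 1e-8) * 255.0).astype(np.uint8)

# Subset of the 14 CheXNet disease terms, for illustration only
DISEASE_TERMS = ["atelectasis", "cardiomegaly", "effusion", "pneumonia"]

def label_report(report: str) -> list:
    """Step 2 (naive stand-in for PyContext/NOBLE Tool): keep a term only
    if it appears and is not preceded by a simple negation cue."""
    text = report.lower()
    labels = []
    for term in DISEASE_TERMS:
        negated = re.search(
            rf"\b(no|without|negative for)\s+(\w+\s+){{0,2}}{term}", text
        )
        if term in text and not negated:
            labels.append(term)
    return labels

img = volume_to_2d(np.random.rand(64, 224, 224).astype(np.float32))
print(img.shape, img.dtype)                                    # (224, 224) uint8
print(label_report("Mild cardiomegaly. No pleural effusion."))  # ['cardiomegaly']
```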