dpc10ster / RJafroc

Artificial Intelligence: Evaluating AI, optimizing AI

T1-RRRC for ROC data #73

Closed jwitos closed 2 years ago

jwitos commented 2 years ago

Hi,

Does/should the 1T-RRRC analysis in StSignificanceTestingCadVsRad work for ROC-only data, i.e. without FROC data? Judging by the docs and chapter 39 of the book it should, but ROC-only data with 1T-RRRC gives an error: Error in t[i, j, , 1] : incorrect number of dimensions

This can be reproduced by (1) loading dataset09, (2) saving it in the iMRMC (ROC-only) format, (3) loading that iMRMC file, and (4) running the 1T-RRRC significance test.

> s <- dataset09
> DfSaveDataFile(s, "sample.imrmc", format="iMRMC")
> s2 <- DfReadDataFile("sample.imrmc", format="iMRMC")
> StSignificanceTestingCadVsRad(s2, "Wilcoxon", method="1T-RRRC")
Error in t[i, j, , 1] : incorrect number of dimensions

The 2T-RRRC and 1T-RRFC analyses work fine.

dpc10ster commented 2 years ago

dataset09 is already an ROC dataset.

The following works: x <- StSignificanceTestingCadVsRad(dataset09, "Wilcoxon",method="1T-RRRC")

The dataset09 object contains additional fields, necessary for CAD analysis, that are incompatible with the iMRMC format (these fields are lost on saving in iMRMC format).

In the future I will remove the iMRMC format from the package completely. That will do away with the incorrect links.

jwitos commented 2 years ago

Thank you. Yes, somehow it was the iMRMC format's fault. I converted study data to RJafroc's native format and 1T-RRRC analysis worked fine.

For future reference (if anyone else runs into this problem), below is a short Python snippet that I used to convert iMRMC data to RJafroc's format (a simple factorial ROC dataset). I benchmarked it against dataset09 to confirm that it works.

import pandas as pd

# Parse the iMRMC file. Data rows start at the first line beginning with "-1"
# (the truth rows, stored under reader ID -1); everything before is header text.
rows = []
vals = False
with open("dataset09.imrmc", "r") as f:
    for line in f:
        line = line.strip()
        if line.startswith("-1"):
            vals = True
        if vals and line:  # skip blank lines inside the data section
            reader_id, case_id, modality_id, rating = [v.strip() for v in line.split(",")]
            rows.append({
                "ReaderID": int(reader_id),
                "CaseID": int(case_id),
                "ModalityID": modality_id,
                "Rating": float(rating),
            })

# Build the DataFrame in one step from the collected rows
# (works with pandas 2.x, where DataFrame.append is no longer available)
df = pd.DataFrame(rows, columns=['ReaderID', 'CaseID', 'ModalityID', 'Rating'])

# Create RJafroc worksheet "TRUTH": the truth value (0 or 1) becomes LesionID
truth_df = df[df.ReaderID == -1][['CaseID', 'Rating']].copy()
truth_df = truth_df.rename(columns={"Rating": "LesionID"})
truth_df['LesionID'] = truth_df['LesionID'].apply(int)
truth_df['Weight'] = 0

# Create RJafroc worksheet "TP" (only diseased cases)
diseased_cases = truth_df[truth_df.LesionID == 1].CaseID
tp_df = df[df.CaseID.isin(diseased_cases) & (df.ReaderID != -1)].copy()
tp_df = tp_df.rename(columns={"Rating": "LL_Rating"})
tp_df['LesionID'] = 1
tp_df['ModalityID'] = tp_df.ModalityID.apply(int)
tp_df = tp_df[['ReaderID', 'ModalityID', 'CaseID', 'LesionID', 'LL_Rating']]

# Create RJafroc worksheet "FP" (only non-diseased cases)
healthy_cases = truth_df[truth_df.LesionID == 0].CaseID
fp_df = df[df.CaseID.isin(healthy_cases) & (df.ReaderID != -1)].copy()
fp_df = fp_df.rename(columns={"Rating": "NL_Rating"})
fp_df['ModalityID'] = fp_df.ModalityID.apply(int)
fp_df = fp_df[['ReaderID', 'ModalityID', 'CaseID', 'NL_Rating']]

# Write the three worksheets to an Excel file following the RJafroc format.
# The context manager saves and closes the workbook on exit.
with pd.ExcelWriter('dataset09_converted.xlsx') as writer:
    tp_df.to_excel(writer, sheet_name='TP', index=False)
    fp_df.to_excel(writer, sheet_name='FP', index=False)
    truth_df.to_excel(writer, sheet_name='TRUTH', index=False)
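
A quick consistency check on the TRUTH/TP/FP split can catch conversion mistakes before the workbook is loaded into RJafroc. The sketch below is only an illustration and not the benchmark against dataset09 mentioned above: the helper name check_split and the toy data are made up for this example, and it assumes the same column names as the snippet above and a fully crossed (factorial) design.

```python
import pandas as pd

def check_split(df, truth_df, tp_df, fp_df):
    """Return a list of problems found in the TRUTH/TP/FP split (empty if none)."""
    problems = []
    ratings = df[df.ReaderID != -1]
    # Every reader rating must land in exactly one of TP or FP.
    if len(tp_df) + len(fp_df) != len(ratings):
        problems.append("rating count mismatch")
    # TP rows may only reference diseased cases, FP rows only non-diseased ones.
    diseased = set(truth_df.loc[truth_df.LesionID == 1, "CaseID"])
    if not set(tp_df.CaseID) <= diseased:
        problems.append("TP contains non-diseased cases")
    if set(fp_df.CaseID) & diseased:
        problems.append("FP contains diseased cases")
    # A factorial design has one rating per reader, modality and case.
    keys = ["ReaderID", "ModalityID", "CaseID"]
    both = pd.concat([tp_df[keys], fp_df[keys]])
    if both.duplicated().any():
        problems.append("duplicate reader/modality/case rows")
    return problems

# Toy data (invented): two cases, one modality, two readers; case 2 is diseased.
df = pd.DataFrame({
    "ReaderID":   [-1, -1, 1, 1, 2, 2],
    "CaseID":     [1, 2, 1, 2, 1, 2],
    "ModalityID": [0, 0, 1, 1, 1, 1],
    "Rating":     [0.0, 1.0, 2.0, 4.0, 1.0, 5.0],
})
truth_df = pd.DataFrame({"CaseID": [1, 2], "LesionID": [0, 1], "Weight": [0, 0]})
readers = df[df.ReaderID != -1]
tp_df = readers[readers.CaseID == 2].rename(columns={"Rating": "LL_Rating"})
fp_df = readers[readers.CaseID == 1].rename(columns={"Rating": "NL_Rating"})

problems = check_split(df, truth_df, tp_df, fp_df)
print(problems)
```

An empty list means the split passed these checks; it says nothing about whether the ratings themselves were parsed correctly, so comparing analysis results against a known dataset is still worthwhile.
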

dpc10ster commented 2 years ago

Thank you! I am not familiar with Python but this should be useful to others. People have asked me about a Python version of RJafroc, to which I have had to say no.

dpc10ster commented 2 years ago

This issue has been fixed. See NEWS.md for details of fix.