DIAGNijmegen / picai_eval

Evaluation of 3D detection and diagnosis performance, geared towards prostate cancer detection in MRI.
https://pi-cai.grand-challenge.org/
Apache License 2.0

NaN values in metrics outputs #5

Closed: jakubMitura14 closed this issue 2 years ago

jakubMitura14 commented 2 years ago

Hello, I am invoking the evaluate function and I get the AUROC and score metrics as NaN. I invoke the function like this:

import numpy as np
import torch

from picai_eval import evaluate
from report_guided_annotation import extract_lesion_candidates

y_hat = torch.sigmoid(y_hat)  # y_hat is the output of the model
# argmax to undo the one-hot encoding
y_det = [extract_lesion_candidates(np.argmax(x.cpu().detach().numpy(), axis=0))[0] for x in y_hat]
y_true = [np.argmax(x.cpu().detach().numpy(), axis=0) for x in labels]
print(f"sums y_det {np.sum(y_det[0])} y_true {np.sum(y_true[0])} len {len(y_det)} shapes y_det {np.shape(y_det[0])} y_true {np.shape(y_true[0])}")

valid_metrics = evaluate(y_det=y_det, y_true=y_true)

Printed example output:

sums y_det 77 y_true 1084 len 1 shapes y_det (192, 192, 64) y_true (192, 192, 64)
No negative samples in y_true, false positive value should be meaningless
metrics.auroc nan metrics.AP -0.0 metrics.score nan

As you can see, np.sum() returns non-zero values for both the algorithm output and the gold standard labels. The situation repeats for multiple validation cases (only the numbers change): in every case 1) y_hat and y contain no NaN values, 2) both are non-zero, and 3) both have the same shape.

Hence both the message about no negative samples in y_true and the presence of NaN values are highly mysterious to me.

Additionally, I get valid loss values during training (approximately decreasing, with no NaNs).

Thank you for your help!

joeranbosma commented 2 years ago

Hi @jakubMitura14,

Most of the code looks good, but the issue seems to be in the data: what you provide is a single prediction and a single annotation (judging by the data shape of (192, 192, 64) and the length of 1).

The evaluation pipeline provided by picai_eval is designed for the evaluation of 3D detection and diagnosis performance and should receive a dataset of predictions and corresponding annotations (rather than a single case). So while you have both positive and negative voxels in your prediction and annotation, there is only a single (positive) case.

You should provide multiple cases at once for picai_eval to work as intended (i.e., len(y_det) should be more than one, preferably at least 100 for these metrics to make sense, and 300+ for them to be really representative).
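For example, collecting predictions and annotations across the whole validation set before calling evaluate could look roughly like this (a minimal sketch, where model and validation_loader are placeholder names and the labels are assumed to be two-channel one-hot volumes, as in your snippet):

import numpy as np
import torch

from picai_eval import evaluate
from report_guided_annotation import extract_lesion_candidates

y_det, y_true = [], []
for batch, labels in validation_loader:  # iterate over all validation cases
    y_hat = torch.sigmoid(model(batch))
    for pred, label in zip(y_hat, labels):
        y_det.append(extract_lesion_candidates(np.argmax(pred.cpu().detach().numpy(), axis=0))[0])
        y_true.append(np.argmax(label.cpu().detach().numpy(), axis=0))

valid_metrics = evaluate(y_det=y_det, y_true=y_true)  # len(y_det) now equals the number of cases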

Hope this helps, Joeran

joeranbosma commented 2 years ago

I would like to add that np.argmax is not optimal for turning softmax predictions into detection maps. While it will work (that is, not raise an error), you lose the nuance of your model that is encoded in the per-voxel confidences. Assuming your channel dimension comes first (so x.shape = (Channels, Height, Width, Depth) = (2, 192, 192, 64)), you can change this:

extract_lesion_candidates(np.argmax(x.cpu().detach().numpy(), axis=0))[0]

into this:

extract_lesion_candidates(x.cpu().detach().numpy()[1])[0]
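
To see why this matters, here is a toy illustration (not the library's internals) of how argmax discards the per-voxel confidence, while indexing the lesion channel keeps it:

import numpy as np

# toy two-channel softmax output for three voxels: columns are (background, lesion)
softmax = np.array([[0.9, 0.1],
                    [0.4, 0.6],
                    [0.2, 0.8]])

print(np.argmax(softmax, axis=1))  # [0 1 1] -> hard binary labels, confidence is lost
print(softmax[:, 1])               # [0.1 0.6 0.8] -> soft lesion confidences are kept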

jakubMitura14 commented 2 years ago

Thank you!! It seems to work now!

joeranbosma commented 2 years ago

Great!