caseyfitz opened this issue 6 years ago (status: Open)
@caseyfitz did you carefully read the code of `trained_model.calculate_volume`? This code treats centroids as connected components.
Ah, thanks @vessemer! I thought the functionality was clear to me, but I must have been confused by the fact that `labels = [mask[centroid['x'], centroid['y'], centroid['z']] for centroid in centroids]` was returning `[1 1 1 1 1 1]` for the six centroids I was passing it (for LIDC-0003). I didn't realize that `scipy.ndimage.label` has a default `structure` parameter representing squared connectivity, which should be sufficient for this stage of the project. (See the sketch after this comment for how that default behaves.)

The problem, then, seems to be that the image has only one connected component, yes? If so, then item 2 in the issue statement should be good to go for now (in which case I'll edit the issue), and the immediate problems are just those under item 1.

Make sense?
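A minimal sketch of that default behavior (a toy 2-D example for illustration, not project code; with the default `structure`, diagonal neighbors are not connected):

```python
import numpy as np
from scipy import ndimage

# Toy binary mask: two blobs that touch only diagonally.
mask = np.array([[1, 1, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 1, 1],
                 [0, 0, 0, 1]])

# Default structure: squared connectivity of one (no diagonals),
# so the two blobs come out as separate components.
labeled, num = ndimage.label(mask)
print(num)  # 2

# With full connectivity (diagonals included), they merge into one.
labeled_full, num_full = ndimage.label(mask, structure=np.ones((3, 3)))
print(num_full)  # 1
```

Indexing the labeled array at each centroid's coordinates, as in the list comprehension quoted above, then returns that centroid's component label.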
Yes, sure. I'll add some comments to `trained_model.calculate_volume` with my next commit, since it's a bit obscure :)
@caseyfitz Are you planning to merge the changes you made in your branch into master at some point? And by the way: nice notebook! :)
Original issue description (caseyfitz):

After exploring the segmentation code under `prediction/src/algorithms/segment/`, we have identified a few outstanding issues related to the segmentation functionality and volume calculations. These issues are all interrelated, but we've tried to divide them into two general categories (whose code paths start in `segment/trained_model.py`):

**1. Model architecture / complexity (`trained_model.predict`)**

- The `.npy` mask saved to `segment_path` should not have 1024 slices. Most slices after 200 are uniform, for example in `LIDC-IDRI-0003` with value `0.45197698` and an overall range of around `-0.35` to `0.8` (a quick check for this is sketched below).
- `simple_3d_model.py` and `unet_3d_model.py` each use the same `best_model_Simple3DModel` and make identical predictions. However, the full unet can only process some of the full-size test images without throwing a `MemoryError`.
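A quick way to verify the uniform-slice claim above (a sketch only; it assumes the saved mask loads as a 3-D array ordered `(slice, height, width)` and that `segment_path` points at the `.npy` file mentioned above):

```python
import numpy as np

mask = np.load(segment_path)  # e.g. the mask saved for LIDC-IDRI-0003
print(mask.shape, mask.min(), mask.max())

# Per-slice value range; a range of zero means the slice is uniform.
slice_range = mask.max(axis=(1, 2)) - mask.min(axis=(1, 2))
uniform = np.where(slice_range == 0)[0]
print(len(uniform), "uniform slices, starting at index",
      uniform[0] if len(uniform) else None)
```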
**2. Nodule volume calculation (`trained_model.calculate_volume`)**

- `numpy.bincount`, which calculates nodule volumes by summing non-zero values in the binary mask saved as `lung-mask.npy`, does not use centroid information and merely sums non-zero values in the scan, yielding a (poor) total volume rather than the distinct volumes of the individual centroids in `centroids`. One negative impact of this is that for `n` centroids, the predicted volume is just this total volume, repeated `n` times.
- Alternative approaches explored (`scipy.spatial.ConvexHull`, `skimage.morphology.convex_hull_image`) are either too memory intensive or only work with 2-D arrays. Plus, it's not clear that a standard convex hull approach would be best anyway, since we aren't interested in the entire lungs but in subsets of them (perhaps something like `skimage.morphology.convex_hull_object`, but this only works on 2-D arrays).
- Ideally, a new version of `trained_model.calculate_volume` would take a list of centroids as input and calculate, e.g., 3-D connected components given those centroids (a possible shape for this is sketched after this list).
- Due to the current state of `Simple3DModel`, masking of nodules does not perform well, and it's possible that there is essentially one large connected component spanning ~200 slices.
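One possible shape for that centroid-aware version (a sketch only, not project code; it assumes a 3-D binary mask and centroids given as dicts with integer voxel indices `'x'`, `'y'`, `'z'`, as in the snippet quoted in the comments above):

```python
import numpy as np
from scipy import ndimage

def volumes_by_centroid(mask, centroids):
    """Sketch: per-centroid nodule volumes via 3-D connected components."""
    # Label 3-D connected components (default: squared connectivity).
    labeled, num_components = ndimage.label(mask)

    # Voxel count per component label; index 0 is background.
    counts = np.bincount(labeled.ravel())

    volumes = []
    for c in centroids:
        label = labeled[c['x'], c['y'], c['z']]
        # A centroid that falls on background gets volume 0.
        volumes.append(int(counts[label]) if label != 0 else 0)
    return volumes
```

This keeps the cheap `numpy.bincount` counting, but counts per component and reads off one volume per centroid instead of repeating a single total.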
The approach to exploring these issues has been to use an interactive Jupyter notebook, rooted in the `prediction` directory of the application. From there, one can use `from src.algorithms.segment.trained_model import predict` to start playing with the outputs directly and testing changes on the fly. (Pro tip: use the magics `%load_ext autoreload` and `%autoreload 2` to autoreload the functions with your changes every time you call them; see the sketch below.)

And as always, please update the documentation with any new changes for easy points! (The segment predict docs are pretty weak right now.)
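A minimal notebook session along those lines might look like this (the magics are standard IPython; the `predict` call itself is omitted since its exact signature isn't shown here):

```python
# Run from a Jupyter notebook started in the `prediction` directory.
%load_ext autoreload
%autoreload 2  # re-import modules on every call, picking up local edits

from src.algorithms.segment.trained_model import predict

# predict(...) can now be re-run after each code change
# without restarting the kernel.
```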