booz-allen-hamilton / DSB3Tutorial


Image segmentation does not seem to work #6

Open apoyoman opened 7 years ago

apoyoman commented 7 years ago

Many of the masks produced by the tutorial seem to be garbage. I think part of the problem is the values of the images, which seem to be all over the board. Some of the image values actually seem inverted: high where they should be low (air in the lungs) and low where they should be high (bone and tissue). What is going on here?

ghost commented 7 years ago

I did not notice that problem with any of the images that I checked, and I'm not sure off the top of my head why it would be happening. Our method is reasonable from a pure image-processing perspective, but there is additional information in the imaging files that can be used to isolate particular types of tissue.

You should check out the kernel : https://www.kaggle.com/gzuidhof/data-science-bowl-2017/full-preprocessing-tutorial

It explains how to segment the lungs based on the expected pixel value, given the scattering length of x-rays in the different types of tissue.
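
In rough terms, that approach boils down to thresholding the Hounsfield-unit values. A minimal sketch of the idea, not the kernel's exact code (`rough_lung_mask` and the -320 HU cutoff are illustrative assumptions):

    from skimage import measure

    def rough_lung_mask(volume_hu, threshold=-320):
        # Air is ~-1000 HU and soft tissue sits above ~-100 HU, so anything
        # below the threshold is either lung interior or the air around the body.
        binary = volume_hu < threshold
        # Label connected regions and drop the air surrounding the patient,
        # assuming the corner voxel belongs to that background region.
        labels = measure.label(binary)
        binary[labels == labels[0, 0, 0]] = False
        return binary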

apoyoman commented 7 years ago

I was just following the tutorial step by step. Here is an example:

[attached image: 333145094436144085379032922488-0]

apoyoman commented 7 years ago

Also here is a good mask:

[attached image: 826812708000318290301835871780-1]

ghost commented 7 years ago

Do the techniques from the preprocessing kernel I linked to handle that particular scan better? I expect they will beat the technique we're using here, since they use the information about the x-ray scattering length to predict the pixel values for each tissue type.

If you really want to make the system we have here work, I would take another look at the scan in question. It looks like that image has been inverted, which has caused our preprocessing to highlight the most dense tissue within the lung rather than the least dense tissue.

apoyoman commented 7 years ago

Also, the kernel you referenced is for DICOM files, while the images in this tutorial are .mhd files, and I don't know whether .mhd files carry the same information or follow the same standards as DICOM.

apoyoman commented 7 years ago

So I am kind of at a loss as to how to preprocess these .mhd files.

apoyoman commented 7 years ago

Yes, I mentioned that issue (many images seem to be inverted) in my first comment.

apoyoman commented 7 years ago

So, if half of the files are inverted, how did they get that way (are the files like that on the server)? And why wouldn't I want to try to make this work?

ghost commented 7 years ago

I don't have any more work time allocated to put significant effort into improving the tutorial or digging into these issues, so I apologize for my curtness.

The inversion looks like a pain in the neck. There are probably some ad hoc ways you could test for it and invert the image back prior to processing.
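
For example, something ad hoc like this sketch (untested, and `fix_inversion` is a hypothetical helper of mine): the border of a slice should be air and therefore dark, so a border brighter than the interior suggests an inverted scan:

    import numpy as np

    def fix_inversion(img):
        # Heuristic: pixels along the slice border should be background/air.
        # If the border is brighter than the central region on average,
        # assume the scan is inverted and flip it within its own range.
        border = np.concatenate([img[0, :], img[-1, :], img[:, 0], img[:, -1]])
        h, w = img.shape
        center = img[h // 4: 3 * h // 4, w // 4: 3 * w // 4]
        if border.mean() > center.mean():
            img = img.max() + img.min() - img
        return img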

The SimpleITK documentation describes how to check the metadata available in a file. There might also be a forum associated with the LUNA2016 competition. If you look around and find that the tissue pixel ranges are present, you could use the same methodology as in the kernel mentioned above.
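
Something along these lines should dump whatever the .mhd header carries (a sketch; "scan.mhd" is a placeholder path):

    import SimpleITK as sitk

    itk_img = sitk.ReadImage("scan.mhd")  # placeholder path
    # Print every metadata key stored in the header, with its value.
    for key in itk_img.GetMetaDataKeys():
        print(key, "=", itk_img.GetMetaData(key))
    # The voxel data itself, as a z-y-x NumPy array:
    img_array = sitk.GetArrayFromImage(itk_img)
    print("value range:", img_array.min(), img_array.max())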

If the tissue pixel ranges are available, the method we use here will be the inferior one. I blanked on the fact that they may not be available for LUNA2016 when I posted the kernel link. If I were competing, I would check whether that data is in the files before committing to the approach here.

apoyoman commented 7 years ago

OK, thanks, much appreciated. Looks like a good next step would be to find the LUNA2016 forum.

ghost commented 7 years ago

And I have no idea why the LUNA2016 files would have inverted images; I did not notice any in the ones I checked. I was mostly working with the files from the first two directories. The LUNA set is not maintained or provided by anyone connected to Kaggle or the DSB, so if there are issues with the dataset, I can't say why.

ghost commented 7 years ago

Good luck. I really do wish I had more time to keep tidying these scripts up, but I don't.

apoyoman commented 7 years ago

Not trying to bother you here, but I thought I should leave a fix here in case anybody else needs it:

In the script that produces the nodule masks, replace this line:

    imgs[i] = matrix2int16(img_array[i_z])

with:

    imgs[i] = normalizePlanes(img_array[i_z])

where:

    def normalizePlanes(npzarray):
        # Window the Hounsfield units to [-1000, 400] and rescale to 0-255
        # so every image shares the same fixed range.
        maxHU = 400.
        minHU = -1000.
        npzarray = (npzarray - minHU) / (maxHU - minHU)
        npzarray[npzarray > 1] = 1.
        npzarray[npzarray < 0] = 0.
        npzarray *= 255
        return npzarray.astype(int)
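
(For reference: -1000 HU is air and +400 HU is roughly where dense bone begins, so this windows every scan to the same physical range instead of stretching each image to its own min/max.)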

then in the script that creates the overall masks, the following lines should be commented out:

    mean = np.mean(middle)
    max = np.max(img)
    min = np.min(img)
    # To improve threshold finding, I'm moving the
    # underflow and overflow on the pixel spectrum
    img[img==max]=mean
    img[img==min]=mean

innovator1108 commented 7 years ago

I have a query regarding the U-Net model given on GitHub. Is the unet.hdf5 model pretrained, or do we need to train it? I'm asking because I'm not getting any segmentation results using the provided model. Please reply ASAP.

abhiML commented 6 years ago

@apoyoman are there any other fixes or will doing https://github.com/booz-allen-hamilton/DSB3Tutorial/issues/6#issuecomment-279191379 be enough?

civilinformer commented 6 years ago

I am sure that there are other approaches to fix this, but this worked for me.

abhiML commented 6 years ago

@civilinformer thanks. And in how many epochs did you start getting good results? And did you use the full dataset?

abhiML commented 6 years ago

@apoyoman did you use the pretrained model or did you retrain it from scratch?