apoyoman opened this issue 7 years ago
I did not notice that problem with any of the images I checked, and I'm not sure off the top of my head why it would be happening. Our method is a reasonable tactic from a pure image-processing perspective, but the imaging files contain additional information that can be used to isolate the particular tissue types.
You should check out this kernel: https://www.kaggle.com/gzuidhof/data-science-bowl-2017/full-preprocessing-tutorial
It explains how to segment the lungs based on the expected pixel value, given the scattering length of x-rays in the different tissue types.
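The core idea can be sketched as a plain threshold on Hounsfield units; this is a minimal illustration with a made-up function name, assuming the scan has already been converted to HU (the linked kernel uses a -320 HU cutoff and then adds connected-component cleanup on top of this):

```python
import numpy as np

def segment_lungs_by_hu(scan_hu, threshold=-320.0):
    """Rough lung candidate mask by Hounsfield-unit thresholding.

    Air is around -1000 HU, aerated lung roughly -500 HU, soft tissue
    near 0-100 HU, and bone above +300 HU, so everything below the
    threshold is candidate air/lung and everything above is denser tissue.
    """
    return scan_hu < threshold

# Synthetic 2D "slice": air everywhere, with a soft-tissue block inside.
slice_hu = np.full((4, 4), -1000.0)  # air
slice_hu[1:3, 1:3] = 40.0            # soft tissue
mask = segment_lungs_by_hu(slice_hu)
```

A real pipeline would still need to separate the lungs from the air surrounding the patient (the kernel does this with connected-component labeling), but the threshold is where the tissue-specific pixel information comes in.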
I was just following the tutorial step by step. Here is an example:
Also here is a good mask:
Do the techniques from the preprocessing kernel I linked handle that particular scan better? I expect they will outperform the technique we're using here, since they use the x-ray scattering length to predict pixel values for each tissue type.
If you really want to make the system we have here work, I would take another look at the scan in question. It looks like that image has been inverted, which has caused our preprocessing to highlight the most dense tissue within the lung rather than the least dense tissue.
Also, the kernel you referenced is for DICOM files, while the images in this tutorial are mhd files, and I don't know whether mhd files carry the same information or follow the same standards as DICOM.
So I am kind of at a loss on how to pre-process these mhd files.
Yes I mentioned that issue (many images seem to be inverted) in my first comment.
So, if half of the files are inverted, how did they get that way (are the files like that on the server)? And why wouldn't I want to try to make this work?
I don't have any more work time allocated to put significant effort into improving the tutorial or spending much time looking into these issues, so I apologize for my curtness.
The inversion looks like a pain in the neck. There are probably some ad hoc ways you could test for it and invert the image back prior to processing.
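One such ad hoc check, assuming the border of an axial slice is air outside the patient and should therefore be near the slice minimum (the function names and the flip formula are my own, not part of the tutorial):

```python
import numpy as np

def looks_inverted(slice_arr):
    """Heuristic: if the border pixels (air outside the body) are
    brighter than the slice mean, the grey values are probably flipped."""
    border = np.concatenate([slice_arr[0], slice_arr[-1],
                             slice_arr[:, 0], slice_arr[:, -1]])
    return border.mean() > slice_arr.mean()

def maybe_fix_inversion(slice_arr):
    """Flip the grey values back if the slice looks inverted."""
    if looks_inverted(slice_arr):
        return slice_arr.max() + slice_arr.min() - slice_arr
    return slice_arr

# Tiny synthetic check: an air border (-1000) around soft tissue (0),
# then the same slice with its grey values flipped.
normal = np.full((6, 6), -1000.0)
normal[2:4, 2:4] = 0.0
inverted = normal.max() + normal.min() - normal
```

This is only a sketch; a border-mean heuristic can misfire on cropped or padded scans, so it's worth eyeballing a few flagged slices before trusting it.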
The SimpleITK documentation describes how to check the metadata available in a file. Also there might be a forum associated with the LUNA2016 competition. If you look around and find that the tissue pixel ranges are present, then you could use the same methodology as in the kernel mentioned above.
The method we use here will be inferior, assuming the tissue pixel ranges are available. I forgot that they may not be available for LUNA2016 when I posted the kernel link. If I were competing, I would check whether that data is in the files before committing to the approach used here.
OK, thanks, much appreciated. Sounds like a good next step is to find the LUNA2016 forum.
And I have no idea why the LUNA2016 files would have inverted images; I did not notice any in the ones I checked. I was mostly working with the files from the first two directories. The LUNA set is not maintained or provided by anyone connected to Kaggle or the DSB, so if there are issues with the dataset, I have no idea where they came from.
Good luck. I really do wish I had more time to continue to tidy these scripts up, but I really don't.
Not trying to bother you here, but I thought I should leave a fix here in case anybody else needs it:
In the script that produces the nodule masks, replace this line:

```python
imgs[i] = matrix2int16(img_array[i_z])
```

with:

```python
imgs[i] = normalizePlanes(img_array[i_z])
```

where:

```python
def normalizePlanes(npzarray):
    maxHU = 400.
    minHU = -1000.
    npzarray = (npzarray - minHU) / (maxHU - minHU)
    npzarray[npzarray > 1] = 1.
    npzarray[npzarray < 0] = 0.
    npzarray *= 255
    return npzarray.astype(int)
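For reference, here is a quick numpy sketch of what that replacement does: it clips pixel values to the [-1000, 400] HU window and rescales them to [0, 255], so air and anything denser than +400 HU land at the two ends of the range (the sample values are mine, for illustration only):

```python
import numpy as np

# Clip to the [-1000, 400] HU window, then scale to [0, 255].
vals = np.array([-2000.0, -1000.0, 0.0, 400.0, 3000.0])
scaled = (vals - (-1000.0)) / (400.0 - (-1000.0))
scaled = np.clip(scaled, 0.0, 1.0) * 255
result = scaled.astype(int)
print(result)  # [  0   0 182 255 255]
```

Values below -1000 HU and above +400 HU saturate at 0 and 255 respectively, which is why this behaves more predictably than matrix2int16 on scans with odd value ranges.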
Then, in the script that creates the overall masks, the following lines should be commented out:

```python
# mean = np.mean(middle)
# max = np.max(img)
# min = np.min(img)
# # underflow and overflow on the pixel spectrum
# img[img==max]=mean
# img[img==min]=mean
```
I have a query regarding the U-Net model given on the GitHub repo. Is the unet.hdf5 model pretrained, or do we need to train it? I'm asking because I'm not getting any segmentation results using the provided model. Please reply asap.
@apoyoman are there any other fixes or will doing https://github.com/booz-allen-hamilton/DSB3Tutorial/issues/6#issuecomment-279191379 be enough?
I am sure that there are other approaches to fix this, but this worked for me.
@civilinformer thanks. And after how many epochs did you start getting good results? And did you use the full dataset?
@apoyoman did you use the pretrained model or did you retrain it from scratch?
Many of the masks produced by the tutorial seem to be garbage. I think part of the problem is the values of the images, which seem to be all over the map. Some image values actually seem inverted: high where they should be low (air in lungs) and low where they should be high (bone and tissue). What is going on here?