faustomilletari / VNet

GNU General Public License v3.0

Second block should have 2 convolutions #9

Open prhbrt opened 7 years ago

prhbrt commented 7 years ago

I noticed that your second block in the diagram in your paper

[Figure: V-Net architecture diagram from the paper]

source: https://arxiv.org/abs/1606.04797

has 2 convolutional layers, yet here it has one:

https://github.com/faustomilletari/VNet/blob/master/Prototxt/train_noPooling_ResNet_cinque.prototxt#L143

Shouldn't there be 2? And shouldn't the third block have 3?

https://github.com/faustomilletari/VNet/blob/master/Prototxt/train_noPooling_ResNet_cinque.prototxt#L230

faustomilletari commented 7 years ago

looking into it...

mattmacy commented 7 years ago

I'm translating this to pytorch and have noticed the same thing, only more so. It might help if you put each layer on a single line like this: http://lmb.informatik.uni-freiburg.de/resources/opensource/3dUnet_miccai2016_with_BN.prototxt Prototxt is horrible no matter how it's formatted, but it's much easier to see what's going on that way.

I've lumped together the up/down convolution and the subsequent convolution(s) into a single logical layer. The first argument to Down/Up is the number of input channels and the second argument is the number of convolutions. The following is how the network is actually implemented in ResNet_cinque:

class VNet(nn.Module):
    def __init__(self):
        super(VNet, self).__init__()
        self.in_tr = InputTransition(16)
        self.down_tr32 = DownTransition(16, 1)
        self.down_tr64 = DownTransition(32, 2)
        self.down_tr128 = DownTransition(64, 3)
        self.down_tr256 = DownTransition(128, 3)
        self.up_tr256 = UpTransition(256, 3)
        self.up_tr128 = UpTransition(128, 2)
        self.up_tr64 = UpTransition(64, 1)
        self.up_tr32 = UpTransition(32, 1)
        self.out_tr = OutputTransition(16)

This is what it would look like if it were implemented per the diagram:

class VNet(nn.Module):
    def __init__(self):
        super(VNet, self).__init__()
        self.in_tr = InputTransition(16)
        self.down_tr32 = DownTransition(16, 2)
        self.down_tr64 = DownTransition(32, 3)
        self.down_tr128 = DownTransition(64, 3)
        self.down_tr256 = DownTransition(128, 3)
        self.up_tr256 = UpTransition(256, 3)
        self.up_tr128 = UpTransition(128, 3)
        self.up_tr64 = UpTransition(64, 2)
        self.up_tr32 = UpTransition(32, 1)
        self.out_tr = OutputTransition(16)
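A minimal sketch of what such a Down/Up abstraction could look like in PyTorch. The class name and the (input channels, number of convolutions) arguments mirror the description above; the kernel sizes, PReLU activations and residual add are assumptions taken from the paper, not a transcription of any particular port:

import torch.nn as nn

def conv_stack(channels, n_convs):
    # n_convs 5x5x5 convolutions that keep the channel count fixed
    layers = []
    for _ in range(n_convs):
        layers += [nn.Conv3d(channels, channels, kernel_size=5, padding=2),
                   nn.PReLU(channels)]
    return nn.Sequential(*layers)

class DownTransition(nn.Module):
    def __init__(self, in_channels, n_convs):
        super(DownTransition, self).__init__()
        out_channels = 2 * in_channels
        # strided "down convolution": halves the resolution, doubles the channels
        self.down = nn.Sequential(
            nn.Conv3d(in_channels, out_channels, kernel_size=2, stride=2),
            nn.PReLU(out_channels))
        self.convs = conv_stack(out_channels, n_convs)

    def forward(self, x):
        down = self.down(x)
        # residual connection around the convolution stack, as in the paper
        return self.convs(down) + down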

I haven't ported the Dice loss function or the V-Net equivalent of Unet3D's value deformation layer, and I only have a single Pascal Titan X, so I can't say how much of a difference it all makes in practice. I don't have any spare time at the moment, but next month I may compare them both on the LUNA16 data set.

prhbrt commented 7 years ago

Correct, I also saw several similar inconsistencies; I just wanted to start with one :) I have written code in Keras, btw.

mattmacy commented 7 years ago

@prinsherbert have you extracted the data augmentation code and dice loss function from his Caffe fork? I started extracting the former from the 3D Unet caffe patch and it's rather painful.

Also have you tested the results of what's in the diagram versus what's in the prototxt?

gattia commented 7 years ago

@prinsherbert, similar to the comment by @mattmacy about the data augmentation, have you written your own Keras code for that? I've been playing with a Keras implementation but haven't tried data augmentation yet, because I haven't had the time to write my own image manipulations and Keras only has augmentations for 2D images out of the box.

mattmacy commented 7 years ago

@prinsherbert another irregularity that I'm not completely comfortable with (but is actually not a discrepancy between the prototxt and the paper) is that in the sixth block you're actually reducing the number of channels by 4x, going from 256->64. Once again, as implemented it may well be for the best, but for symmetry's sake I wonder if there should be an additional up convolution layer before we start adding skip connections.

prhbrt commented 7 years ago

I played with some cancerimagearchive data, but not with augmentation. I would suggest creating three Gaussian random fields (blurred 3D Gaussian noise with some variance). Then you set each Augmented[i,j,k] to Original[ grf_x[i], grf_y[j], grf_z[k] ]. Since the indices into Original are floating point, you'd need some interpolation. Not sure if @faustomilletari 's paper does this, but it should be quick and easy to implement.
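A rough sketch of that idea, assuming NumPy and SciPy are available; the smoothing sigma and displacement scale are arbitrary illustration values, not numbers from the paper:

import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def random_deformation(volume, sigma=8.0, alpha=10.0, seed=None):
    # three Gaussian random fields: blurred white noise, one displacement per axis
    rng = np.random.RandomState(seed)
    fields = [alpha * gaussian_filter(rng.randn(*volume.shape), sigma)
              for _ in range(3)]
    # displace the sampling grid and interpolate the original volume there
    grid = np.meshgrid(*[np.arange(s) for s in volume.shape], indexing='ij')
    coords = [g + f for g, f in zip(grid, fields)]
    return map_coordinates(volume, coords, order=1, mode='nearest')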

Unfortunately I am still waiting for my CT scans (there should be thousands of them) and have decided to put a pin in it until I have them. Not sure if I'd be using augmentation; probably I will.

prhbrt commented 7 years ago

@mattmacy I found a whole bunch of things that quite puzzled me, were inconsistent, or were plain wrong. I have no idea about the details, but I also had some trouble getting the downsampling and the reported number of channels in the paper and the prototxt to be consistent or sensible. I decided to report some issues here and let @faustomilletari have a look at them.

@faustomilletari No offence intended, by the way; I am trying to be constructive, and your work seems relevant nevertheless. One question I wanted to answer is whether the system performs significantly better than a baseline such as "the average probability of all training ground truths" and similar, since prostates will typically be in roughly the same location across humans, up to some level.

faustomilletari commented 7 years ago

I appreciate your efforts.

Please keep reporting issues and mistakes in this model or in the code that runs it. Especially if there are mistakes that can impair the performance!

If possible I would appreciate to have some pull requests with corrections appear here sooner or later :)

In any case, the model here should be consistent with the one used to evaluate the paper.

Regards,

Fausto Milletarì Sent from my iPhone


prhbrt commented 7 years ago

So a good next step, I assume, would be to turn the prototxt into a new schematic image, to get a good idea of what is happening and allow easy comparison with the paper's schema?

This would only work if several people reviewed the new schematic image, to avoid repeating the same problems.
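As a starting point, a flat line-per-layer listing can be pulled straight out of the prototxt by parsing it with the Caffe protobuf definitions. This is only a sketch: it assumes the Python bindings of the Caffe fork are importable, and depending on the Caffe version the layers live under net.layer or net.layers:

from caffe.proto import caffe_pb2
from google.protobuf import text_format

net = caffe_pb2.NetParameter()
with open('Prototxt/train_noPooling_ResNet_cinque.prototxt') as f:
    text_format.Merge(f.read(), net)

# one line per layer: name, type and number of output channels (if any)
for layer in net.layer or net.layers:
    num_output = ''
    if layer.HasField('convolution_param'):
        num_output = layer.convolution_param.num_output
    print(layer.name, layer.type, num_output)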

faustomilletari commented 7 years ago

Indeed, I agree. Unfortunately, at the moment I don't have much time on my hands (I am not working on my PhD anymore).

But I will try to do that soon.

Another option could be to port everything to TensorFlow, which would add much value to this project.

Regards,

Fausto Milletarì Sent from my iPhone


prhbrt commented 7 years ago

I would think porting to TensorFlow is just as complicated as the other ports (Keras (which can run on TensorFlow) and PyTorch) if it is unclear what the paper's network was. I don't think of it as another option; I think of it as an extra nice-to-have. But that is also because I am not familiar with Caffe and prototxt.

I am talking a bit out of polite frustration, because I was going back and forth between network architectures trying to make it work according to both the prototxt and the schema. (In the hope of protecting someone else from the same frustrations :) )

faustomilletari commented 7 years ago

Well,

the point is that the current prototxt is the one that generated the results (which are the ones in the paper). Maybe the paper has a wrong figure, but the implementation is the one in this prototxt. Please note that the picture is not automatically generated; I drew it by hand, which is tedious to do.

By porting to TensorFlow I'm sure the number of people using this code would increase, and therefore other possible bugs could be fixed. Moreover, anybody could actually do the porting.

Ultimately, this repository was opened to public access in order to allow people to build upon it, work on the code, and make pull requests. I would love to see somebody take the initiative to solve bugs or fix problems in this code base as it is right now.

Fausto


prhbrt commented 7 years ago

But then there are no bugs in the code, since that is what the results are based on; the bugs are in the paper, right?

The code is the 'ground truth'

prhbrt commented 7 years ago

Well, I'm just commenting on what would make sense to me :P But it's a free GitHub; anyone can do as they please.

faustomilletari commented 7 years ago

Yeah, the paper should be updated on arXiv in any case. (I also need to add a citation to another important work.)

I will try to do that, but I have to fit it around my very busy work schedule.

Hopefully you all will see an updated paper soon.

The good news is that I have applied this work to ultrasound segmentation and it works quite OK, but it is still inferior to my work based on voting strategies (for example https://github.com/faustomilletari/Hough-CNN) on problems where a strong shape prior is needed.

In prostate MRI we have a quite deformable organ with shape variations that I can clearly see in the image, but in ultrasound we have a lot of artefacts and shadows that impair recognition and create mistakes in the boundaries. I have been using implicit shape models for that problem with success, using random forests and CNNs. It's older work than V-Net but works well in US; it does not scale as well as V-Net, though. Ironically, these Hough voting methods could not deliver good performance on prostate imagery...

Fausto


mattmacy commented 7 years ago

@faustomilletari I couldn't tell from the paper since your respective datasets were quite different, but how do your results compare with 3D Unet (which appeared to come out almost concomitantly)?

I really appreciate you putting the prototxt up as it makes it easier to understand what you were doing than simply looking at the diagram in the paper.

I'm also not being critical. I'm just trying to solve a segmentation problem whilst leveraging prior art. Any implementation issues are just opportunities for better understanding the space.

If someone else wants to reimplement in TF, that would be yet another data point. However, even though I have the most experience with TF, I find that its API is essentially verbal vomit. At some point Keras will be reimplemented as the "official" interface to TF, which will probably be palatable. Until then I don't see any reason to use it for development purposes unless required to by a course (it may well be the best solution for deploying to a mobile OS). Not only is PyTorch noted for training as much as several times faster than other frameworks, but I also find the API extremely clean compared with all the alternatives I've looked at. See my work in progress to get a sense of it: https://github.com/mattmacy/vnet.pytorch/blob/master/vnet.py - when working on something as esoteric as biomedical imaging, I think the performance and ease of use of the framework matter more than mainstream-ness. That said, I instantly got 8 stars checking in something called vnet.pytorch. Maybe it would have been 80 if it were vnet.tensorflow.

I've only just written a loader for the preprocessed LUNA16 dataset and need to port over the loss function before I can start training today, and I don't expect good results until I implement the data augmentation functions.

faustomilletari commented 7 years ago

As of today I have not really been able to run 3D U-Net. The code they provide is limited to the Caffe extension, and they don't give out any training loop implementation or anything like that. I did try it, using everything that V-Net uses with 3D U-Net as the network. First, I needed to implement an overlapping-tile approach to obtain the final segmentations; then I needed to implement training that takes into account that images shrink in that framework.

In the end I think there were problems with training with sub-images that were completely background or foreground, and the results were very poor.

Not knowing whether I was the reason only poor results could be achieved, I moved on and archived that attempt among my failures.

I have heard of other people having issues with 3D U-Net.

The TF interface is ugly, but one could use Slim, which is OK. It's just a matter of habit, though. PyTorch seems very cool; good that you are investing time in that!

Regards,

Fausto


mattmacy commented 7 years ago

Thanks for the response. It's really hard for me to evaluate how much a given architecture contributes to the SotA when it's evaluated on a different data set from comparable architectures and there's no way to repeat the authors' experiments because they haven't made the data available in the right format - assuming it's publicly available at all. It might be worthwhile doing a survey paper with different models and different public data sets with all of the models hosted on github for direct peer review. I'm not an academic, but it might be an edifying exercise.

Ciao', Matt

faustomilletari commented 7 years ago

I fully agree. As a personal goal, I would like to put on GitHub whatever I publish or do outside of my job... unfortunately I cannot always do that.

I have only recently discovered that GitHub is one of the most powerful tools ever invented; I'm determined to make good use of it!

Bye,

Fausto


mattmacy commented 7 years ago

@faustomilletari Just to update you on where I'm at before I get your thoughts on how to cope with the massive data set sizes. I ported the loss function: https://github.com/mattmacy/torchbiomed/blob/master/torchbiomed/loss.py I'm assuming that I should treat channel 0 as the target segmentation channel and channel 1 as the background. In order to do this I need to generate a second, inverted mask to match against the background channel. Have I understood that correctly?
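For what it's worth, with that channel assignment (channel 0 the object, channel 1 the background built from the inverted mask), a soft Dice boils down to something like the sketch below. This is my own reading of the setup, not a transcription of the Caffe layer or of loss.py:

import torch

def soft_dice_loss(probs, target, eps=1e-5):
    # probs: (N, 2, D, H, W) softmax output; target: (N, D, H, W) binary object mask
    target = target.float()
    # two-channel ground truth: channel 0 = object, channel 1 = inverted mask
    target_2ch = torch.stack([target, 1.0 - target], dim=1)
    dims = (0, 2, 3, 4)
    intersection = (probs * target_2ch).sum(dims)
    denominator = probs.sum(dims) + target_2ch.sum(dims)
    dice = (2.0 * intersection + eps) / (denominator + eps)
    return 1.0 - dice.mean()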

This is what the network graph ends up looking like for me: https://github.com/mattmacy/vnet.pytorch/blob/master/images/vnet.png

My inputs are lung CTs with the voxels isotropically scaled to 2mm. In order to fit the nodule masks from all data points I ended up having to make the input volumes 192x160x192 (384mm x 320mm x 384mm). I've been trying to do a minibatch of size 2, but in order for forward propagation to complete I had to replace the PReLU with in-place ReLU. Even then backprop would still run out of memory until I reduced the mini-batch size to 1. Realistically I'll need to downsample still further to 2.5mm^3 voxels in order to pass the full volume to the current architecture.

Which brings me to what I wanted your opinion on. The nodules are all 30 mm or less in diameter, so if I did a sliding window scan on the z-axis by passing in 64mm x 320mm x 384mm at a time I would be guaranteed that one volume would fully contain the nodule. I could take this a step further and just pass in a 64mm^3 volume at a time. That would bring me down to 32768 voxels which would allow me to run more normal sized mini-batches on my (already obsolete :-/) Pascal Titan X. What are your thoughts on that approach? It seems like one would still get the benefits of volumetric segmentation with more tractable data quantities. However, one major concern I have with that approach is that it would induce an even bigger class imbalance than we're starting out with. At least when doing whole lung scans 60% of the patients in the data set have at least one nodule. With smaller volumes 90% of the scans will be negative. Would it be worthwhile (or perhaps even necessary) to re-balance the dataset in the loading process so that some reasonable (say 25%) of the volumes contained nodules?
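On the class-imbalance point, a sketch of the kind of rebalanced sampling described here; the 25% positive fraction and the 32-voxel (64 mm at 2 mm spacing) crop are just the numbers from this comment, and the volume/mask containers and index lists are hypothetical:

import random

def sample_patch(volumes, masks, positive_indices, negative_indices,
                 positive_fraction=0.25, crop=32):
    # draw a nodule-containing scan with probability positive_fraction
    pool = positive_indices if random.random() < positive_fraction else negative_indices
    idx = random.choice(pool)
    vol, mask = volumes[idx], masks[idx]
    # random crop-sized cube (for positives one would ideally center it on a nodule)
    z, y, x = (random.randint(0, s - crop) for s in vol.shape)
    window = (slice(z, z + crop), slice(y, y + crop), slice(x, x + crop))
    return vol[window], mask[window]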

Un saluto Matthew

mattmacy commented 7 years ago

I now understand what the purpose of the argmax is: to convert the softmax results to a one-hot encoding. I now appear to be getting reasonable results on the simple task of lung segmentation, but the network will often start off mostly favoring one channel or the other without a careful choice of seed.
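In code, the evaluation-time conversion amounts to something like this sketch (shapes assumed per the two-channel setup above):

import torch
import torch.nn.functional as F

def to_one_hot(logits):
    # logits: (N, 2, D, H, W); pick the winning channel per voxel
    labels = torch.softmax(logits, dim=1).argmax(dim=1)      # (N, D, H, W)
    one_hot = F.one_hot(labels, num_classes=2)               # (N, D, H, W, 2)
    return one_hot.permute(0, 4, 1, 2, 3).float()            # back to (N, 2, D, H, W)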

faustomilletari commented 7 years ago

I will try to answer you in full tomorrow.

Thanks for your mail :) I really appreciate this!

Fausto Milletarì Sent from my iPhone


kirk86 commented 7 years ago

However, even though I have the most experience with TF I find that its API is essentially verbal vomit

100% agree on that. It should have been Theano 2.0, but it failed completely in that regard. Pun intended! PyTorch, on the other hand, even though it has nice syntax and lots of things inherited from Chainer, is still very immature and still has lots of bugs.

@mattmacy If you don't mind, I just have a question: is your implementation correct?

@faustomilletari From going through all of the above conversation, it's still unclear to me whether the issue/error is in the paper or in the code (since people seem unable to replicate the results, if I understood correctly).

Also, it would have been nice to have a .caffemodel file so that people can at least replicate and reproduce the results. Even if the actual model in the paper was trained on a non-public dataset for the sake of the experiments, it would have been nice to also train the model on a public dataset, report the results, and provide the .caffemodel, so that others can replicate the experiment and verify the validity of the model.

AmitAilianiSDC commented 6 years ago

@mattmacy Can you please provide an update on nodule detection using vnet and what worked for you?

faustomilletari commented 6 years ago

Also, if you are doing stuff consider http://tomaat.cloud to make it accessible!

Fausto Milletarì Sent from my iPhone
