3D convolution and pooling

davisking / dlib

A toolkit for making real world machine learning and data analysis applications in C++

http://dlib.net

Boost Software License 1.0


3D convolution and pooling #989

Status: Open. jackyko1991 opened this issue 6 years ago.

jackyko1991 commented 6 years ago

I am currently looking for CNN training tools for Windows in C++. dlib seems to be a good choice, as the examples clearly illustrate how ResNet is implemented.

However, I am working on 3D image segmentation, which requires 3D pooling and convolution (in this link it is, somewhat confusingly, called 4D convolution).

This functionality is supported in Caffe2, TensorFlow, and PyTorch. Are there any plans to implement higher-dimensional CNN operations?

davisking commented 6 years ago

It's on my todo list. It's a low priority for me though.

sarthakpati commented 5 years ago

Hey @davisking, any update on this particular issue? Thanks.

davisking commented 5 years ago

It's still on my todo list. However, it's not very hard to do, so anyone else could submit a PR ;)

thinkgamesdeveloper commented 3 years ago

Hey @davisking, any update on this particular issue? Maybe I can submit a PR; could you please give me a roadmap for achieving it? Thanks.

davisking commented 3 years ago

I'm not going to work on this any time soon.

kSkip commented 1 week ago

@davisking I'm working on adding this to my project. It appears to be simple enough to modify tt::tensor_conv (or create another one) and add a con3d_ layer for 3D filters. Unfortunately, 4D filters (which are required for RGB sequences) require 5D tensors for both the input and the filter stack.
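To make the shape argument concrete, here is a minimal, standalone C++ sketch of a "valid" 3D convolution (strictly, cross-correlation, as most CNN libraries implement it) over a single-channel volume. `Volume` and `conv3d_valid` are hypothetical names for illustration, not dlib APIs, and stride 1 with no padding is assumed. Even this single-channel case needs three spatial indices per element; adding input channels, a batch dimension, and a stack of filters is what pushes a naive layout to 5D.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical dense 3D volume stored depth-major:
// offset = (d * rows + r) * cols + c.
struct Volume {
    std::size_t depth, rows, cols;
    std::vector<float> data;

    Volume(std::size_t d, std::size_t r, std::size_t c)
        : depth(d), rows(r), cols(c), data(d * r * c, 0.0f) {}

    float& at(std::size_t d, std::size_t r, std::size_t c) {
        assert(d < depth && r < rows && c < cols);  // bounds check in debug builds
        return data[(d * rows + r) * cols + c];
    }
    float at(std::size_t d, std::size_t r, std::size_t c) const {
        assert(d < depth && r < rows && c < cols);
        return data[(d * rows + r) * cols + c];
    }
};

// "Valid" 3D cross-correlation: stride 1, no padding, so each output
// dimension shrinks by (kernel size - 1).
Volume conv3d_valid(const Volume& in, const Volume& kernel) {
    if (kernel.depth > in.depth || kernel.rows > in.rows || kernel.cols > in.cols)
        throw std::invalid_argument("kernel larger than input");

    Volume out(in.depth - kernel.depth + 1,
               in.rows - kernel.rows + 1,
               in.cols - kernel.cols + 1);

    for (std::size_t d = 0; d < out.depth; ++d)
        for (std::size_t r = 0; r < out.rows; ++r)
            for (std::size_t c = 0; c < out.cols; ++c) {
                float sum = 0.0f;
                // Slide the kernel over depth as well as rows and columns.
                for (std::size_t kd = 0; kd < kernel.depth; ++kd)
                    for (std::size_t kr = 0; kr < kernel.rows; ++kr)
                        for (std::size_t kc = 0; kc < kernel.cols; ++kc)
                            sum += in.at(d + kd, r + kr, c + kc) * kernel.at(kd, kr, kc);
                out.at(d, r, c) = sum;
            }
    return out;
}
```

For example, a 3×3×3 input of ones convolved with a 2×2×2 kernel of ones yields a 2×2×2 output whose every element is 8.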

Are you open to expanding the dimensions of the tensor class to 5, given it maintains a consistent interface and behavior for the existing usages?

davisking commented 1 week ago

Changing the tensor class to have 5 dimensions is a huge change to everything. You should be able to do this without that, though, by flattening everything along the k dimension and making the 3D conv code smart enough to do the right thing. Then no sweeping architectural changes are needed in the rest of dlib.
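A minimal sketch of the flattening idea, assuming a contiguous row-major buffer with the same (num_samples, k, nr, nc) layout as a dlib tensor. The channel and depth axes are folded into k as `k = channel * depth + slice`, which gives exactly the same memory offset as 5D row-major indexing, so flattening only reinterprets the shape. `Tensor4`, `at5`, and `conv3d_flat` are hypothetical helpers, not dlib's `tensor` class or `tt::` API. Because a 4D shape alone cannot recover the depth, it is passed alongside each tensor; in a real layer it would presumably be a layer parameter.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 4D buffer with a dlib-like (n, k, nr, nc) row-major layout.
struct Tensor4 {
    std::size_t n, k, nr, nc;
    std::vector<float> data;
    Tensor4(std::size_t n_, std::size_t k_, std::size_t nr_, std::size_t nc_)
        : n(n_), k(k_), nr(nr_), nc(nc_), data(n_ * k_ * nr_ * nc_, 0.0f) {}
    std::size_t index(std::size_t s, std::size_t kk, std::size_t r, std::size_t c) const {
        assert(s < n && kk < k && r < nr && c < nc);
        return ((s * k + kk) * nr + r) * nc + c;
    }
};

// Read (sample s, channel ch, slice d, row r, col c) from a 5D volume whose
// channel and depth axes were flattened into k as k = ch * depth + d.
float at5(const Tensor4& t, std::size_t depth, std::size_t s, std::size_t ch,
          std::size_t d, std::size_t r, std::size_t c) {
    return t.data[t.index(s, ch * depth + d, r, c)];
}

// Valid (stride 1, no padding) multi-channel 3D convolution on flattened tensors:
//   in:   (N, C*D,  R,  W)    k = c * D  + d
//   filt: (F, C*KD, KR, KW)   k = c * KD + kd
//   out:  (N, F*OD, OR, OW)   k = f * OD + od
// Preconditions are checked with assert (debug builds only).
Tensor4 conv3d_flat(const Tensor4& in, std::size_t D, const Tensor4& filt, std::size_t KD) {
    assert(D > 0 && KD > 0 && in.k % D == 0 && filt.k % KD == 0);
    const std::size_t C = in.k / D;
    assert(filt.k / KD == C);  // filter channel count must match the input
    assert(KD <= D && filt.nr <= in.nr && filt.nc <= in.nc);

    const std::size_t F = filt.n;
    const std::size_t out_d = D - KD + 1;
    const std::size_t out_r = in.nr - filt.nr + 1;
    const std::size_t out_c = in.nc - filt.nc + 1;
    Tensor4 out(in.n, F * out_d, out_r, out_c);

    for (std::size_t s = 0; s < in.n; ++s)
        for (std::size_t f = 0; f < F; ++f)
            for (std::size_t od = 0; od < out_d; ++od)
                for (std::size_t r = 0; r < out_r; ++r)
                    for (std::size_t c = 0; c < out_c; ++c) {
                        float sum = 0.0f;
                        // Sum over all input channels and the 3D kernel window.
                        for (std::size_t ch = 0; ch < C; ++ch)
                            for (std::size_t kd = 0; kd < KD; ++kd)
                                for (std::size_t kr = 0; kr < filt.nr; ++kr)
                                    for (std::size_t kc = 0; kc < filt.nc; ++kc)
                                        sum += at5(in, D, s, ch, od + kd, r + kr, c + kc) *
                                               at5(filt, KD, f, ch, kd, kr, kc);
                        out.data[out.index(s, f * out_d + od, r, c)] = sum;
                    }
    return out;
}
```

The rest of the network only ever sees an ordinary 4D tensor; only the 3D conv code needs to know how k is split into channels and depth.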