bdzyubak / tensorflow-sandbox

A repository for studying applications of Deep Learning across fields, and demonstrating samples of my code and project management

Implement EfficientNet for cell nuclei segmentation #5

Open bdzyubak opened 2 years ago

bdzyubak commented 2 years ago

People who work in cell analysis have reported to me in a private conversation that they achieve very high accuracy using EfficientNet by Google as the network architecture. Additionally, the network uses very few parameters compared to competitors. This should make it both faster to train and less prone to getting stuck in local minima, which has been an issue for my previous project with UNET.

This request is to implement EfficientNet and compare it to other networks in the cell nuclei segmentation sub-project. Related to https://github.com/bdzyubak/Bogdans-Deep-Learning-Sandbox/issues/4

bdzyubak commented 2 years ago

The paper on EfficientNet is available here, and there are many explanations online: https://arxiv.org/pdf/1905.11946v5.pdf

Briefly, the network was found by Neural Architecture Search (NAS), i.e. it is machine-constructed. It can be scaled to smaller or larger sizes in a controlled way that relates input image resolution, network depth (number of layers), and network width (number of channels) via a predefined formula and a user-specifiable scaling coefficient. This enables efficient information propagation through the layers with minimal bottlenecks. I believe the same ratio may not be appropriate for all types of tasks: images with more complex features may still require deeper networks, while differentiating more object classes may need a different width. A sketch of the scaling rule is below.
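For reference, a minimal sketch of the compound scaling rule, using the alpha, beta, gamma values the paper reports for the B0 baseline (alpha=1.2, beta=1.1, gamma=1.15, chosen by grid search so that alpha * beta^2 * gamma^2 ~= 2, i.e. FLOPs roughly double per unit of the compound coefficient phi):

```python
# Compound scaling from the EfficientNet paper (Tan & Le, 2019):
# depth, width, and resolution grow together with one coefficient phi.
alpha, beta, gamma = 1.2, 1.1, 1.15  # baseline values from the paper

def compound_scale(phi: float) -> tuple[float, float, float]:
    """Return (depth, width, resolution) multipliers for coefficient phi."""
    return alpha ** phi, beta ** phi, gamma ** phi

for phi in range(1, 4):
    d, w, r = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}")
```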

The network's performance on ImageNet is comparable to top networks. Prior studies have shown very good generalizability (usually with transfer learning) to other image types. [figure from the paper: ImageNet accuracy comparison]

The number of parameters is also much smaller, as promised. [figure from the paper: parameter-count comparison]

bdzyubak commented 2 years ago

The network ships with Tensorflow 2.3 (Tensorflow is at 2.8 as of this post) and is importable simply as:

```python
from tensorflow.keras.applications import *  # choose the desired B variant
```

It is initialized as below. Pretrained weights can be used, and the predictions layer dropped, for transfer learning:

```python
conv_base = EfficientNetB6(weights="imagenet", include_top=False, input_shape=input_shape)
```

A new predictions layer is then added instead, as in the sketch below.
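A minimal sketch of that transfer-learning setup; the `input_shape`, `num_classes`, and head layers here are illustrative assumptions, not code from the repo:

```python
import tensorflow as tf
from tensorflow.keras.applications import EfficientNetB6

input_shape = (528, 528, 3)  # B6's native ImageNet resolution; adjust as needed
num_classes = 2              # hypothetical: object vs. background

# Pretrained backbone with the ImageNet predictions layer dropped
conv_base = EfficientNetB6(weights="imagenet", include_top=False,
                           input_shape=input_shape)
conv_base.trainable = False  # freeze the backbone for initial transfer learning

# New predictions head on top of the frozen features
model = tf.keras.Sequential([
    conv_base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```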

A simple tutorial is available here: https://towardsdatascience.com/an-in-depth-efficientnet-tutorial-using-tensorflow-how-to-use-efficientnet-on-a-custom-dataset-1cab0997f65c

I've uploaded a stab here: a05d452878fb6cf31f506bf6b80b7a05e0996a1e.

However, the network is intended for classification, not segmentation. The last layer before the Dense head is 9x9x1400 for a 260x260x3 input image. If converted to a segmentation network with a (9,9,1400)-to-(260,260,1) Dense layer, it would have very poor spatial accuracy. Alternatively, the network can be turned into a DeconvNET or a UNET-style network by upscaling and concatenating with early layers, as sketched below. However, this would increase the network size back up, negating the key advantage. Closing this request as invalid for now. Will look at other state-of-the-art segmentation networks, and investigate a good use for EfficientNet at a later date.
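For the record, a minimal sketch of the decoder-style conversion, assuming the B2 backbone at its native 260x260 resolution. This version uses plain transposed convolutions without skip connections; real UNET-style skips would concatenate named intermediate encoder activations, whose layer names vary by Keras version:

```python
import tensorflow as tf
from tensorflow.keras.applications import EfficientNetB2

input_shape = (260, 260, 3)  # B2's native resolution
encoder = EfficientNetB2(weights="imagenet", include_top=False,
                         input_shape=input_shape)

# Repeated stride-2 transposed convolutions upsample the coarse 9x9
# feature map back toward input resolution (9 -> 18 -> ... -> 288).
x = encoder.output
for filters in (256, 128, 64, 32, 16):
    x = tf.keras.layers.Conv2DTranspose(filters, 3, strides=2,
                                        padding="same", activation="relu")(x)

# 288 overshoots 260, so resize back (tf.keras.layers.Resizing needs TF >= 2.6)
x = tf.keras.layers.Resizing(260, 260)(x)
mask = tf.keras.layers.Conv2D(1, 1, activation="sigmoid")(x)  # per-pixel mask

model = tf.keras.Model(encoder.input, mask)
```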

bdzyubak commented 2 years ago

In fact, a classification network could be used for segmentation purposes via a sliding-window methodology: for each window, classify it as object or not, and assign a 0 or 1 value to the center pixel of the output mask. This is conceptually similar to how R-CNN/Mask R-CNN classify region proposals. https://nanonets.com/blog/how-to-do-semantic-segmentation-using-deep-learning/ A minimal sketch is below.
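A minimal sketch of the sliding-window scheme; `classifier` here is a hypothetical patch classifier returning an object probability for the patch center, not something in the repo, and in practice patches would be batched for speed:

```python
import numpy as np

def sliding_window_segment(image, classifier, window=64, stride=1):
    """Classify each window and write the result to the center pixel.

    `image` is assumed HxWxC; `classifier` maps a (window, window, C)
    patch to the probability that the center pixel belongs to an object.
    """
    h, w = image.shape[:2]
    half = window // 2
    # Reflect-pad so windows centered on border pixels stay in bounds
    padded = np.pad(image, ((half, half), (half, half), (0, 0)), mode="reflect")
    mask = np.zeros((h, w), dtype=np.float32)
    for y in range(0, h, stride):
        for x in range(0, w, stride):
            patch = padded[y:y + window, x:x + window]
            mask[y, x] = classifier(patch)
    return mask
```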

For a stride of 1, prediction may be fairly slow. The smallest EfficientNet has 5.3M parameters, and applying it once per window over a 1k x 1k image (~1M windows) is very roughly 5.3M x 1M ≈ 5 x 10^12 operations. By contrast, the UNET from https://github.com/bdzyubak/Bogdans-Deep-Learning-Sandbox/issues/4 segments the whole image in one shot with 31M parameters. Using a larger-stride prediction window would speed up this process at the cost of resolution. Prediction cost is also a relatively small problem with cloud processing, so this may not be a big issue.

Data for training would be generated similarly, by cropping matched sliding windows from the input images and masks, as sketched below. In a task of segmenting many small objects, like cells, snipping each image into many examples can actually expand the training dataset significantly while still offering a variety of examples. Predicting on the whole image wastes weights on global spatial relationships which may not be meaningful. A stride > 1 may still be warranted to balance independent information against training time.
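A matching sketch for generating training patches; the window and stride values are placeholders, and each label is the mask value at the patch center to mirror the prediction scheme above:

```python
import numpy as np

def extract_patches(image, mask, window=64, stride=8):
    """Crop matched sliding windows from an image and its mask.

    Returns (patches, labels): image crops plus the mask value at each
    crop's center pixel. A stride > 1 trades dataset size for patch
    independence and training time.
    """
    half = window // 2
    patches, labels = [], []
    for y in range(half, image.shape[0] - half, stride):
        for x in range(half, image.shape[1] - half, stride):
            patches.append(image[y - half:y + half, x - half:x + half])
            labels.append(mask[y, x])
    return np.stack(patches), np.array(labels)
```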