ESIPFed / gsoc

Project ideas and mentor guidance for ESIP members to participate in Google Summer of Code.
Apache License 2.0

Ag-Net: building a customized deep neural network for recognizing crop categories based on spectral characteristics #13

Closed. ZihengSun closed this issue 5 years ago.

ZihengSun commented 5 years ago

ESIP Member Organization

CSISS/LAITS, George Mason University; Alaska Ocean Observing System (AOOS) and Axiom Data Science

Mentors

Ziheng Sun, Jesse Lopez

Project Ideas

Ag-Net: building a customized deep neural network for recognizing crop categories based on spectral characteristics

Information for students

See ESIP general guidelines

Abstract

How many kinds of crops can you recognize? Probably not many. For most of the growing season they are all just green plants: dent corn and sweet corn, black beans and red beans, barley and wheat, grass and weeds, and so on. Distinguishing them takes a great deal of knowledge and experience, and agricultural scientists have struggled for years to find an automated way to recognize them. Deep learning is a powerful tool for non-linear classification problems. The critical ingredient is the training dataset, which can be extracted from the reports and map products of the U.S. Department of Agriculture. However, existing deep neural networks do not perform as well as expected on crop classification, because the representation features they learn through back propagation are not discriminative enough to capture the small differences among crops with a similar external look. A customized network with special filters may help tell apart those minor differences in spectral characteristics and produce more accurate recognition results.

Technical Details

Python; Keras; Geoweaver; numpy; scikit-learn; matplotlib; GDAL.

Helpful Experience

Machine learning knowledge; satellite image manipulation; python programming.

First steps

Start by getting familiar with DeepLabV3, U-Net, or another state-of-the-art deep neural network and test it on a sample training dataset.
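
To get a feel for the data before wiring up a network, here is a minimal sketch (not part of the official starter code) of reading a multi-band tile with GDAL and stacking the bands into a (height, width, bands) array; the file name sample_tile.tif is a placeholder for whatever the sample training dataset provides.

import numpy as np
from osgeo import gdal

def load_tile(path):
    # Open the GeoTIFF and read all bands; ReadAsArray returns (bands, rows, cols).
    ds = gdal.Open(path)
    if ds is None:
        raise FileNotFoundError(path)
    arr = ds.ReadAsArray().astype(np.float32)
    # Move the band axis last so the array matches the (128, 128, 7) input shape
    # discussed later in this thread.
    return np.transpose(arr, (1, 2, 0))

tile = load_tile("sample_tile.tif")
print(tile.shape)  # e.g. (128, 128, 7)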

sinAshish commented 5 years ago

@ZihengSun I trained a U-Net model with a ResNet encoder for 10 epochs and got a Dice score of around 0.48. I used all 7 channels at once. But I need some help loading the mask with the appropriate number of classes.

Sorry, I am not very clear about the question. Do you want to shrink the number of classes by removing the empty ones?

I stacked all 7 channels to create an input tensor of (128,128,7), but by mistake I loaded the masks as 'RGB' for training, so the output tensor was (128,128,3). I need help creating a mask with the shape (128,128,255); how do I do that?

sankalpmittal1911-BitSian commented 5 years ago
    u9 = Conv2DTranspose(n_filters*1, (3, 3), strides=(2, 2), padding='same') (c8)
    u9 = concatenate([u9, c1], axis=3)
    u9 = Dropout(dropout)(u9)
    c9 = conv2d_block(u9, n_filters=n_filters*1, kernel_size=3, batchnorm=batchnorm)
    #c9 = conv2d_block(u9, n_filters=255, kernel_size=3, batchnorm=batchnorm)

    outputs = Conv2D(255, (1, 1)) (c9)
    #outputs = core.Reshape((128,128 ,  255))(outputs)
    #conv6 = core.Permute((2,1))(outputs)

    outputs = core.Activation('softmax')(outputs)
    #outputs = softmax(outputs, axis = 3)(outputs)

This is my code. The final output has the form (128,128,255). How do I apply softmax to it so that I can select, for each pixel, the class with the maximum probability?

@ZihengSun Sir, can you help me with this? I mean applying softmax along an axis with the Keras API. Thank you. I am assuming that the input image is grayscale.

ZihengSun commented 5 years ago

@sankalpmittal1911-BitSian If you are using Keras, I assume the output of the softmax layer can be used to calculate the loss against the CDL image. Please refer to this code for building U-Net in Keras.
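
Since the mask-shape question above keeps coming back, here is a rough sketch (an assumption based on this thread, not the referenced U-Net code) of one-hot encoding a (128, 128) CDL mask into (128, 128, 255) so that it lines up with a per-pixel softmax output and categorical cross entropy:

import numpy as np
from keras.utils import to_categorical

n_classes = 255
mask = np.random.randint(0, n_classes, size=(128, 128))  # placeholder for a real CDL mask tile
one_hot = to_categorical(mask, num_classes=n_classes)    # shape (128, 128, 255)

# With the network ending in Conv2D(n_classes, (1, 1), activation='softmax'),
# softmax is applied over the last (channel) axis, so this one-hot mask and
# categorical_crossentropy match the prediction pixel by pixel.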

1998at commented 5 years ago

@ZihengSun Why do the codes run from 1-254 when there are just 133 categories in total in cdlvalues.csv?

ZihengSun commented 5 years ago

@at1998 the missing numbers have no corresponding crop classes so they are skipped. There are only ~130 kinds of crops in US ag statistics.

sankalpmittal1911-BitSian commented 5 years ago

@ZihengSun Sir, the person has reshaped the output from (batch,img_width,img_height,n_classes) to (batch,img_width*img_height,n_classes) and then applied softmax to it. Should I do the same? Then I would have to reshape the target labels into the same format as well.

activation_155 (Activation)     (None, 128, 128, 255 0           batch_normalization_153[0][0]    
__________________________________________________________________________________________________
conv2d_155 (Conv2D)             (None, 128, 128, 255 585480      activation_155[0][0]             
__________________________________________________________________________________________________
batch_normalization_154 (BatchN (None, 128, 128, 255 1020        conv2d_155[0][0]                 
__________________________________________________________________________________________________
activation_156 (Activation)     (None, 128, 128, 255 0           batch_normalization_154[0][0]    
__________________________________________________________________________________________________
reshape_7 (Reshape)             (None, 255, 16384)   0           activation_156[0][0]             
__________________________________________________________________________________________________
permute_4 (Permute)             (None, 16384, 255)   0           reshape_7[0][0]                  
__________________________________________________________________________________________________
activation_157 (Activation)     (None, 16384, 255)   0           permute_4[0][0]                  
==================================================================================================
Total params: 2,818,431
Trainable params: 2,814,531
Non-trainable params: 3,900
    u9 = Conv2DTranspose(n_filters*1, (3, 3), strides=(2, 2), padding='same') (c8)
    u9 = concatenate([u9, c1], axis=3)
    u9 = Dropout(dropout)(u9)
    c9 = conv2d_block(u9, n_filters=n_filters*1, kernel_size=3, batchnorm=batchnorm)
    c9 = conv2d_block(u9, n_filters=255, kernel_size=3, batchnorm=batchnorm)

    conv6 = core.Reshape((255,128*128))(c9)
    conv6 = core.Permute((2,1))(conv6)

    outputs = core.Activation('softmax')(conv6)

    model = Model(inputs=[input_img], outputs=[outputs])
    return model

I don't know how the person is calculating softmax, but I am assuming it normalizes over the class axis only. Now I wish to select a (16384,1) image from the (16384,255) output by taking the class with maximum probability at each pixel. How can I do that?

Alternatively, I can create output labels of the form (128,128,255) or (16384,255) and then apply pixel-wise cross entropy. But from the dataset I can't figure out how to associate a one-hot encoded vector with every pixel of the output image.

Edit: It's 130 instead of 255; I will change that as well. Edit 2: Just realized softmax defaults to the last axis, so it's fine. Now the only problem is choosing, for each pixel, the class with maximum probability.

Thank you.
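
For the "class with maximum probability" part, a vectorized sketch (assuming the (1, 128, 128, 255) output shape discussed above) is simply an argmax over the class axis:

import numpy as np

pred = np.random.rand(1, 128, 128, 255)  # stand-in for model.predict(X_test)
class_map = np.argmax(pred[0], axis=-1)  # (128, 128) array of predicted class codes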

1998at commented 5 years ago

@ZihengSun I remapped the values in the mask from 1-254 to 1-132 and the loss seems to go down steadily. Dice scores are around 0.48 while accuracy is in the range 0.5-0.6. Note that I converted the image to 3 channels by just adding all the band values together. I will now increase the initial layer channels to 11 and observe how the accuracy goes up.

1998at commented 5 years ago

@ZihengSun Followed your advice to observe the effect of training for more epochs, and here's the result. I haven't implemented a validation set yet; this is just the training set. I will implement validation and update you in case the model is overfitting. [training curve image attached]

Also, could you please help me with the proposal? Do I have to write about the techniques that I am using on this sample set? I am really confused.

sankalpmittal1911-BitSian commented 5 years ago
Epoch 1/100
50/50 [==============================] - 1705s 34s/step - loss: 5.1689 - acc: 0.1686 - val_loss: 4.9105 - val_acc: 0.2356

Epoch 00001: val_loss improved from inf to 4.91052, saving model to model.h5
Epoch 2/100
50/50 [==============================] - 1021s 20s/step - loss: 4.4849 - acc: 0.2629 - val_loss: 4.4510 - val_acc: 0.1839

Epoch 00002: val_loss improved from 4.91052 to 4.45103, saving model to model.h5
Epoch 3/100
50/50 [==============================] - 1117s 22s/step - loss: 3.7226 - acc: 0.2872 - val_loss: 4.1678 - val_acc: 0.2061

Epoch 00003: val_loss improved from 4.45103 to 4.16783, saving model to model.h5
Epoch 4/100
50/50 [==============================] - 1109s 22s/step - loss: 3.0371 - acc: 0.2992 - val_loss: 3.4495 - val_acc: 0.2427

Epoch 00004: val_loss improved from 4.16783 to 3.44954, saving model to model.h5
Epoch 5/100
50/50 [==============================] - 1116s 22s/step - loss: 2.6087 - acc: 0.3151 - val_loss: 2.9204 - val_acc: 0.2475

Epoch 00005: val_loss improved from 3.44954 to 2.92044, saving model to model.h5
Epoch 6/100
50/50 [==============================] - 1119s 22s/step - loss: 2.4059 - acc: 0.3271 - val_loss: 2.5952 - val_acc: 0.2727

Epoch 00006: val_loss improved from 2.92044 to 2.59517, saving model to model.h5
Epoch 7/100
50/50 [==============================] - 1108s 22s/step - loss: 2.3171 - acc: 0.3289 - val_loss: 2.3975 - val_acc: 0.3254

Epoch 00007: val_loss improved from 2.59517 to 2.39755, saving model to model.h5
Epoch 8/100
50/50 [==============================] - 1117s 22s/step - loss: 2.2777 - acc: 0.3291 - val_loss: 2.4233 - val_acc: 0.3114

Epoch 00008: val_loss did not improve from 2.39755
Epoch 9/100
50/50 [==============================] - 1120s 22s/step - loss: 2.2546 - acc: 0.3312 - val_loss: 2.3257 - val_acc: 0.3173

Epoch 00009: val_loss improved from 2.39755 to 2.32571, saving model to model.h5
Epoch 10/100
50/50 [==============================] - 1130s 23s/step - loss: 2.2417 - acc: 0.3330 - val_loss: 2.3650 - val_acc: 0.3170

Epoch 00010: val_loss did not improve from 2.32571
Epoch 11/100
10/50 [=====>........................] - ETA: 4:10 - loss: 2.2265 - acc: 0.3341

For 10 epochs. Batch Size is 32. Pixel-wise cross entropy loss function. Input: (Batch,128,128,7) Output: (Batch,128,128,num_classes)

Should I continue training for 100 epochs or should I change something because learning is too slow?

Thanks.

ZihengSun commented 5 years ago

@sankalpmittal1911-BitSian That does not look slow to me. Usually it takes more epochs to train because of the similarity between crops.

ZihengSun commented 5 years ago

@at1998 I guess you do need to write all the details according to the proposal writing guidelines. The deliverables for the community bonding and coding phases should be explicitly described.

ZihengSun commented 5 years ago

@all A color map of the crop values has been uploaded. Use it to render your results in the standard crop map style. [color map image attached]

1998at commented 5 years ago

@ZihengSun In the color map all values (up to 255) are available, whereas in the model I have been training I converted the values to 1-133 rather than 1-255, since many values are not present in the masks. Should the final layer of the model have 133 or 255 outputs?

ZihengSun commented 5 years ago

It should be 255. Your 133 values should be reverse-mapped to the original 255-value codes during the map production phase.

sankalpmittal1911-BitSian commented 5 years ago

@ZihengSun Because, as per the CSV table, this 133rd class is actually the pixel value 254?

1998at commented 5 years ago

@ZihengSun Just to clarify: I should have 133 outputs in the last layer of the U-Net, i.e. the shape will be (133,128,128), and then reverse-map the 133 values to the original values (e.g. 255=133) in this model? Am I assuming correctly?

ZihengSun commented 5 years ago

It actually has 256 possible values (8-bit). The best practice is to directly use (128, 128, 255) as the output shape and (128, 128, 7) as the input shape.

133 is only needed when your network requires consecutive output values. I suggest creating a map variable to remember the relationship between real crop category values and your projected values specially designed for your network, so you won't get confused about what the predicted values mean.
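
A small sketch of that mapping idea (illustrative only; the variable names are not from the project code): compress the sparse crop codes into consecutive class indices for training, and keep the inverse map to restore the real codes when producing the final crop map.

import numpy as np

# Placeholder mask with sparse crop codes; in practice this comes from the label tiles.
mask = np.array([[0, 1, 5], [24, 24, 1], [0, 5, 61]])

crop_codes = sorted(int(c) for c in np.unique(mask))
code_to_class = {c: i for i, c in enumerate(crop_codes)}   # real code -> 0..N-1
class_to_code = {i: c for c, i in code_to_class.items()}   # 0..N-1 -> real code

train_mask = np.vectorize(code_to_class.get)(mask)      # consecutive labels for training
restored = np.vectorize(class_to_code.get)(train_mask)  # back to the real crop codes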

1998at commented 5 years ago

@ZihengSun I am saving the mapping to a dictionary and will use that to reverse it. Thanks for the clarification.

sankalpmittal1911-BitSian commented 5 years ago

@ZihengSun Images are (128,128) right? Are you suggesting that I resize the images to (360,360)?

Saw the edit. Thanks. Pixel values are between 0 and 255, so I guess it should be (128,128,256); however, the former is fine too, I think, because the maximum value is 254 anyway.

sinAshish commented 5 years ago

@ZihengSun I have made my network as such. Input tensor of (128,128,7) and output tensor of (128,128,255)

1998at commented 5 years ago

@sankalpmittal1911-BitSian I think he is talking about the original dataset that we will be working on in the program. Right now we are only given a sample dataset to try out.

ZihengSun commented 5 years ago

The size is 128, not 360. I downsized the tiles to make the required memory smaller for each batch.

1998at commented 5 years ago

@ZihengSun Could we have access to the original dataset of size 360? In almost all cases, increasing the size leads to better segmentation results.

ZihengSun commented 5 years ago

They should give the same results. The affected regions are the edge areas, which should be fine in this case. Some time ago I used an FCN (I forget which code I used) on the 128 dataset and got this result: [result image attached] The accuracy is >80%. Your approach should at least achieve this accuracy. Carefully inspect your code and network to figure out why your model is not reaching it.

1998at commented 5 years ago

@ZihengSun I am having trouble setting up rotation for both the image masks and the input. Right now I am using the 4 dihedral transforms (90-degree rotations with flips) as part of my data augmentation. Also, did you normalize the values to ImageNet stats (when using a pretrained model as feature extractor)? Right now I am normalizing the data to have 0 mean and 1 standard deviation, but at the same time I am using a ResNet as a feature extractor in the middle layers. I will try different normalizations to see the effect.
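
For the image/mask rotation issue above, a minimal sketch (shapes assumed from this thread) of applying the same dihedral transform to the input tile and its mask:

import numpy as np

def dihedral_augment(image, mask, rng=np.random):
    """Apply the same random 90-degree rotation and horizontal flip to image and mask."""
    k = rng.randint(0, 4)                    # number of 90-degree rotations
    image = np.rot90(image, k, axes=(0, 1))  # rotate H and W, leave channels alone
    mask = np.rot90(mask, k, axes=(0, 1))
    if rng.randint(0, 2):
        image = np.flip(image, axis=1)
        mask = np.flip(mask, axis=1)
    return image, mask

img_aug, mask_aug = dihedral_augment(np.zeros((128, 128, 7)), np.zeros((128, 128)))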

1998at commented 5 years ago

@ZihengSun One more thing I need to confirm: do I need to be restricted to Keras? Right now I am working on models in PyTorch.

ZihengSun commented 5 years ago

It is not restricted to Keras. PyTorch should work.

sankalpmittal1911-BitSian commented 5 years ago

@ZihengSun I will also try pytorch if keras does not do the job. Will get back soon.

1998at commented 5 years ago

@sankalpmittal1911-BitSian Let me know if you need any help. Usually the choice of framework doesn't have much effect on a model's accuracy, since the base operations are very similar across frameworks.

sankalpmittal1911-BitSian commented 5 years ago

@ZihengSun @at1998

I am having a hard time improving the accuracy after a certain epoch. Here is my training-validation log:

Epoch 1/100
675/675 [==============================] - 5518s 8s/step - loss: 2.1004 - acc: 0.3985 - val_loss: 6.5754 - val_acc: 0.2246

Epoch 00001: val_loss improved from inf to 6.57537, saving model to model.h5
Epoch 2/100
675/675 [==============================] - 1119s 2s/step - loss: 1.9145 - acc: 0.4472 - val_loss: 5.7538 - val_acc: 0.3389

Epoch 00002: val_loss improved from 6.57537 to 5.75385, saving model to model.h5
Epoch 3/100
675/675 [==============================] - 1119s 2s/step - loss: 1.8592 - acc: 0.4621 - val_loss: 2.8144 - val_acc: 0.2887

Epoch 00003: val_loss improved from 5.75385 to 2.81440, saving model to model.h5
Epoch 4/100
675/675 [==============================] - 1132s 2s/step - loss: 1.8234 - acc: 0.4741 - val_loss: 3.4980 - val_acc: 0.2567

Epoch 00004: val_loss did not improve from 2.81440
Epoch 5/100
675/675 [==============================] - 1121s 2s/step - loss: 1.7949 - acc: 0.4820 - val_loss: 4.3436 - val_acc: 0.2417

Epoch 00005: ReduceLROnPlateau reducing learning rate to 0.0009999999776482583.

Epoch 00005: val_loss did not improve from 2.81440
Epoch 6/100
675/675 [==============================] - 1128s 2s/step - loss: 1.7152 - acc: 0.5067 - val_loss: 1.8597 - val_acc: 0.4699

Epoch 00006: val_loss improved from 2.81440 to 1.85970, saving model to model.h5
Epoch 7/100
675/675 [==============================] - 1127s 2s/step - loss: 1.6981 - acc: 0.5113 - val_loss: 1.9781 - val_acc: 0.4349

Epoch 00007: val_loss did not improve from 1.85970
Epoch 8/100
675/675 [==============================] - 1144s 2s/step - loss: 1.6885 - acc: 0.5142 - val_loss: 1.9789 - val_acc: 0.4414

Epoch 00008: ReduceLROnPlateau reducing learning rate to 9.999999310821295e-05.

Epoch 00008: val_loss did not improve from 1.85970
Epoch 9/100
675/675 [==============================] - 1143s 2s/step - loss: 1.6717 - acc: 0.5197 - val_loss: 1.7216 - val_acc: 0.5034

Epoch 00009: val_loss improved from 1.85970 to 1.72164, saving model to model.h5
Epoch 10/100
675/675 [==============================] - 1143s 2s/step - loss: 1.6682 - acc: 0.5208 - val_loss: 1.7112 - val_acc: 0.5082

Epoch 00010: val_loss improved from 1.72164 to 1.71118, saving model to model.h5
Epoch 11/100
675/675 [==============================] - 1134s 2s/step - loss: 1.6665 - acc: 0.5212 - val_loss: 1.7099 - val_acc: 0.5086

Epoch 00011: val_loss improved from 1.71118 to 1.70989, saving model to model.h5
Epoch 12/100
675/675 [==============================] - 1143s 2s/step - loss: 1.6650 - acc: 0.5218 - val_loss: 1.7065 - val_acc: 0.5100

Epoch 00012: val_loss improved from 1.70989 to 1.70651, saving model to model.h5
Epoch 13/100
675/675 [==============================] - 1136s 2s/step - loss: 1.6636 - acc: 0.5221 - val_loss: 1.7085 - val_acc: 0.5087

Epoch 00013: val_loss did not improve from 1.70651
Epoch 14/100
675/675 [==============================] - 1141s 2s/step - loss: 1.6619 - acc: 0.5227 - val_loss: 1.7054 - val_acc: 0.5100

Epoch 00014: val_loss improved from 1.70651 to 1.70538, saving model to model.h5
Epoch 15/100
675/675 [==============================] - 1120s 2s/step - loss: 1.6607 - acc: 0.5232 - val_loss: 1.7036 - val_acc: 0.5106

Epoch 00015: val_loss improved from 1.70538 to 1.70361, saving model to model.h5
Epoch 16/100
675/675 [==============================] - 1114s 2s/step - loss: 1.6593 - acc: 0.5235 - val_loss: 1.7008 - val_acc: 0.5118

Epoch 00016: val_loss improved from 1.70361 to 1.70075, saving model to model.h5
Epoch 17/100
675/675 [==============================] - 1125s 2s/step - loss: 1.6581 - acc: 0.5239 - val_loss: 1.7030 - val_acc: 0.5105

Epoch 00017: val_loss did not improve from 1.70075
Epoch 18/100
675/675 [==============================] - 1123s 2s/step - loss: 1.6569 - acc: 0.5243 - val_loss: 1.6997 - val_acc: 0.5120

Epoch 00018: val_loss improved from 1.70075 to 1.69970, saving model to model.h5
Epoch 19/100
675/675 [==============================] - 1119s 2s/step - loss: 1.6557 - acc: 0.5246 - val_loss: 1.6994 - val_acc: 0.5118

Epoch 00019: val_loss improved from 1.69970 to 1.69941, saving model to model.h5
Epoch 20/100
675/675 [==============================] - 1127s 2s/step - loss: 1.6544 - acc: 0.5251 - val_loss: 1.6981 - val_acc: 0.5122

Epoch 00020: val_loss improved from 1.69941 to 1.69810, saving model to model.h5
Epoch 21/100
675/675 [==============================] - 1125s 2s/step - loss: 1.6530 - acc: 0.5256 - val_loss: 1.6984 - val_acc: 0.5119

Epoch 00021: val_loss did not improve from 1.69810
Epoch 22/100
675/675 [==============================] - 1125s 2s/step - loss: 1.6518 - acc: 0.5259 - val_loss: 1.6961 - val_acc: 0.5130

Epoch 00022: val_loss improved from 1.69810 to 1.69612, saving model to model.h5
Epoch 23/100
675/675 [==============================] - 1128s 2s/step - loss: 1.6505 - acc: 0.5264 - val_loss: 1.6945 - val_acc: 0.5136

Epoch 00023: val_loss improved from 1.69612 to 1.69453, saving model to model.h5
Epoch 24/100
675/675 [==============================] - 1127s 2s/step - loss: 1.6496 - acc: 0.5266 - val_loss: 1.6949 - val_acc: 0.5133

Epoch 00024: val_loss did not improve from 1.69453
Epoch 25/100
675/675 [==============================] - 1130s 2s/step - loss: 1.6483 - acc: 0.5271 - val_loss: 1.6930 - val_acc: 0.5140

Epoch 00025: val_loss improved from 1.69453 to 1.69305, saving model to model.h5
Epoch 26/100
675/675 [==============================] - 1133s 2s/step - loss: 1.6471 - acc: 0.5275 - val_loss: 1.6942 - val_acc: 0.5132

Epoch 00026: val_loss did not improve from 1.69305
Epoch 27/100
675/675 [==============================] - 1126s 2s/step - loss: 1.6459 - acc: 0.5279 - val_loss: 1.6903 - val_acc: 0.5151

Epoch 00027: val_loss improved from 1.69305 to 1.69033, saving model to model.h5
Epoch 28/100
675/675 [==============================] - 1048s 2s/step - loss: 1.6447 - acc: 0.5283 - val_loss: 1.6903 - val_acc: 0.5150

Epoch 00028: val_loss improved from 1.69033 to 1.69027, saving model to model.h5
Epoch 29/100
675/675 [==============================] - 1097s 2s/step - loss: 1.6436 - acc: 0.5286 - val_loss: 1.6897 - val_acc: 0.5151

Epoch 00029: val_loss improved from 1.69027 to 1.68966, saving model to model.h5
Epoch 30/100
675/675 [==============================] - 1057s 2s/step - loss: 1.6424 - acc: 0.5290 - val_loss: 1.6876 - val_acc: 0.5160

Epoch 00030: val_loss improved from 1.68966 to 1.68756, saving model to model.h5
Epoch 31/100
675/675 [==============================] - 1049s 2s/step - loss: 1.6412 - acc: 0.5294 - val_loss: 1.6841 - val_acc: 0.5177

Epoch 00031: val_loss improved from 1.68756 to 1.68409, saving model to model.h5
Epoch 32/100
675/675 [==============================] - 1015s 2s/step - loss: 1.6400 - acc: 0.5298 - val_loss: 1.6859 - val_acc: 0.5167

Epoch 00032: val_loss did not improve from 1.68409
Epoch 33/100
254/675 [==========>...................] - ETA: 8:44 - loss: 1.6348 - acc: 0.5302

First I tried with this model:

def get_unet(input_img, n_filters=16, dropout=0.15, batchnorm=True):
    # contracting path
    c1 = conv2d_block(input_img, n_filters=n_filters*1, kernel_size=3, batchnorm=batchnorm)
    p1 = MaxPooling2D((2, 2)) (c1)
    p1 = Dropout(rate=dropout*0.5)(p1)

    c2 = conv2d_block(p1, n_filters=n_filters*2, kernel_size=3, batchnorm=batchnorm)
    p2 = MaxPooling2D((2, 2)) (c2)
    p2 = Dropout(rate=dropout)(p2)

    c3 = conv2d_block(p2, n_filters=n_filters*4, kernel_size=3, batchnorm=batchnorm)
    p3 = MaxPooling2D((2, 2)) (c3)
    p3 = Dropout(rate=dropout)(p3)

    c4 = conv2d_block(p3, n_filters=n_filters*8, kernel_size=3, batchnorm=batchnorm)
    p4 = MaxPooling2D(pool_size=(2, 2)) (c4)
    p4 = Dropout(rate=dropout)(p4)

    c5 = conv2d_block(p4, n_filters=n_filters*16, kernel_size=3, batchnorm=batchnorm)

    # expansive path
    u6 = Conv2DTranspose(n_filters*8, (3, 3), strides=(2, 2), padding='same') (c5)
    u6 = concatenate([u6, c4])
    u6 = Dropout(rate=dropout)(u6)
    c6 = conv2d_block(u6, n_filters=n_filters*8, kernel_size=3, batchnorm=batchnorm)

    u7 = Conv2DTranspose(n_filters*4, (3, 3), strides=(2, 2), padding='same') (c6)
    u7 = concatenate([u7, c3])
    u7 = Dropout(rate=dropout)(u7)
    c7 = conv2d_block(u7, n_filters=n_filters*4, kernel_size=3, batchnorm=batchnorm)

    u8 = Conv2DTranspose(n_filters*2, (3, 3), strides=(2, 2), padding='same') (c7)
    u8 = concatenate([u8, c2])
    u8 = Dropout(rate=dropout)(u8)
    c8 = conv2d_block(u8, n_filters=n_filters*2, kernel_size=3, batchnorm=batchnorm)

    u9 = Conv2DTranspose(n_filters*1, (3, 3), strides=(2, 2), padding='same') (c8)
    u9 = concatenate([u9, c1], axis=3)
    u9 = Dropout(rate=dropout)(u9)
    c9 = conv2d_block(u9, n_filters=n_filters*1, kernel_size=3, batchnorm=batchnorm)

    c10 = conv2d_block(c9, n_filters=255, kernel_size=1, batchnorm=batchnorm)

    outputs = Conv2D(255, (1, 1), activation='softmax') (c10)

    model = Model(inputs=[input_img], outputs=[outputs])
    return model

Then I changed to:

def get_unet(input_img, n_filters=256, dropout=0.15, batchnorm=True):
    # contracting path
    c1 = conv2d_block(input_img, n_filters=n_filters*1, kernel_size=3, batchnorm=batchnorm)
    p1 = MaxPooling2D((2, 2)) (c1)
    p1 = Dropout(rate=dropout*0.5)(p1)

    c2 = conv2d_block(p1, n_filters=n_filters*2, kernel_size=3, batchnorm=batchnorm)

    # expansive path
    u3 = Conv2DTranspose(n_filters*1, (3, 3), strides=(2, 2), padding='same') (c2)
    u3 = concatenate([u3, c1])
    u3 = Dropout(rate=dropout)(u3)
    c3 = conv2d_block(u3, n_filters=n_filters*1, kernel_size=3, batchnorm=batchnorm)

    c4 = conv2d_block(c3, n_filters=255, kernel_size=1, batchnorm=batchnorm)

    outputs = Conv2D(255, (1, 1), activation='softmax') (c4)

    model = Model(inputs=[input_img], outputs=[outputs])
    return model
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
img (InputLayer)                (None, 128, 128, 7)  0                                            
__________________________________________________________________________________________________
conv2d_64 (Conv2D)              (None, 128, 128, 256 16384       img[0][0]                        
__________________________________________________________________________________________________
batch_normalization_61 (BatchNo (None, 128, 128, 256 1024        conv2d_64[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_61 (LeakyReLU)      (None, 128, 128, 256 0           batch_normalization_61[0][0]     
__________________________________________________________________________________________________
conv2d_65 (Conv2D)              (None, 128, 128, 256 590080      leaky_re_lu_61[0][0]             
__________________________________________________________________________________________________
batch_normalization_62 (BatchNo (None, 128, 128, 256 1024        conv2d_65[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_62 (LeakyReLU)      (None, 128, 128, 256 0           batch_normalization_62[0][0]     
__________________________________________________________________________________________________
max_pooling2d_13 (MaxPooling2D) (None, 64, 64, 256)  0           leaky_re_lu_62[0][0]             
__________________________________________________________________________________________________
dropout_25 (Dropout)            (None, 64, 64, 256)  0           max_pooling2d_13[0][0]           
__________________________________________________________________________________________________
conv2d_66 (Conv2D)              (None, 64, 64, 512)  1180160     dropout_25[0][0]                 
__________________________________________________________________________________________________
batch_normalization_63 (BatchNo (None, 64, 64, 512)  2048        conv2d_66[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_63 (LeakyReLU)      (None, 64, 64, 512)  0           batch_normalization_63[0][0]     
__________________________________________________________________________________________________
conv2d_67 (Conv2D)              (None, 64, 64, 512)  2359808     leaky_re_lu_63[0][0]             
__________________________________________________________________________________________________
batch_normalization_64 (BatchNo (None, 64, 64, 512)  2048        conv2d_67[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_64 (LeakyReLU)      (None, 64, 64, 512)  0           batch_normalization_64[0][0]     
__________________________________________________________________________________________________
conv2d_transpose_13 (Conv2DTran (None, 128, 128, 256 1179904     leaky_re_lu_64[0][0]             
__________________________________________________________________________________________________
concatenate_13 (Concatenate)    (None, 128, 128, 512 0           conv2d_transpose_13[0][0]        
                                                                 leaky_re_lu_62[0][0]             
__________________________________________________________________________________________________
dropout_26 (Dropout)            (None, 128, 128, 512 0           concatenate_13[0][0]             
__________________________________________________________________________________________________
conv2d_68 (Conv2D)              (None, 128, 128, 256 1179904     dropout_26[0][0]                 
__________________________________________________________________________________________________
batch_normalization_65 (BatchNo (None, 128, 128, 256 1024        conv2d_68[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_65 (LeakyReLU)      (None, 128, 128, 256 0           batch_normalization_65[0][0]     
__________________________________________________________________________________________________
conv2d_69 (Conv2D)              (None, 128, 128, 256 590080      leaky_re_lu_65[0][0]             
__________________________________________________________________________________________________
batch_normalization_66 (BatchNo (None, 128, 128, 256 1024        conv2d_69[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_66 (LeakyReLU)      (None, 128, 128, 256 0           batch_normalization_66[0][0]     
__________________________________________________________________________________________________
conv2d_70 (Conv2D)              (None, 128, 128, 255 65535       leaky_re_lu_66[0][0]             
__________________________________________________________________________________________________
batch_normalization_67 (BatchNo (None, 128, 128, 255 1020        conv2d_70[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_67 (LeakyReLU)      (None, 128, 128, 255 0           batch_normalization_67[0][0]     
__________________________________________________________________________________________________
conv2d_71 (Conv2D)              (None, 128, 128, 255 65280       leaky_re_lu_67[0][0]             
__________________________________________________________________________________________________
batch_normalization_68 (BatchNo (None, 128, 128, 255 1020        conv2d_71[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_68 (LeakyReLU)      (None, 128, 128, 255 0           batch_normalization_68[0][0]     
__________________________________________________________________________________________________
conv2d_72 (Conv2D)              (None, 128, 128, 255 65280       leaky_re_lu_68[0][0]             
==================================================================================================
Total params: 7,302,647
Trainable params: 7,297,531
Non-trainable params: 5,116

Can anyone point out what I should do to eliminate this problem? Thanks.

1998at commented 5 years ago

@sankalpmittal1911-BitSian From your training logs it seems that the validation loss is going down after every epoch, so I don't see a problem there. Try training it for more time.

ZihengSun commented 5 years ago

@sankalpmittal1911-BitSian Thanks for the network. Can you use the current model to predict an image using a random input tile from the test dataset? Let's first check whether the setup works correctly before training it for more epochs.

sankalpmittal1911-BitSian commented 5 years ago

@ZihengSun Okay. I will input one of the (128,128,7) tiles from the validation set and post the predicted (128,128,1) segmented image after taking the argmax of the (128,128,255) output. I am right now using a model trained for only 35 epochs. Thanks.

sankalpmittal1911-BitSian commented 5 years ago

@ZihengSun Sir, so I used my model trained for only 30 epochs first and got an accuracy of 53% (as I said, it was not improving after a certain epoch). Then I took a (128,128,7) tile from the validation set; the 7 input band images are:

[7 input band images attached]

Then I loaded my model and used it to predict the output:

ans=new_model.predict(X_test)
print(ans.shape)

ans_final=np.zeros((128,128,1))
index=-1
maxim=-1

for i in range(128):
  for j in range(128):
    for l in range(255):
      if(ans[0][i][j][l]>maxim):
        maxim=ans[0][i][j][l]
        index=l
    ans_final[i][j][0]=index
    index=-1
    maxim=-1

print(ans_final)

The predicted array:

[[[194.]
  [195.]
  [195.]
  ...
  [  5.]
  [  5.]
  [  4.]]

 [[195.]
  [195.]
  [195.]
  ...
  [  5.]
  [  5.]
  [  5.]]

 [[195.]
  [195.]
  [195.]
  ...
  [  5.]
  [  5.]
  [  5.]]

 ...

 [[176.]
  [176.]
  [176.]
  ...
  [  5.]
  [  5.]
  [  5.]]

 [[ 23.]
  [176.]
  [176.]
  ...
  [  5.]
  [  5.]
  [  5.]]

 [[175.]
  [176.]
  [176.]
  ...
  [  5.]
  [  5.]
  [  4.]]]

The predicted output:

[predicted output image attached]

The true output:

[true output image attached]

I can see that it needs more training but at least something is matching. Shall I use the color map to render my predicted output? Thanks.

ZihengSun commented 5 years ago

@sankalpmittal1911-BitSian Yeah, it is easier to inspect if rendered using the colors. The distribution shape starts to match, but the values seem wrong.
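
A hedged sketch of the color rendering step (the actual uploaded CDL color map is not reproduced here; colors is a placeholder for the 255 RGB entries loaded from that file):

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

colors = np.random.rand(255, 3)                    # stand-in for the provided crop color map
cmap = ListedColormap(colors)

class_map = np.random.randint(0, 255, (128, 128))  # stand-in for the argmax prediction
plt.imshow(class_map, cmap=cmap, vmin=0, vmax=254)
plt.axis('off')
plt.savefig('predicted_crop_map.png', bbox_inches='tight')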

sankalpmittal1911-BitSian commented 5 years ago

@ZihengSun The distribution shape starts to match because, I guess, the setup of the model actually makes sense as far as the segmentation goes. The values, i.e. the predicted pixel values, are wrong because the accuracy is still low.

I tried training the model for 10 more epochs and the accuracy was hovering around 53% and not improving. I have to increase the accuracy to get the values right. What should I do to decrease the loss further in the above model? I tried changing layers, optimizers, learning rate... Shall I add more layers?

Also I will post the color rendered image output. Thanks.

ZihengSun commented 5 years ago

@sankalpmittal1911-BitSian The way you get the results looks a little weird to me.

for i in range(128):
  for j in range(128):
    for l in range(255):
      if(ans[0][i][j][l]>maxim):
        maxim=ans[0][i][j][l]
        index=l
    ans_final[i][j][0]=index
    index=-1
    maxim=-1

Is this the implementation of softmax? What is your code for compiling and training the model?

1998at commented 5 years ago

@ZihengSun I managed to get a training accuracy of 82%, but on the validation set I cannot go beyond 70% accuracy. I tried various learning rates and dropout values and shifted from U-Net to FCN. Should I try ensembling?

sankalpmittal1911-BitSian commented 5 years ago

@ZihengSun No sir, this is the conversion of the (128,128,255) prediction into (128,128,1) by taking the maximum along the 255 classes. Alternatively I could have used numpy.argmax(axis=3). Here is my code for compiling and training the model:

input_img = Input((im_height, im_width, 7), name='img')
model = get_unet(input_img, n_filters=512, dropout=0.15, batchnorm=True)
#model = get_unet()

model.compile(optimizer=Adam(lr=0.01), loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()

Here is training:

callbacks = [
    EarlyStopping(patience=10, verbose=1),
    ReduceLROnPlateau(factor=0.1, patience=3, min_lr=0.00000001, verbose=1),
    ModelCheckpoint('model.h5', verbose=1, save_best_only=True, save_weights_only=False)
]

new_model=load_model('/content/model.h5')

results = new_model.fit_generator(get_data(train_ids, batch_size=2), steps_per_epoch=675 , epochs=30, verbose=1 , callbacks=callbacks, validation_data = get_data(valid_ids,batch_size=1) , validation_steps = 243)

I used model.fit_generator instead of model.fit because I was getting a memory crash, so I had to load the data in batches.

I was getting a validation accuracy almost the same as training: around 51%.

Thank you.

Edited... Edit2: Can you also please check your mail? I have sent you a draft proposal.
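
The get_data generator used in fit_generator above is not shown in the thread; the following is only one possible shape for it (an assumption, not the author's actual code). load_tile and load_mask are hypothetical helpers that read one (128,128,7) tile and its (128,128) CDL mask.

import numpy as np
from keras.utils import to_categorical

def get_data(ids, batch_size=2, n_classes=255):
    # Loop forever so fit_generator can draw steps_per_epoch batches per epoch.
    while True:
        np.random.shuffle(ids)
        for start in range(0, len(ids) - batch_size + 1, batch_size):
            batch = ids[start:start + batch_size]
            X = np.stack([load_tile(i) for i in batch])   # (B, 128, 128, 7)
            y = np.stack([to_categorical(load_mask(i), n_classes) for i in batch])  # (B, 128, 128, 255)
            yield X, y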

1998at commented 5 years ago

@ZihengSun I need to clarify a doubt: can we try ensembling, or are we restricted to a single model?

sankalpmittal1911-BitSian commented 5 years ago

@at1998 Did the switch from U-Net to FCN make any difference? Also, which type of ensembling technique are you referring to?

1998at commented 5 years ago

@sankalpmittal1911-BitSian I tried both models, and both get stuck at almost the same validation score. So in my case it didn't make any difference.

ZihengSun commented 5 years ago

Ensembling is allowed if it improves the validation accuracy.
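
A minimal sketch of the kind of ensembling being discussed (a suggestion, not a prescribed design): average the per-pixel softmax outputs of several trained models before taking the argmax.

import numpy as np

def ensemble_predict(models, X):
    """Average the class probabilities of several Keras models, then pick the best class per pixel."""
    probs = np.mean([m.predict(X) for m in models], axis=0)  # (B, 128, 128, 255)
    return np.argmax(probs, axis=-1)                         # (B, 128, 128) class map

# Usage with hypothetical saved models:
# models = [load_model('unet.h5'), load_model('fcn.h5')]
# class_maps = ensemble_predict(models, X_val)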

ZihengSun commented 5 years ago

A training accuracy of 82% is very promising. Could you increase the ratio of training samples to testing samples? Normally, for a complex classification task on big data, the suggested ratio is 98:1:1. Since our sample dataset is small, I would say 90:5:5 might be good. I will add more samples to improve the feature diversity of the sample pool and reduce bias.
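
One way to get the 90:5:5 split suggested above, sketched with scikit-learn (already in the project's tool list); tile_ids is a placeholder for the real list of sample tile identifiers.

from sklearn.model_selection import train_test_split

tile_ids = list(range(1000))  # placeholder for the real tile id list

train_ids, rest_ids = train_test_split(tile_ids, test_size=0.10, random_state=42)
valid_ids, test_ids = train_test_split(rest_ids, test_size=0.50, random_state=42)
# -> 90% train, 5% validation, 5% test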

ZihengSun commented 5 years ago

@sankalpmittal1911-BitSian ~50% and no longer increasing after ~30 epochs is not a good sign. Something must be wrongly configured that keeps the network from learning.

1998at commented 5 years ago

@ZihengSun I am using 90% for training and 10% for validation; I haven't split out a test set yet. Should I try to train on 95% of the data? One thing I have noticed is that averaging all seven channels into 3 channels produces better accuracy and converges the model much faster than passing all seven channels as separate inputs. It could be due to poor augmentation. Also, I have a few things in mind that I want to write in the proposal that I think will increase the accuracy, but I don't know that for sure; can I include them in my proposal for now?

sankalpmittal1911-BitSian commented 5 years ago

@at1998 Are you using (128,128,3) as the input? Did you average the 7 channels down to 3 channels?

1998at commented 5 years ago

@ZihengSun In the Ag-Net dataset we know that the 7 channels are 7 bands from Landsat 8. Could we get information on which band corresponds to which part of the spectrum? I searched on the web and most articles mention Landsat 8 having 11 bands. Could you please help me out?

1998at commented 5 years ago

@sankalpmittal1911-BitSian I tried (128,128,3) as well as (128,128,7). In my most recent method I am giving the 7-channel input, though.

ZihengSun commented 5 years ago

The bands in the sample data come from the Landsat 8 OLI sensor. More details about each band and its spectral region can be found here. Bands 8-11 are rarely used in land cover classification. The multispectral bands increase the feature space compared with RGB bands alone, and the neural network is expected to take advantage of the additional spectral features to reach higher accuracy.