anilsathyan7 / Portrait-Segmentation

Real-time portrait segmentation for mobile devices
MIT License

An issue on training slim-net model #22

Closed rose-jinyang closed 3 years ago

rose-jinyang commented 3 years ago

Hello, how are you? Thanks for contributing this project. I tried to train a model with slim-net on the AISegment dataset but hit the following issue.

2 226 229 232 235 236 234 236 240 242 243 252 … (a long dump of raw pixel values in the 0–255 range follows) … [[{{node loss/conv2d_transpose_4_loss/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits}}]]

I think this may be related to the label value range. I used the entire AISegment dataset. A mask image in the AISegment dataset is a PNG with 4 channels. How should I decode this mask image in the dataloader? Should I load it as grayscale? I also found a strange part loading the mask image in your slim512.ipynb.

image

Thanks

anilsathyan7 commented 3 years ago
  1. According to the code, both images and masks have three channels (see the init of the dataloader class).

  2. Now if you look at the description in the section 'Data-loader and Augmentations' in the ipynb, it says: if your masks are not in raw format, then you need to convert them into sparse labels (color indexed) for training with SparseCategoricalCrossentropy loss (i.e. 0 for bg and 1 for fg)
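The sparse-label conversion described above can be sketched with a toy array (hypothetical values, standing in for a grayscale matting mask loaded from disk; the 127 threshold matches the preprocessing script later in this thread):

```python
import numpy as np

# Toy 2x2 "matting" mask with alpha-like values in 0-255
mask = np.array([[0, 200], [130, 90]], dtype=np.uint8)

# Threshold into sparse color-indexed labels: 0 = background, 1 = foreground,
# as SparseCategoricalCrossentropy expects integer class indices per pixel
sparse = (mask >= 127).astype(np.uint8)
```

Every pixel of `sparse` is now 0 or 1, so it can be fed directly as an integer label map without one-hot encoding.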

rose-jinyang commented 3 years ago

Hi, thanks for your reply. You used a pre-processed version of AISegment in slim512.ipynb.

image

Could you share the dataset or part of it? I want to know whether your mask PNG format is the same as the original AISegment mask (matting) image. Thanks

anilsathyan7 commented 3 years ago

Of course, it's preprocessed before training and is therefore different. It becomes a binary mask as I mentioned previously (0 or 1 pixel values only), in PNG format.

If you are concerned about decode_jpeg in the code, check out stackoverflow

rose-jinyang commented 3 years ago

What is the structure of each image in msk_uint8.npy?

image

This is your code that makes a numpy file for the mask images.

anilsathyan7 commented 3 years ago

Well, you can easily try the code on a folder with images and see for yourself. The final numpy array dataset is in NHWC format: N is the number of images, H and W are height and width, and C (channels) will be 1 in this code for mask images (as per Keras requirements). In that case the loss function is different, i.e. BCE loss, and we use sigmoid as the last activation (see portrait_segmentation.ipynb). Also, here the mask values are 0 and 255.

So, in the aforementioned code each image would be a binary mask with a single channel dimension, say 128x128x1 if the input image is of size 128.

But we are not using this code in slim-net; we directly use the image paths of the preprocessed images in the data loader class, SparseCategoricalCrossentropy as the loss function, and mask values 0 or 1.
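The NHWC layout described above can be sketched with placeholder masks (the 128x128 size and count of 4 are illustrative, not the repo's actual dataset dimensions):

```python
import numpy as np

# Four hypothetical 128x128 binary masks, as they might exist on disk
masks = [np.zeros((128, 128), dtype=np.uint8) for _ in range(4)]

# Stack into NHWC: N images, height, width, and a single channel axis
# appended at the end (as Keras expects for grayscale masks)
arr = np.stack(masks)[..., np.newaxis]  # shape (4, 128, 128, 1)
```

Saving `arr` with `np.save` would produce a file with the same N x H x W x 1 structure as the msk_uint8.npy discussed here.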

rose-jinyang commented 3 years ago

Thanks for your kind explanation

rose-jinyang commented 3 years ago

Hi, could you provide a script to convert the original AISegment mask images? Thanks

anilsathyan7 commented 3 years ago

Sorry, I think there should be a clarification. Actually, if you look at the code, the mask images in the datasets should have 1 channel. The value 3 in the loader class is just a default; we override it when we call the data loader as shown below.

# Initialize the dataloader object
train_dataset = DataLoader(image_paths=train_image_paths,
                     mask_paths=train_mask_paths,
                     image_size=512,
                     crop_percent=0.8,
                     channels=[3, 1], # **here 1 refers to the mask channel**
                     seed=47)

So when you prepare the alpha mask from the AISegment dataset, a single channel is sufficient.

Now, here is a rough idea for preprocessing the AISegment dataset masks:

import numpy as np
import cv2
import imageio

# Save the RGB input image
image = cv2.imread('jpg image file path')  # original input image
image = image[..., 0:3]  # only include the RGB channels from the original image
imageio.imsave('image_1.jpg', image)

# Save the alpha mask from the matting masks
in_image = cv2.imread('png image file path', cv2.IMREAD_UNCHANGED)  # matting mask of the AISegment dataset
alpha = in_image[:, :, 3]  # get the alpha channel from the 4-channel matting mask

# Convert to a binary mask
alpha[alpha >= 127] = 1
alpha[alpha < 127] = 0

# Now save the binary mask with a single channel
imageio.imsave('alpha_1.png', alpha)

rose-jinyang commented 3 years ago

Thank you very much.

rose-jinyang commented 3 years ago

Hi, with your help I started training a new model with Slim-Net on the AISegment dataset. But the training accuracy is 0.9688 and the validation accuracy is 1.0 after the first epoch. After the second epoch, both training and validation accuracy are 1.0.

image

How should I understand this?


anilsathyan7 commented 3 years ago

First, test the model and see if it gives correct results. Then compare the hdf5 model with the original model to see if it has the same structure. Finally, check whether the masks are correctly preprocessed.

I'm not sure what exactly the cause is...
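The third check above (whether the masks are correctly preprocessed) can be sketched as a quick sanity test; the toy array stands in for a mask you would load from disk, e.g. with cv2.imread in grayscale mode:

```python
import numpy as np

# Simulated preprocessed mask; in practice this would come from
# cv2.imread(path, cv2.IMREAD_GRAYSCALE) for each training mask
mask = np.array([[0, 1], [1, 0]], dtype=np.uint8)

# A correctly preprocessed sparse mask must contain only labels 0 and 1;
# any value like 255 here would trip the SparseSoftmaxCrossEntropy error
labels = np.unique(mask)
ok = set(labels.tolist()) <= {0, 1}
```

If `ok` is False for any mask, that mask still holds raw 0-255 values and would explain the loss error (and a degenerate accuracy of 1.0).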

rose-jinyang commented 3 years ago

Thanks

rose-jinyang commented 3 years ago

Hi, I found the following mistake in your slim512.ipynb.

image

Of course, I think this is not the main reason.

anilsathyan7 commented 3 years ago

Here is a sample image and mask from the dataset. Check if there is any dissimilarity 1803151818-00000003.zip

rose-jinyang commented 3 years ago

Thanks

anilsathyan7 commented 3 years ago

It should not be an issue since the two dimensions are redundant, and in the preprocessing code we remove those channels. Anyway, try with 3 channels and see if the problem persists.

anilsathyan7 commented 3 years ago

It should not be an issue, I guess. Load the mask as mask = cv2.imread('1803151818-00000003.png'), save the PNG mask as 513x513x3, and see if it works.
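The redundancy mentioned above can be sketched with a toy array standing in for a cv2.imread result (when a grayscale PNG is loaded without IMREAD_UNCHANGED, the single channel is replicated into three identical ones):

```python
import numpy as np

# Single-channel binary mask, as stored on disk
single = np.array([[0, 1], [1, 0]], dtype=np.uint8)

# Simulate the 3-channel load: three identical copies stacked on the
# last axis, shape (2, 2, 3)
mask3 = np.stack([single] * 3, axis=-1)

# The channels are redundant, so keeping any one loses no information
collapsed = mask3[..., 0]
```

This is why training with the 3-channel version should behave the same as with the 1-channel mask, as suggested above.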

rose-jinyang commented 3 years ago

Will try

anilsathyan7 commented 3 years ago

If you have this exact image in your previously preprocessed dataset, check whether it is exactly the same as the sample mask image, i.e. pixel-wise identical or not.
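The pixel-wise comparison suggested above can be sketched like this; the two toy arrays stand in for the sample mask and the user's own preprocessed mask, each of which would be loaded with cv2.imread in practice:

```python
import numpy as np

# Stand-ins for the sample mask and the locally preprocessed mask
a = np.array([[0, 1], [1, 0]], dtype=np.uint8)
b = a.copy()

# Shapes must match before an element-wise comparison is meaningful
identical = a.shape == b.shape and np.array_equal(a, b)
```

If `identical` is False, counting the mismatched pixels with `(a != b).sum()` narrows down where the preprocessing diverges.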

rose-jinyang commented 3 years ago

Sure

rose-jinyang commented 3 years ago

Hi, I found that there is an issue in my pre-processing code.

image

Your guess is correct. Thank you

rose-jinyang commented 3 years ago

Hi, may I ask one more question? Why did you use SparseCategoricalCrossentropy rather than binary_crossentropy for slim-net training?

anilsathyan7 commented 3 years ago

You can use both in this case, where there are only two classes.

See: https://amp.reddit.com/r/learnmachinelearning/comments/88g8zf/difference_between_binary_cross_entropy_and/

Also, the sparse version just helps us avoid creating a one-hot version of the labels when there are multiple classes.
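The equivalence of the sparse and one-hot formulations can be verified numerically with a toy example (hypothetical softmax outputs for three pixels over two classes; no TF required):

```python
import numpy as np

# Softmax probabilities for 3 pixels over 2 classes (bg, fg)
probs = np.array([[0.9, 0.1],
                  [0.2, 0.8],
                  [0.6, 0.4]])
sparse_labels = np.array([0, 1, 0])

# Sparse CE: index the probability of the true class directly
sparse_ce = -np.log(probs[np.arange(3), sparse_labels])

# Categorical CE with an explicit one-hot encoding of the same labels
one_hot = np.eye(2)[sparse_labels]
cat_ce = -np.sum(one_hot * np.log(probs), axis=1)
```

Both give the same per-pixel loss; the sparse form just skips materializing the one-hot array, which matters when there are many classes.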

rose-jinyang commented 3 years ago

Thanks. Excuse me, have you ever implemented SINet for portrait segmentation? https://github.com/clovaai/ext_portrait_segmentation If so, I would like to know how Slim-Net and SINet compare in accuracy. Thanks

anilsathyan7 commented 3 years ago

No, but I initially tried the eg1800 combined dataset from their repo...

After some experimentation, it seems real-world accuracy depends on having a better dataset (clean and bigger), a bigger model, a bigger input size, etc., especially because nowadays there are faster processors to handle them.

Also, sometimes a 2 or 3 percentage-point difference in test-set accuracy (say 95 vs 97) doesn't matter much for your specific use case, provided you attain the required fps.

rose-jinyang commented 3 years ago

Thanks

anilsathyan7 commented 3 years ago

Anyway, it's interesting from a research perspective...