Update: The above problem was unrelated to loss_shape. I figured out through some trial and error that the segmentation data must be in indexed-color format (a palette of up to 256 colors) instead of RGB. Just updating this here in case someone is putting their own training data together. Still, it would be useful to have a better description of how this code uses loss_shape and how the loss function works. Thanks!
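For anyone checking their own labels, here is a minimal way to verify the format with Pillow (the filename is just a stand-in):

```python
from PIL import Image

# Open a label image and check its mode.
im = Image.open('label.png')  # hypothetical label file
print(im.mode)  # 'P' = palette-indexed (what the code expects); 'RGB' will break
```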
Could you elaborate on exactly which formats you're discussing here?
Sure. I created training data using RGB .jpg images that I downloaded from the web. I took the raw .jpg files and, using GIMP, created segmented .jpg images that were also RGB. I converted these to .png to match the format the code was written to use. I then banged my head on my desk for a while until I thought to examine the image properties of the VOC2012 segmentation data and compare them to my own. I found that while my segmentation data was RGB, the VOC segmentation data was indexed color (256 colors). I wrote a short Python script that read in the .jpgs and converted them to 256-color .png files (below). I am currently training and on epoch 59/250. Not sure about success yet, but there it is. I do have concerns that when I converted from RGB to 256 colors I accidentally matched a color already used in the VOC data, since I picked my colors in RGB. I will be checking that next.
WD = "~/SegmentationObject/" os.chdir(WD) files = glob.glob ('*.jpg') for infile in files: file, ext = os.path.splitext(infile) im = Image.open(infile).convert('P',colors=256) im.save(file + ".png")
Oh yeah, that's a tricky one! For others reading: the png values are actually single-channel ids, which are reused in Pascal VOC as class ids. What you see when you look at a png file are the colors resulting from a lookup into what's known as a color palette, which is a map from id to color.
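A quick way to see this with Pillow and numpy (a minimal sketch; the filename is a stand-in for any VOC-style label image):

```python
import numpy as np
from PIL import Image

im = Image.open('2007_000032.png')  # hypothetical VOC label image
print(im.mode)                      # 'P': palette (indexed) color

# The pixel values are the class ids themselves, not RGB triples.
ids = np.array(im)
print(np.unique(ids))               # e.g. [0 1 15 255]: background, classes, void

# The palette maps each id to the RGB color you see on screen.
palette = im.getpalette()           # flat list [r0, g0, b0, r1, g1, b1, ...]
print(palette[:9])                  # RGB colors for ids 0, 1, 2
```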
jpg is a bit risky to use for segmentation data because lossy compression can change pixel values depending on compression settings, and a changed value identifies a different class. I think Python Pillow can be used for the format conversion and for creating the palette.
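A small demonstration of that risk (a sketch using throwaway filenames):

```python
import numpy as np
from PIL import Image

# A toy mask with two "classes": 0 and 255.
mask = np.zeros((64, 64), dtype=np.uint8)
mask[16:48, 16:48] = 255

# Round-trip through JPEG: compression artifacts appear near the edges.
Image.fromarray(mask).save('mask.jpg', quality=95)
print(np.unique(np.array(Image.open('mask.jpg'))))  # extra values besides 0 and 255

# Round-trip through PNG: lossless, so the ids survive exactly.
Image.fromarray(mask).save('mask.png')
print(np.unique(np.array(Image.open('mask.png'))))  # [  0 255]
```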
Yup! And, somewhat ironically, the .jpg files I initially created were maximum quality, and histograms of the colors showed my 3 distinct classes perfectly; but converting from .jpg to .png using PIL introduced new colors (for example, band 84 was cut into bands 83 and 84). As a workaround, I managed to use quantize in PIL to convert my .jpgs to .pngs without introducing new colors, but I have very little control over the class IDs, and for the palette I can only use an "adaptive" or "web" palette (see the PIL readme). One major problem I also had was that two of my color classes were different enough in RGB that quantize treated my green and purple classes differently. I don't have any examples I can share, but with my green class, quantize combined the green colors with the VOID outline from VOC, while with my purple class it kept the classes separate. In the end, I went back and manually recolored my images using the color-select and bucket tools in GIMP. Another option would have been to convert into numpy format, but I wasn't sure how the COCO data was handled in the main code, and I didn't want to open a new can of worms.
Below is the new code I used to convert .jpg to .png:
```python
# Convert to a 3-color palette index
import glob, os
from PIL import Image

WD = "~/SegmentationObject/"
os.chdir(os.path.expanduser(WD))  # expanduser so '~' actually resolves

for infile in glob.glob('*.jpg'):
    file, ext = os.path.splitext(infile)
    im = Image.open(infile).quantize(colors=3)  # quantize down to 3 palette entries
    im.save(file + ".png")
```
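If you need explicit control over which color becomes which class id (which quantize doesn't give you), one alternative is to map pixels to ids yourself. A sketch, assuming you know the RGB values you painted with (the colors and filename below are stand-ins):

```python
import numpy as np
from PIL import Image

# Hypothetical class colors; replace with the exact RGB values you used.
COLOR_TO_ID = {
    (0, 0, 0):       0,    # no data / background
    (0, 128, 0):     1,    # G1 (green)
    (128, 0, 128):   2,    # G2 (purple)
    (224, 224, 192): 255,  # VOC-style void outline
}

rgb = np.asarray(Image.open('segmentation.jpg').convert('RGB'), dtype=np.int16)
colors = np.array(list(COLOR_TO_ID.keys()), dtype=np.int16)       # (k, 3)
class_ids = np.array(list(COLOR_TO_ID.values()), dtype=np.uint8)  # (k,)

# Snap every pixel to its nearest class color; this absorbs JPEG noise
# instead of letting it create spurious classes.
dist = np.linalg.norm(rgb[:, :, None, :] - colors, axis=-1)       # (H, W, k)
ids = class_ids[dist.argmin(axis=-1)]

out = Image.fromarray(ids)  # mode 'L': a single channel of class ids
# Optionally attach a VOC-style palette so the ids display as colors:
# out.putpalette([r0, g0, b0, r1, g1, b1, ...])
out.save('segmentation.png')
```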
Wow, thanks for the detailed explanation! I'm sure that will help someone out in the future.
@bgr33r Thank you very much for sharing your knowledge. I ran into this problem when using this code to train on 2-class images.
Hello,
Thank you for providing your code. I have been able to successfully train and evaluate models using the VOC2011 and VOC2012 datasets with the model "AtrousFCN_Resnet50_16s".
I am now working on training the "AtrousFCN_Resnet50_16s" model on my own data. I have a series of RGB .jpgs which I have segmented similarly to the VOC2011 and VOC2012 datasets. I am training for 3 classes (G1, G2, and no data), represented by 3 colors (purple, green, black); the first two do not match the colors of the VOC models and are new categories. I should add that, following the VOC data, I've also included a VOID category that forms an outline around my objects of interest.
I have written code to match the format of the VOC2012 dataset call in train.py. In this copy, I have also changed the number of classes to 3:
```python
if dataset == 'Grapes':
    train_file_path = os.path.expanduser('~/path/to/my/data/train.txt')
    val_file_path = os.path.expanduser('~/path/to/my/data/val.txt')
    data_dir = os.path.expanduser('~/path/to/my/data/JPEGImages')
    label_dir = os.path.expanduser('~/path/to/my/data/SegmentationObject')
    data_suffix = '.jpg'
    label_suffix = '.jpg'
    classes = 3
```
When I run my code, I get an error in the SegDataGenerator which reads: `ValueError: could not broadcast input array from shape (350,350,3) into shape (350,350,1)`
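To illustrate the mismatch, here is a toy reproduction (not the actual generator code):

```python
import numpy as np

batch_y = np.zeros((350, 350, 1))    # label buffer: one channel of class ids
rgb_label = np.zeros((350, 350, 3))  # an RGB label image has three channels
batch_y[...] = rgb_label             # ValueError: could not broadcast ...
```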
I've traced it back to the point in the code where the error occurs.
I noticed that in your COCO implementation you added loss_shape, so I added `loss_shape = (target_size[0] * target_size[1] * classes,)` to the configuration, but now I get a new error:
`ValueError: Cannot feed value of shape (1, 367500) for Tensor 'bilinear_up_sampling2d_1_target:0', which has shape '(?, ?, ?, ?)'` (367500 is 350 * 350 * 3, i.e. the flattened loss_shape above)
Do you have any suggestions for what I need to do to train on my own data? I might add that my input images have about 50% fewer rows and columns than the VOC data.
Thank you for your response.