Closed jsherrah closed 11 years ago
Here are results from some recent papers. Note that papers tend to quote two accuracy figures: global (the fraction of all pixels labelled correctly) and average (the mean of the per-class accuracies). If you just guessed "grass" everywhere you would probably do quite well on global. Top 4 per column are highlighted.
Paper | Global (%) | Average (%) | Description |
---|---|---|---|
Shotton 2006 | 72.2 | 57.7 | TextonBoost |
Pantofaru 2008 | 74.3 | 60.3 | Combine multiple segmentations |
Shotton 2008 | 72.0 | 67.0 | Semantic texton forests |
Gould 2008 | 76.5 | 64.3 | Relative location prior (avg over random partitions) |
Kluckner 2009 | 68.6 | 59.6 | Super-pixel, sigma-points |
Yao 2012 | **84.4** | **77.4** | Joint detection, classification and segmentation |
Liu 2012 | 75.0 | 68.0 | Multi-scale superpixels |
Ladicky 2013 | **86.8** | **77.8** | Co-occurrence stats |
Souiai 2013 | **86.0** | **79.0** | Co-occurrence prior |
Rubinstein 2013 | **87.7** | ? | Unsupervised co-segmentation |
Wang 2013 | 83.7 | ? | Sparse coding super-pixel features |
Lucchi 2013 | 83.7 | **78.9** | Structured prediction and working sets |
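To make the difference between the two columns concrete, here is a minimal sketch that computes both figures from a confusion matrix. The function name and the toy counts are made up for illustration; they do not come from any of the papers.

```python
import numpy as np

def global_and_average_accuracy(conf):
    """conf[i, j] = number of pixels of true class i predicted as class j."""
    conf = np.asarray(conf, dtype=float)
    correct = np.diag(conf)
    per_class_total = conf.sum(axis=1)
    # Global: fraction of all pixels that were labelled correctly.
    global_acc = correct.sum() / conf.sum()
    # Average: mean of per-class accuracies, skipping classes with no pixels.
    present = per_class_total > 0
    average_acc = np.mean(correct[present] / per_class_total[present])
    return global_acc, average_acc

# Toy example (made-up counts): "grass" dominates the data set and the
# classifier labels every pixel as grass.
conf = np.array([
    [80, 0, 0],   # true grass
    [15, 0, 0],   # true cow
    [ 5, 0, 0],   # true sheep
])
g, a = global_and_average_accuracy(conf)
print(f"global={g:.3f} average={a:.3f}")  # global=0.800 average=0.333
```

The always-grass classifier scores 80% global but only 33% average, which is why both columns are worth tracking.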
Recommendation: read through Yao, Ladicky, Souiai and Rubinstein and see what features and techniques they are using. Co-occurrence stats seem to be useful, and we are already using them to some extent, so we are on the right track.
Rubinstein is a different application: co-segmentation of foreground objects from background. It's not a candidate.
Yao seems a bit hard-wired to the data set, learning object detectors and shape priors. Also, they run another segmentation first that already does half the job.
Consider Wang and Lucchi too.
Souiai compare to Ladicky 2013 and get about the same results. They use the same features (data term) as Ladicky. They add a class co-occurrence penalty term (independent of location) to the potential. They use a continuous convex optimisation approach at the pixel level. It's hard to know whether the new inference method or the co-occurrence term is contributing more to the accuracy. Maybe Ladicky would be a better source.
Lucchi are using SVMs to "learn CRFs"; I need to read more to find out what that means. It looks like they are learning the unary potential probabilities, but I think there is more to it than that. They use superpixels, bag-of-words SIFT and colour histogram codebooks, and hierarchical potentials as in Ladicky 2009.
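For reference, a per-superpixel colour histogram feature can be sketched as below. This is only an illustration of the general idea: it uses fixed uniform RGB bins rather than a learned codebook as in Lucchi, and the function name, the `bins` parameter and the input layout are all assumptions.

```python
import numpy as np

def superpixel_colour_histograms(image, segments, bins=4):
    """image: (H, W, 3) uint8 RGB; segments: (H, W) int labels in 0..S-1.

    Returns an (S, bins**3) array of L1-normalised joint RGB histograms,
    one row per superpixel. Uniform bins are an assumption made here.
    """
    image = np.asarray(image)
    segments = np.asarray(segments)
    n_segments = int(segments.max()) + 1
    # Quantise each channel to 0..bins-1, then combine into one bin index.
    q = (image.astype(np.int64) * bins) // 256
    joint = (q[..., 0] * bins + q[..., 1]) * bins + q[..., 2]
    hist = np.zeros((n_segments, bins ** 3))
    # Unbuffered accumulation: one count per pixel into (segment, colour bin).
    np.add.at(hist, (segments.ravel(), joint.ravel()), 1.0)
    totals = hist.sum(axis=1, keepdims=True)
    return hist / np.maximum(totals, 1.0)
```

Each row could then be concatenated with other descriptors and fed to whatever unary classifier we end up using.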
Ladicky, Torr et al., "Inference Methods for CRFs with Co-occurrence Statistics", develop a co-occurrence potential, C(L), in the image label CRF. They show that MAP inference over the resulting model can be achieved in three ways: 1) reformulating it as an integer program, solved using LP relaxation; 2) reparameterising the model into pairwise energy potentials using an auxiliary variable, solved using belief propagation; 3) solving it efficiently with Boykov's approximate graph-cut moves, alpha-expansion or alpha-beta swap.
Interestingly, different forms for C(L) were used for MSRC data and VOC data. The C(L) for MSRC is defined in Eqn(56) and Eqn(57).
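As a rough sketch of where C(L) sits in the energy, the snippet below scores a labelling with unary costs, a Potts pairwise term and a co-occurrence cost that depends only on the set of labels present. The pairwise-sum form of the co-occurrence cost, the `cooc_cost` matrix and all names are assumptions for illustration; this is not the form given in Eqn(56) and Eqn(57) of the paper.

```python
import numpy as np
from itertools import combinations

def crf_energy(labels, unary, edges, pairwise_weight, cooc_cost):
    """Energy of a labelling under unary + Potts pairwise + co-occurrence.

    labels: (N,) int label per node; unary: (N, K) costs;
    edges: list of (i, j) node pairs; cooc_cost: (K, K) symmetric penalty
    for two classes appearing in the same image (hypothetical form).
    """
    labels = np.asarray(labels)
    unary = np.asarray(unary, dtype=float)
    e_unary = unary[np.arange(len(labels)), labels].sum()
    # Potts term: fixed penalty for each edge whose endpoints disagree.
    e_pair = pairwise_weight * sum(int(labels[i] != labels[j]) for i, j in edges)
    # C(L): depends only on which labels occur, not where or how often.
    present = sorted(set(labels.tolist()))
    e_cooc = sum(cooc_cost[a, b] for a, b in combinations(present, 2))
    return float(e_unary + e_pair + e_cooc)

# Toy 3-node chain with 3 classes; classes 0 and 2 rarely co-occur.
unary = np.zeros((3, 3))
edges = [(0, 1), (1, 2)]
cooc = np.zeros((3, 3))
cooc[0, 2] = cooc[2, 0] = 5.0
print(crf_energy([0, 0, 2], unary, edges, 1.0, cooc))  # 6.0
print(crf_energy([0, 0, 1], unary, edges, 1.0, cooc))  # 1.0
```

Because the last term is global, it cannot be written as a sum over edges, which is why the paper needs the dedicated inference schemes listed above.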
Experimental details are light, but four configurations were evaluated: SegmentCRF vs SegmentCRF & C(L), and HierarchicalCRF vs HierarchicalCRF & C(L).
The SegmentCRF and HierarchicalCRF models are defined in the ["Associative Hierarchical CRFs for Object Class Image Segmentation", Ladicky et al, 2009] paper.
The paper states that the inference method described in ["Exact and Approximate Inference in Associative Hierarchical Networks using Graph Cuts", Russell et al, 2010] was used.
Inclusion of C(L) improved global performance for both the SegmentCRF and HierarchicalCRF models.
Inclusion of the C(L) co-occurrence cost does improve performance, and we can use our existing inference model to generate the MAP assignment. _It would be interesting to try and replicate their performance, once we have the baseline CRF in place._
I'm reading up on AHRF at the moment. In the co-occurrence paper they also mention they used the inference from [Russell 2010].
We could also look at recent CVPR papers.