RockStarCoders / alienMarkovNetworks

Using MRFs and CRFs for computer vision problems.

investigate discrepancy between training and validation set errors #31

Closed by jsherrah 10 years ago

jsherrah commented 10 years ago

On grid search the discrepancy between training and validation set errors is 0.3, which is quite large. Fundamentally, the problem is that the feature distributions of the two data sets are too dissimilar. This can happen because the data sets themselves are too different, or because the features are calculated differently for the two sets.

Anthony, could you please investigate this? Perhaps start by listing and eyeballing the images in the training and validation sets. (We did separate them by image, didn't we?)
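As a complement to eyeballing the images, here is a rough sketch of one way to put a number on "too dissimilar". It is not code from this repo. It assumes the features for each set are available as rows of floats (one row per example, one column per feature), and the function names are made up for illustration. For each feature column it builds a histogram over a shared range for both sets and reports the total variation distance: 0 means the histograms are identical, 1 means they do not overlap at all.

```python
def histogram(values, lo, hi, n_bins):
    """Normalised histogram of values over [lo, hi] using n_bins equal bins."""
    counts = [0] * n_bins
    # Guard against a constant feature, where hi == lo gives zero width.
    width = (hi - lo) / n_bins or 1.0
    for v in values:
        # Clamp so that v == hi falls into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    total = float(len(values))
    return [c / total for c in counts]


def total_variation(train_col, valid_col, n_bins=10):
    """Total variation distance between the histograms of one feature."""
    lo = min(min(train_col), min(valid_col))
    hi = max(max(train_col), max(valid_col))
    p = histogram(train_col, lo, hi, n_bins)
    q = histogram(valid_col, lo, hi, n_bins)
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def compare_feature_sets(train_rows, valid_rows, n_bins=10):
    """Return one distance per feature column (hypothetical helper)."""
    train_cols = list(zip(*train_rows))
    valid_cols = list(zip(*valid_rows))
    return [total_variation(t, v, n_bins)
            for t, v in zip(train_cols, valid_cols)]


# Toy data only: feature 0 matches across the sets, feature 1 does not.
toy_train = [[0.0, 1.0], [1.0, 2.0]]
toy_valid = [[0.0, 10.0], [1.0, 11.0]]
for i, d in enumerate(compare_feature_sets(toy_train, toy_valid)):
    print("feature %d: distance %.3f" % (i, d))
```

Features with a distance near 1 would be the first ones to check for a difference in how they are calculated between the two sets. The bin count is arbitrary, and with few examples the histograms get noisy, so treat the numbers as a guide for where to look rather than as a test.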

jsherrah commented 10 years ago

There are other possibilities:

jsherrah commented 10 years ago

I have investigated, and my conclusion is that these features are not good enough. Here are the details. The results below use HSV colour and texton features.

   - average accuracy per class =  0.430049881425
      building: 0.666282
      grass: 0.926347
      tree: 0.710614
      cow: 0.635071
      sheep: 0.297214
      sky: 0.944312
      aeroplane: 0.070000
      water: 0.515385
      face: 0.624454
      car: 0.397980
      bicycle: 0.515000
      flower: 0.586275
      sign: 0.287709
      bird: 0.000000
      book: 0.489540
      chair: 0.000000
      road: 0.683119
      cat: 0.212598
      dog: 0.226244
      body: 0.163539
      boat: 0.079365

Note bird and chair are 0! Is the data dodgy for these examples? Are there too few examples for the classes? Or are they just hard?
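For reference, a minimal sketch of how figures like these can be computed from two parallel label lists. It assumes "accuracy per class" means the fraction of examples whose true label is that class and which were predicted correctly, and that the average is the unweighted mean over classes (the mean of the 21 values listed above is about 0.43, which is consistent with that). The function names are illustrative and not taken from this repo. The last helper lists what a class gets predicted as, which helps with the "are they just hard?" question.

```python
from collections import Counter, defaultdict


def per_class_accuracy(true_labels, predicted_labels):
    """Fraction of correctly predicted examples for each true class."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(true_labels, predicted_labels):
        totals[t] += 1
        if t == p:
            correct[t] += 1
    return {c: correct[c] / float(totals[c]) for c in totals}


def average_accuracy(per_class):
    """Unweighted mean over classes, so rare classes count as much as common ones."""
    return sum(per_class.values()) / len(per_class)


def confusions_for(cls, true_labels, predicted_labels):
    """Count the predicted labels over all examples whose true label is cls."""
    return Counter(p for t, p in zip(true_labels, predicted_labels) if t == cls)


# Toy labels only, not results from the experiments above.
toy_true = ["sky", "sky", "bird", "bird"]
toy_pred = ["sky", "sky", "sky", "tree"]
acc = per_class_accuracy(toy_true, toy_pred)
print("average accuracy per class = %f" % average_accuracy(acc))
for name in sorted(acc):
    print("   %s: %f" % (name, acc[name]))
print(confusions_for("bird", toy_true, toy_pred).most_common())
```

If every bird example lands in one or two other classes, that points to the features not separating them. If the predictions are spread over many classes, the labels or the data for that class are more suspect.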

   - class proportions in Training set:
             building: 0.113521 (  7596 examples)
                grass: 0.189500 ( 12680 examples)
                 tree: 0.075202 (  5032 examples)
                  cow: 0.032654 (  2185 examples)
                sheep: 0.022880 (  1531 examples)
                  sky: 0.099562 (  6662 examples)
            aeroplane: 0.017276 (  1156 examples)
                water: 0.086172 (  5766 examples)
                 face: 0.019368 (  1296 examples)
                  car: 0.035853 (  2399 examples)
              bicycle: 0.026916 (  1801 examples)
               flower: 0.024704 (  1653 examples)
                 sign: 0.020982 (  1404 examples)
                 bird: 0.013674 (   915 examples)
                 book: 0.052411 (  3507 examples)
                chair: 0.018023 (  1206 examples)
                 road: 0.092882 (  6215 examples)
                  cat: 0.016335 (  1093 examples)
                  dog: 0.014287 (   956 examples)
                 body: 0.020579 (  1377 examples)
                 boat: 0.007218 (   483 examples)

Bird and chair are among the least represented classes. Still, with 915 and 1206 examples respectively, there are many examples of both.
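A listing like the one above can be produced with a short helper along these lines. This is a sketch only: it assumes the labels are available as a flat list with one class name per example, and the names and output format are illustrative rather than copied from this repo.

```python
from collections import Counter


def class_proportions(labels):
    """Map each class to (fraction of all examples, number of examples)."""
    counts = Counter(labels)
    total = float(len(labels))
    return {c: (n / total, n) for c, n in counts.items()}


def format_proportions(proportions):
    """One line per class, most frequent class first."""
    ordered = sorted(proportions.items(), key=lambda kv: kv[1][1], reverse=True)
    return ["{:>10}: {:.6f} ({:6d} examples)".format(name, frac, n)
            for name, (frac, n) in ordered]


# Toy labels only.
toy_labels = ["grass", "grass", "grass", "sky"]
for line in format_proportions(class_proportions(toy_labels)):
    print(line)
```

Running the same helper on the validation labels and putting the two listings side by side would show whether the class balance differs between the sets, which is one more possible source of the discrepancy.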

jsherrah commented 10 years ago

I don't have a concrete answer, but here is my hunch based on the above: