freesouls / face-alignment-at-3000fps

This project is a C++ implementation of Face Alignment at 3000fps via Regressing Local Binary Features.

Error doesn't decrease #4

Open shrubb opened 8 years ago

shrubb commented 8 years ago

Hi,

First of all, thanks for this implementation. I ran and trained the previous version (before the refactoring) successfully, but now I'm having trouble with training. As before, I followed the instructions in the README, downloaded the HELEN dataset, and put it in example/helen. The reported error rates are:

training stage: 0 of 6
train regression error: 1071.14, mean error: 0.149516
Validation at stage: 0
Validation error: 37.3754, mean error: 0.120178

training stage: 1 of 6
train regression error: 847.716, mean error: 0.11833
Validation at stage: 1
Validation error: 32.5232, mean error: 0.104576

training stage: 2 of 6
train regression error: 735.792, mean error: 0.102707
Validation at stage: 2
Validation error: 30.7561, mean error: 0.0988942

training stage: 3 of 6
train regression error: 803.279, mean error: 0.112127
Validation at stage: 3
Validation error: 36.0491, mean error: 0.115914

training stage: 4 of 6
train regression error: 752.986, mean error: 0.105107
Validation at stage: 4
Validation error: 35.4158, mean error: 0.113877

training stage: 5 of 6
train regression error: 791.054, mean error: 0.110421
Validation at stage: 5
Validation error: 38.1961, mean error: 0.122817

finish training, start to saving the model...
model name: helenModel
save the model successfully

Unsurprisingly, the resulting quality is rather poor:

[image: landmarks]

Maybe I'm doing something wrong?

freesouls commented 8 years ago

please wait, let me verify this!

shrubb commented 8 years ago

@freesouls Okay, thank you!

freesouls commented 8 years ago

Hi, I just cloned the code into a new folder and ran the program (without changing anything); the output is below. I only ran 3 stages, and the error does seem to decrease (though the numbers differ from the output in README.md, because the initial guess is randomly sampled, the positions of the pixel-difference features are also randomly sampled, and the threshold used when splitting nodes in the random forests is also randomly selected each time you run the program; see the sketch after the log below).

training stage: 0 of 6
train regression error: 897.343, mean error: 0.125257
Validation at stage: 0
Validation error: 31.7507, mean error: 0.102092

training stage: 1 of 6
train regression error: 629.442, mean error: 0.0878617
Validation at stage: 1
Validation error: 25.116, mean error: 0.0807589

training stage: 2 of 6
train regression error: 493.326, mean error: 0.0688618
Validation at stage: 2
Validation error: 22.7215, mean error: 0.0730595

Will you rerun the code?
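
A minimal sketch of those three randomized choices (initial guess, pixel-difference feature positions, node-split thresholds); all names here are illustrative, not this repo's actual API:

```cpp
// Illustrative sketch of the three sources of run-to-run randomness
// described above; every name here is hypothetical, not this repo's code.
#include <random>
#include <utility>

std::mt19937 rng(std::random_device{}());

// 1. Initial guess: pick a random training shape index as the start shape.
int SampleInitialGuess(int num_training_shapes) {
    std::uniform_int_distribution<int> pick(0, num_training_shapes - 1);
    return pick(rng);
}

// 2. Pixel-difference feature: two random offsets around a landmark;
//    the feature value is intensity(p1) - intensity(p2).
std::pair<std::pair<double, double>, std::pair<double, double>>
SamplePixelDifferenceFeature(double radius) {
    std::uniform_real_distribution<double> offset(-radius, radius);
    return {{offset(rng), offset(rng)}, {offset(rng), offset(rng)}};
}

// 3. Split threshold: drawn at random from the observed feature range
//    when splitting a random-forest node.
double SampleSplitThreshold(double min_feature, double max_feature) {
    std::uniform_real_distribution<double> t(min_feature, max_feature);
    return t(rng);
}
```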

shrubb commented 8 years ago

Hm, let me try this on a different machine in a moment. That run was on a 24-core machine; maybe that caused the problem.

freesouls commented 8 years ago

My machine has 8 cores (the program runs 8 threads at the same time), but the code should be fine with 24 threads. Of course, there may be something wrong with the code.

freesouls commented 8 years ago

Hi, before running, set params.local_features_num_ = 400 or 500; 200 may be too small. This won't noticeably affect the running time, since the linear regression takes about 80% of the training time.
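
A minimal sketch of that change, assuming a plain Parameters struct; only local_features_num_ is the field name quoted above, the rest stands in for however the project actually builds its parameters:

```cpp
// Hypothetical sketch: the struct below is a stand-in for the project's
// real Parameters type; only local_features_num_ is quoted from the thread.
struct Parameters {
    int local_features_num_ = 200;  // the default discussed in this thread
    // ... the project's other training parameters ...
};

int main() {
    Parameters params;
    params.local_features_num_ = 500;  // 400-500 suggested; 200 may be too small
    // ... configure the rest and launch training as the README describes ...
    return 0;
}
```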

shrubb commented 8 years ago

Thank you for helping. I've set the local features number to 500. It's very strange: I got it working on the desktop machine, but there was no effect on the remote one...

For some reason, during training it reported finding 1798 faces on the desktop but 1791 on the remote machine! The datasets are bit-for-bit identical.

I'll continue investigating this!

freesouls commented 8 years ago

Sometimes, for the same image, OpenCV's face detector will give different results (even with the same version).
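
One way to narrow this down is to count detections per image with the same cascade and parameters on both machines; a minimal OpenCV sketch, with the cascade path as a placeholder for whatever the training code actually loads:

```cpp
// Prints the face count for one image so the two machines can be compared.
// The cascade file path is a placeholder, not necessarily what this repo uses.
#include <opencv2/imgcodecs.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/objdetect.hpp>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    if (argc < 2) return 1;
    cv::CascadeClassifier detector;
    detector.load("haarcascade_frontalface_alt.xml");  // placeholder path
    cv::Mat gray = cv::imread(argv[1], cv::IMREAD_GRAYSCALE);
    cv::equalizeHist(gray, gray);
    std::vector<cv::Rect> faces;
    detector.detectMultiScale(gray, faces, 1.1, 3);
    std::printf("%s: %zu faces\n", argv[1], faces.size());
    return 0;
}
```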

xiaoxiongli commented 7 years ago

Dear freesouls:

I tried to train a model using your default settings, and the training loss seems large compared with your training log:

train regression error: 575.186, mean error: 0.0799759
Validation at stage: 5
Validation error: 28.2551, mean error: 0.0905614

After the model was generated, I tested one picture, and the result does not look good:

[image: test result]

The whole training log:

log.txt

xiaoxiongli commented 7 years ago

Dear freesouls:

I think there may be a bug related to OpenMP or multi-threading, because when I disabled OpenMP in the CMakeLists.txt, things got better. By the way, I use a PC with a 24-core CPU.
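
A generic illustration of the kind of OpenMP data race that could cause this (not code from this repo): several threads updating a shared accumulator without synchronization, which makes the result depend on thread count and scheduling.

```cpp
// Generic data-race illustration, not this repo's code: without a
// reduction clause, concurrent updates to `total` are lost at random,
// so more threads can mean worse, machine-dependent results.
#include <cstdio>
#include <vector>

int main() {
    std::vector<double> errors(100000, 0.01);
    double total = 0.0;

    // BUG: `total` is shared and updated without synchronization.
    // The fix is `#pragma omp parallel for reduction(+:total)`.
    #pragma omp parallel for
    for (int i = 0; i < (int)errors.size(); ++i)
        total += errors[i];

    std::printf("total = %f\n", total);  // expected 1000; varies when raced
    return 0;
}
```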

xiaoxiongli commented 7 years ago

Without multi-threading (OpenMP):

Global Regression of stage 5
it will take some time to do Linear Regression, please be patient!!!
regressing ...0
regressing ...8
regressing ...16
regressing ...24
regressing ...32
regressing ...40
regressing ...48
regressing ...56
regressing ...64
predict regression targets
update current shapes
train regression error: 177.771, mean error: 0.0247179
Validation at stage: 5
Validation error: 17.5888, mean error: 0.0563745
finish training, start to saving the model...
model name: helenModel
save the model successfully

[image: default]