Cysu / open-reid

Open source person re-identification library in python
https://cysu.github.io/open-reid/
MIT License

"LinAlgError: Matrix is not positive definite" when training KISSME. #10

Closed dongb5 closed 7 years ago

dongb5 commented 7 years ago

Hi! "LinAlgError: Matrix is not positive definite" occurred during training when I set 'dist-metric' to 'kissme'. Any help? Thanks!

Cysu commented 7 years ago

Could you please attach the full error message and tell me your numpy version? We should already be handling this condition here.

dongb5 commented 7 years ago
 Test with best model:
=> Loaded checkpoint '/home/lz/logs/model_best.pth.tar'
Extract Features: [1/13]        Time 0.755 (0.755)      Data 0.618 (0.618)      
Extract Features: [2/13]        Time 0.136 (0.445)      Data 0.001 (0.309)      
Extract Features: [3/13]        Time 0.138 (0.343)      Data 0.000 (0.206)      
Extract Features: [4/13]        Time 0.136 (0.291)      Data 0.000 (0.155)      
Extract Features: [5/13]        Time 0.138 (0.261)      Data 0.000 (0.124)      
Extract Features: [6/13]        Time 0.137 (0.240)      Data 0.000 (0.103)      
Extract Features: [7/13]        Time 0.137 (0.225)      Data 0.000 (0.089)      
Extract Features: [8/13]        Time 0.137 (0.214)      Data 0.000 (0.078)      
Extract Features: [9/13]        Time 0.138 (0.206)      Data 0.000 (0.069)      
Extract Features: [10/13]       Time 0.135 (0.199)      Data 0.000 (0.062)      
Extract Features: [11/13]       Time 0.138 (0.193)      Data 0.000 (0.056)      
Extract Features: [12/13]       Time 0.136 (0.188)      Data 0.000 (0.052)      
Extract Features: [13/13]       Time 0.137 (0.184)      Data 0.000 (0.048)      
Extract Features: [1/20]        Time 0.357 (0.357)      Data 0.222 (0.222)      
Extract Features: [2/20]        Time 0.137 (0.247)      Data 0.001 (0.111)      
Extract Features: [3/20]        Time 0.138 (0.211)      Data 0.000 (0.074)      
Extract Features: [4/20]        Time 0.137 (0.192)      Data 0.000 (0.056)      
Extract Features: [5/20]        Time 0.139 (0.182)      Data 0.000 (0.045)      
Extract Features: [6/20]        Time 0.136 (0.174)      Data 0.000 (0.037)      
Extract Features: [7/20]        Time 0.139 (0.169)      Data 0.000 (0.032)      
Extract Features: [8/20]        Time 0.137 (0.165)      Data 0.000 (0.028)      
Extract Features: [9/20]        Time 0.137 (0.162)      Data 0.000 (0.025)      
Extract Features: [10/20]       Time 0.137 (0.159)      Data 0.000 (0.023)      
Extract Features: [11/20]       Time 0.139 (0.157)      Data 0.000 (0.020)      
Extract Features: [12/20]       Time 0.137 (0.156)      Data 0.000 (0.019)      
Extract Features: [13/20]       Time 0.138 (0.154)      Data 0.000 (0.017)      
Extract Features: [14/20]       Time 0.136 (0.153)      Data 0.000 (0.016)      
Extract Features: [15/20]       Time 0.139 (0.152)      Data 0.000 (0.015)      
Extract Features: [16/20]       Time 0.137 (0.151)      Data 0.000 (0.014)      
Extract Features: [17/20]       Time 0.137 (0.150)      Data 0.000 (0.013)      
Extract Features: [18/20]       Time 0.136 (0.150)      Data 0.000 (0.013)      
Extract Features: [19/20]       Time 0.138 (0.149)      Data 0.000 (0.012)      
Extract Features: [20/20]       Time 0.582 (0.171)      Data 0.000 (0.011)      
Traceback (most recent call last):

  File "", line 1, in <module>
    runfile('/home/lz/toolbox/open-reid/examples/softmax_loss.py', wdir='/home/lz/toolbox/open-reid/examples')

  File "/home/lz/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 880, in runfile
    execfile(filename, namespace)

  File "/home/lz/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "/home/lz/toolbox/open-reid/examples/softmax_loss.py", line 217, in <module>
    main(parser.parse_args())

  File "/home/lz/toolbox/open-reid/examples/softmax_loss.py", line 169, in main
    evaluator.evaluate(test_loader, dataset.query, dataset.gallery, metric)

  File "/home/lz/anaconda3/lib/python3.6/site-packages/open_reid-0.2.0-py3.6.egg/reid/evaluators.py", line 119, in evaluate
    distmat = pairwise_distance(features, query, gallery, metric=metric)

  File "/home/lz/anaconda3/lib/python3.6/site-packages/open_reid-0.2.0-py3.6.egg/reid/evaluators.py", line 60, in pairwise_distance
    x = metric.transform(x)

  File "/home/lz/anaconda3/lib/python3.6/site-packages/open_reid-0.2.0-py3.6.egg/reid/dist_metric.py", line 25, in transform
    X = self.metric.transform(X)

  File "/home/lz/anaconda3/lib/python3.6/site-packages/metric_learn-0.3.0-py3.6.egg/metric_learn/base_metric.py", line 46, in transform
    L = self.transformer()

  File "/home/lz/anaconda3/lib/python3.6/site-packages/metric_learn-0.3.0-py3.6.egg/metric_learn/base_metric.py", line 29, in transformer
    return inv(cholesky(self.metric()))

  File "/home/lz/anaconda3/lib/python3.6/site-packages/numpy/linalg/linalg.py", line 612, in cholesky
    r = gufunc(a, signature=signature, extobj=extobj)

  File "/home/lz/anaconda3/lib/python3.6/site-packages/numpy/linalg/linalg.py", line 93, in _raise_linalgerror_nonposdef
    raise LinAlgError("Matrix is not positive definite")

LinAlgError: Matrix is not positive definite

Numpy version: 1.12.1

Cysu commented 7 years ago

Have you modified the code? Did you call metric.train(model, train_loader) before the evaluation, as in this line?

dongb5 commented 7 years ago

Yes, I have that line of code, and I did not modify the code. The error occurred when training KISSME on VIPeR, but everything was fine on Market-1501.

Cysu commented 7 years ago

It was caused by numerical instability. We have switched to an iterative algorithm that makes the dist-metric matrix positive definite. See our recent commit.
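
For readers hitting the same error, here is a rough sketch of what such an iterative repair can look like. It is not the code from the commit: the helper name `make_positive_definite`, its `max_iter` parameter, and the exact shift schedule are assumptions made for illustration. The idea is to symmetrize the matrix, then keep adding a growing multiple of the identity until `np.linalg.cholesky` stops raising `LinAlgError`.

```python
import numpy as np


def make_positive_definite(M, max_iter=100):
    """Return a symmetric positive definite matrix close to M.

    Hypothetical helper for illustration only; it is not taken from
    open-reid or metric-learn.
    """
    # Remove any asymmetry introduced by floating-point round-off.
    M = (M + M.T) * 0.5
    identity = np.eye(M.shape[0])
    k = 0
    while True:
        try:
            # Cholesky succeeds only for positive definite matrices.
            np.linalg.cholesky(M)
            return M
        except np.linalg.LinAlgError:
            k += 1
            if k > max_iter:
                raise
            eigs = np.linalg.eigvalsh(M)
            # Size of the most negative eigenvalue (0 if none is negative).
            deficit = max(-eigs.min(), 0.0)
            # Small extra margin relative to the matrix scale.
            margin = np.spacing(max(np.abs(eigs).max(), 1.0))
            # The shift grows quadratically with the attempt number, so a
            # matrix that is only barely indefinite gets a tiny correction.
            M = M + (k * k) * (deficit + margin) * identity
```

Under these assumptions, a learned metric matrix would be passed through `make_positive_definite` before anything calls `cholesky` on it. The input array is not modified; a new array is returned.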