snclassifier_test.py throwing tolerance error stochastically

tallamjr commented 5 years ago

Every now and again, when running the tests, snclassifier_test.py fails with the following error:

    def classification_test(cls, featz, types):
        out_dir=os.path.join('classifications', '')
        if not os.path.exists(out_dir):
            subprocess.call(['mkdir',out_dir])

        snclassifier.run_pipeline(featz, types, classifiers=cls, nprocesses=4, plot_roc_curve=False, output_name=out_dir)

        auc_truth={'nb':5.498296484233418102e-01, 'svm': 9.607832585029829620e-01, 'knn':8.683540372670807139e-01, 'random_forest': 9.794267790146994335e-01, 'decision_tree':9.046528076757488490e-01, 'boost_dt': 9.597607478934744307e-01, 'boost_rf': 9.791576972753551766e-01, 'neural_network': 9.637969739836398375e-01}

        for classifier in cls:
            auc=np.loadtxt(os.path.join('classifications', classifier+'.auc'))
>           np.testing.assert_allclose(auc, auc_truth[classifier], rtol=0.25)
E           AssertionError: 
E           Not equal to tolerance rtol=0.25, atol=0
E           
E           (mismatch 100.0%)
E            x: array(0.7991556359976535)
E            y: array(0.5498296484233418)

One approach could be to simply raise the tolerance from 0.25 to, say, 0.5, but this could still lead to problems later.
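
For reference, `np.testing.assert_allclose` passes only when `|actual - desired| <= atol + rtol * |desired|` (with `atol=0` by default). Plugging in the numbers from the traceback above is a quick sketch, not part of the test suite. It shows that the `nb` run missed by a relative difference of roughly 0.45, so `rtol=0.5` would have passed this particular run, but only with a small margin:

```python
import numpy as np

# Values copied from the failing assertion in the traceback.
auc = 0.7991556359976535        # x: AUC produced by this run ('nb')
auc_truth = 0.5498296484233418  # y: stored reference AUC for 'nb'

# assert_allclose checks |actual - desired| <= atol + rtol * |desired|, atol=0 here.
rel_diff = abs(auc - auc_truth) / abs(auc_truth)
print(f"relative difference: {rel_diff:.3f}")  # about 0.45


def passes(rtol):
    """Return True if the assertion from the test would pass at this rtol."""
    try:
        np.testing.assert_allclose(auc, auc_truth, rtol=rtol)
        return True
    except AssertionError:
        return False


print(passes(0.25), passes(0.5))
```
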
Another approach could be to add a SEED parameter to snclassifier.py, which contains random calls such as:
```
...
        objs=np.random.permutation(objs)
...
```
and
```
...
        inds=np.random.permutation(range(len(features)))
...
```
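
A minimal sketch of the seed idea, assuming the shuffles can be moved into small helpers. The names `shuffle_objects` and `shuffled_indices` are hypothetical, not existing snmachine functions. Each one draws from a local `np.random.Generator`, so an integer seed makes the order reproducible while `seed=None` keeps the current unseeded behaviour:

```python
import numpy as np


def shuffle_objects(objs, seed=None):
    """Hypothetical helper replacing objs=np.random.permutation(objs).

    An integer seed gives a reproducible order; seed=None stays random.
    """
    rng = np.random.default_rng(seed)
    return rng.permutation(objs)


def shuffled_indices(features, seed=None):
    """Hypothetical helper replacing np.random.permutation(range(len(features)))."""
    rng = np.random.default_rng(seed)
    # permutation(n) shuffles np.arange(n), i.e. the row indices 0..n-1.
    return rng.permutation(len(features))
```

A lighter-touch alternative is calling `np.random.seed(...)` at the top of the test, since `np.random.permutation` draws from NumPy's global legacy generator; the local-generator version avoids changing global state that other code may share.
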
What are people's thoughts on this? @rbiswas4, @Catarina-Alves, @MichelleLochner

rbiswas4 commented 5 years ago

The test sample should be small, and therefore statistical fluctuations will happen.

Seeding the random permutations helps, but I would expect the classification algorithms themselves to have some randomness too (unless we are seeding those as well).
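
If the classifiers do draw random numbers, one option is to derive every stage's seed from a single master seed. The sketch below uses NumPy's `SeedSequence.spawn` to give a shuffling step and a stand-in "classifier" step independent, reproducible streams. `stage_seeds` and `toy_run` are hypothetical names, and the integer `model_seed` is the kind of value one might pass as `random_state`, assuming the classifiers are scikit-learn estimators (not confirmed here):

```python
import numpy as np


def stage_seeds(seed, n_stages):
    """Derive independent, reproducible integer seeds for each random stage
    (e.g. data shuffling, classifier training) from one master seed."""
    children = np.random.SeedSequence(seed).spawn(n_stages)
    return [int(child.generate_state(1)[0]) for child in children]


def toy_run(seed):
    """Hypothetical stand-in for a pipeline with two random stages."""
    shuffle_seed, model_seed = stage_seeds(seed, 2)
    # Stage 1: shuffle the objects (cf. np.random.permutation in snclassifier.py).
    order = np.random.default_rng(shuffle_seed).permutation(10)
    # Stage 2: stand-in for a classifier's internal randomness, e.g. random
    # initial weights; a real estimator would receive model_seed instead.
    weights = np.random.default_rng(model_seed).normal(size=3)
    return order, weights
```

Seeding only the first stage would leave the second one varying between runs; spawning from one master seed keeps the streams independent while a single test-level constant controls both.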

Testing the exact value of the AUC seems like a strange thing to test for; we should be testing more basic functions, while also checking that the AUC calculation runs to completion. So my suggestion would be to do both of the things you proposed.
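
Along those lines, a sketch of the two kinds of check: a known-answer test of an AUC routine on tiny hand-made inputs, and a smoke check that only requires the pipeline's AUC to be a finite number in [0, 1]. `auc_score` is a hypothetical pairwise (Mann-Whitney U) AUC written for illustration, not the routine snmachine actually uses, and `assert_valid_auc` is likewise a hypothetical helper:

```python
import numpy as np


def auc_score(labels, scores):
    """Pairwise (Mann-Whitney U) ROC AUC: the fraction of (positive, negative)
    pairs where the positive scores higher; ties count as 1/2."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels]
    neg = scores[~labels]
    # Compare every positive score with every negative score via broadcasting.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def test_auc_known_answers():
    labels = [0, 0, 1, 1]
    assert auc_score(labels, [0.1, 0.2, 0.8, 0.9]) == 1.0  # perfect separation
    assert auc_score(labels, [0.9, 0.8, 0.2, 0.1]) == 0.0  # perfectly inverted
    assert auc_score(labels, [0.5, 0.5, 0.5, 0.5]) == 0.5  # uninformative scores


def assert_valid_auc(auc):
    """Smoke check on a value such as np.loadtxt(...'.auc'): the pipeline ran
    and produced a usable AUC, without pinning its exact value."""
    auc = float(auc)
    assert np.isfinite(auc), "AUC is not finite"
    assert 0.0 <= auc <= 1.0, f"AUC {auc} outside [0, 1]"
```
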

tallamjr commented 5 years ago

Closed in #118

LSSTDESC / snmachine, issue #115