Different result despite same input

TeamHG-Memex / sklearn-crfsuite

scikit-learn inspired API for CRFsuite

Different result despite same input #9

Closed iamhuy closed 7 years ago

iamhuy commented 7 years ago

I created two CRF instances and trained them on the same training set with the same max_iterations parameter.

import sklearn_crfsuite

# X_train, Y_train: the same training features and labels for both models
crf = sklearn_crfsuite.CRF(
    algorithm='ap',
    max_iterations=5,
)
crf.fit(X_train, Y_train)

t = sklearn_crfsuite.CRF(
    algorithm='ap',
    max_iterations=5,
)
t.fit(X_train, Y_train)

However, their results differ (I evaluated both on the same development set using F-measure). I hope to hear from you soon. Thank you.

kmike commented 7 years ago

I think this is expected: crfsuite shuffles the dataset for Averaged Perceptron training and uses a global random seed (see here), which means the shuffle returns a different result each time.

iamhuy commented 7 years ago

Thank you! Does that mean that, for a specific set of hyperparameters, it is necessary to train more than once to find the best model (because the result also depends on the time)?

kmike commented 7 years ago

Well, it depends on the goal. If you want to compare hyperparameters then yeah, it could make sense to train on several seeds and take e.g. an average, or the best model, or just compute the variance. But are the results really that different between runs?
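The suggestion above (train several times, then look at the average, the best score, or the spread) can be sketched as follows. This is only an illustration: `summarize_runs` and `fake_train_and_score` are hypothetical names, and the fake scorer returns made-up numbers in place of actually fitting a CRF and computing an F-measure on the development set.

```python
import random
import statistics


def summarize_runs(train_and_score, n_runs=5):
    """Call train_and_score once per run and summarize the resulting scores."""
    scores = [train_and_score(run) for run in range(n_runs)]
    return {
        "scores": scores,
        "mean": statistics.mean(scores),
        "stdev": statistics.stdev(scores),  # sample standard deviation
        "best": max(scores),
    }


def fake_train_and_score(run):
    # Hypothetical stand-in: a real version would fit a model and return
    # its F-measure on the development set. The values here are invented.
    rng = random.Random(run)
    return 0.80 + rng.uniform(-0.02, 0.02)


summary = summarize_runs(fake_train_and_score)
print(summary["mean"], summary["stdev"], summary["best"])
```

When comparing two hyperparameter settings, a difference in mean score that is small relative to the standard deviation across runs is probably not meaningful.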

iamhuy commented 7 years ago

No, they're not different across runs. I mean that if I run the code above in two separate executions, giving crf1 and t1 in the first and crf2 and t2 in the second, then crf1 = crf2, t1 = t2, and crf1 != t1.
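This pattern is what one would expect from a single process-wide random generator that starts in the same state on every execution. The sketch below is a pure-Python analogy, not crfsuite's actual code: `simulate_process` is a hypothetical helper, and the fixed starting seed is an assumption chosen to match the behaviour reported here.

```python
import random


def simulate_process(n_fits=2, n_items=20, seed=1):
    """Simulate one program execution that trains n_fits models in a row."""
    # Assumption: the shared generator starts from the same state each time
    # the program is launched.
    global_rng = random.Random(seed)
    orders = []
    for _ in range(n_fits):
        data = list(range(n_items))
        # Each "fit" shuffles with the shared generator and advances its state,
        # so the next fit in the same process sees a different order.
        global_rng.shuffle(data)
        orders.append(data)
    return orders


crf1, t1 = simulate_process()  # first execution
crf2, t2 = simulate_process()  # second execution

print(crf1 == crf2, t1 == t2, crf1 != t1)
```

The first model of each execution sees the same shuffle order, and so does the second, but the two models within one execution see different orders.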

severinsimmler commented 4 years ago

How do I set the random seed?

huang-xx commented 3 years ago

@iamhuy @severinsimmler Hi, I encountered the same problem, but after I set random_state for the train_test_split function of sklearn.model_selection, the results became consistent.
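A minimal sketch of that fix, assuming scikit-learn is installed and using toy lists in place of real feature sequences and labels. Note that this only pins down which examples end up in the training and development sets; it does not affect the shuffling that crfsuite does during training.

```python
from sklearn.model_selection import train_test_split

# Toy data standing in for real features and labels (illustrative only).
X = [[i] for i in range(20)]
y = [i % 2 for i in range(20)]


def split(seed):
    # A fixed random_state makes the split reproducible across calls and runs.
    return train_test_split(X, y, test_size=0.25, random_state=seed)


X_train_a, X_dev_a, y_train_a, y_dev_a = split(42)
X_train_b, X_dev_b, y_train_b, y_dev_b = split(42)

print(X_dev_a == X_dev_b)  # same seed, same development set
```

Without random_state, each call may draw a different split, so scores can differ between executions even if the model training itself were fully deterministic.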

UAmsterdam commented 9 months ago

I am also getting different results when running it in different environments, such as the command-line version of CRFsuite and the Python version of CRFsuite.

Does anyone here have any idea what's going on?