facebookresearch / fastText

Library for fast text representation and classification.
https://fasttext.cc/
MIT License
25.76k stars 4.71k forks

Replicating Results #1336

Open MHDBST opened 1 year ago

MHDBST commented 1 year ago

I'm working on a classification task with the fastText library, and I am trying to replicate the same results over different runs. I have set the following parameters, with the seed set to 40, but different runs produce different accuracies on the dev set. The difference is significant: in one run the accuracy is 90%, while in another it is 75%. I'm not sure whether this is caused by running on CPU with multiple threads, or whether there is another way to replicate the results. Any guidance on this?

fasttext.train_supervised(input=train_path, minCount=3, wordNgrams=4, minn=1, maxn=6, lr=0.001, dim=300, epoch=50, seed=40)

SDAravind commented 1 year ago

Yes, I have the same issue. Do you use autotune with the validation file parameter?

FYI - there's no seed parameter documented for fastText.

MHDBST commented 1 year ago

@SDAravind maybe it's not mentioned in the wiki page for some reason, but the parameter is defined here: https://github.com/facebookresearch/fastText/blob/440f46ac8811db0ce7ecb7dfb04f694453187db3/python/fasttext_module/fasttext/FastText.py#L522

SDAravind commented 11 months ago

@MHDBST - It resulted in an error for me when I set the seed parameter.

As an alternative approach, I would use fastText's sentence vector method for text vectorisation, together with scikit-learn's MLPClassifier (or any other estimator) for consistent results, setting the random state to a fixed value of your choice.