MIND-Lab / OCTIS

OCTIS: Comparing Topic Models is Simple! A Python package to optimize and evaluate topic models (accepted at the EACL 2021 demo track)
MIT License

Why is hyper-parameter tuning performed based on the score on the train set instead of the validation set? #80

Closed: geliAI closed this issue 1 year ago

silviatti commented 1 year ago

Copy-pasting my email reply for reference :)

In general, you use topic models to discover the topics of a given document collection. That means you usually don't have a train/test/validation split, just a "train" set. Indeed, even if you want to predict the topics of a new, unseen document, the discovered topics remain fixed; the model is not re-fit. For this reason, we decided to optimize the score on the train set. However, as you may have noticed, OCTIS can also handle datasets with train/test or train/validation/test splits. The validation split is used only by models that rely on an early-stopping criterion to stop training, and the test split is used only when classification metrics are considered.
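For reference, this is roughly what that tuning loop looks like: the metric (e.g. coherence) is computed on the topics produced from the training corpus itself, so no validation split is involved. A minimal sketch, assuming the OCTIS quickstart API (the 20NewsGroup fetcher, LDA wrapper, Coherence metric, and Optimizer; exact parameter names may differ across versions):

```python
from octis.dataset.dataset import Dataset
from octis.models.LDA import LDA
from octis.evaluation_metrics.coherence_metrics import Coherence
from octis.optimization.optimizer import Optimizer
from skopt.space.space import Real

# Fetch a preprocessed benchmark dataset (the whole collection acts as the "train" set)
dataset = Dataset()
dataset.fetch_dataset("20NewsGroup")

# Topic model whose hyper-parameters we want to tune
model = LDA(num_topics=20)

# Coherence is computed on the corpus itself, so no validation split is needed
metric = Coherence(texts=dataset.get_corpus())

# Search space for Bayesian optimization (here: the two Dirichlet priors of LDA)
search_space = {
    "alpha": Real(low=0.001, high=5.0),
    "eta": Real(low=0.001, high=5.0),
}

optimizer = Optimizer()
result = optimizer.optimize(
    model, dataset, metric, search_space,
    number_of_call=30,   # number of optimization iterations
    model_runs=5,        # re-trainings per configuration, to average out randomness
    save_path="results/",
)
result.save_to_csv("results.csv")
```

Each candidate configuration is scored by training the model on the corpus and evaluating the resulting topics, which is exactly the "optimize on the train set" behavior asked about in this issue.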

I'm closing this issue.