Semi-Supervised Learning

Waikato / meka

Multi-label classifiers and evaluation procedures using the Weka machine learning framework.

http://waikato.github.io/meka/

GNU General Public License v3.0


Semi-Supervised Learning #75

Open bubbazz opened 2 years ago

bubbazz commented 2 years ago

Hi,

In the tutorial, the command in the semi-supervised learning section takes an unlabeled ARFF file.

I wonder what a row in such an ARFF file looks like. The only thing I found is "?" as an attribute value.

For example:

@RELATION unlabeded
@ATTRIBUTE X1 NUMERIC
@ATTRIBUTE X2 NUMERIC
@ATTRIBUTE y0 {0, 1}
@ATTRIBUTE y1 {0, 1}

{ 0 42.42, 1 42.42, 2 ?, 3 ? }

Is the row above what is meant by unlabeled? If not, what does such a dataset look like?

fracpete commented 2 years ago

The unlabeled dataset requires the exact same structure as the training set (i.e., the same attribute order and the same nominal label order), and the class attribute columns must contain only missing values (i.e., ?).
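To illustrate the structure described above, here is a minimal sketch that copies an ARFF header unchanged and replaces the label columns of every data row with "?". It is not part of Meka or Weka: the class UnlabeledArff and its methods are hypothetical names, the label column indices are passed in by hand, and it assumes dense (non-sparse) data rows without quoted values that contain commas.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not a Meka/Weka class: masks label columns in dense ARFF rows.
public class UnlabeledArff {

    /** Replaces the values at the given 0-based column indices of one dense data row with "?". */
    public static String maskLabels(String row, Set<Integer> labelIndices) {
        String[] values = row.split(",", -1);
        for (int i = 0; i < values.length; i++) {
            values[i] = labelIndices.contains(i) ? "?" : values[i].trim();
        }
        return String.join(",", values);
    }

    /** Keeps the header (up to and including @DATA) as-is and masks the label columns of each data row. */
    public static List<String> toUnlabeled(List<String> arffLines, Set<Integer> labelIndices) {
        List<String> out = new ArrayList<>();
        boolean inData = false;
        for (String line : arffLines) {
            String trimmed = line.trim();
            if (!inData) {
                out.add(line); // header lines must stay identical to the training set
                if (trimmed.equalsIgnoreCase("@data")) {
                    inData = true;
                }
            } else if (trimmed.isEmpty() || trimmed.startsWith("%")) {
                out.add(line); // blank lines and comments are copied unchanged
            } else {
                out.add(maskLabels(trimmed, labelIndices));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up training data using the attribute layout from the question (labels in columns 2 and 3).
        List<String> labeled = List.of(
            "@RELATION example",
            "@ATTRIBUTE X1 NUMERIC",
            "@ATTRIBUTE X2 NUMERIC",
            "@ATTRIBUTE y0 {0, 1}",
            "@ATTRIBUTE y1 {0, 1}",
            "@DATA",
            "42.42, 42.42, 0, 1",
            "1.0, 2.0, 1, 0");
        toUnlabeled(labeled, Set.of(2, 3)).forEach(System.out::println);
    }
}
```

Running main prints the header unchanged, followed by the data rows 42.42,42.42,?,? and 1.0,2.0,?,?. The attribute declarations are deliberately left alone, since the unlabeled file has to match the training set's structure.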

If you need to introduce missing values, have a look at the missing-values-imputation Weka package.

I've added a note to Tutorial.tex to make it clearer. Thanks for pointing it out!

bubbazz commented 2 years ago

Thanks for clearing it up. It helped me a lot.

bubbazz commented 2 years ago

Dear Meka-Team,

Is it possible to combine semi-supervised learning with hyperparameter tuning? In Tutorial.pdf, semi-supervised learning with EM/CM uses two commands (see the first post), and I can't figure out how to build a pipeline with hyperparameter tuning (e.g. meka.*.MultiSearch).
Also, after training and testing (the two separate commands), how do you predict unseen data?

Thank you very much indeed.

With kind regards

fracpete commented 2 years ago

From a quick look at the code:

MultiSearch isn't a semi-supervised algorithm itself (and therefore won't receive the unlabeled dataset for training), so it can't be used to optimize a semi-supervised classifier.
As for predicting unseen data on the command line, I'm not sure. In code: meka.core.MLEvalUtils calculates the threshold(s) with the meka.core.ThresholdUtils class, using the collected prediction arrays (obtained from the classifier's distributionForInstance method for each row in the weka.core.Instances object).

Please note that I don't use Meka, so these are only vague pointers.
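For readers who want to see what the thresholding step mentioned above amounts to, here is a rough, library-free sketch. It assumes you already have one confidence array per row (for example, collected from distributionForInstance) and turns each into 0/1 label predictions. The class ThresholdSketch, its method names, and the idea of picking the threshold so that the predicted label cardinality matches a target value are assumptions made for illustration; they are not copied from meka.core.ThresholdUtils and may differ from what it actually does.

```java
// Hypothetical sketch, not Meka code: thresholding per-label confidences.
public class ThresholdSketch {

    /** Turns per-label confidences into 0/1 predictions: 1 where confidence >= threshold. */
    public static int[] applyThreshold(double[] confidences, double threshold) {
        int[] labels = new int[confidences.length];
        for (int j = 0; j < confidences.length; j++) {
            labels[j] = confidences[j] >= threshold ? 1 : 0;
        }
        return labels;
    }

    /** Average number of labels predicted per row at the given threshold. */
    public static double cardinality(double[][] confidences, double threshold) {
        if (confidences.length == 0) {
            return 0.0;
        }
        double total = 0.0;
        for (double[] row : confidences) {
            for (int label : applyThreshold(row, threshold)) {
                total += label;
            }
        }
        return total / confidences.length;
    }

    /**
     * Searches a grid of thresholds (0.00 to 1.00 in steps of 0.01) and returns the first one
     * whose predicted cardinality is closest to the target (e.g. the training set's cardinality).
     */
    public static double calibrate(double[][] confidences, double targetCardinality) {
        double best = 0.5;
        double bestGap = Double.MAX_VALUE;
        for (int step = 0; step <= 100; step++) {
            double t = step / 100.0;
            double gap = Math.abs(cardinality(confidences, t) - targetCardinality);
            if (gap < bestGap) {
                bestGap = gap;
                best = t;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up confidences for two rows and two labels.
        double[][] confidences = { {0.9, 0.2}, {0.6, 0.4} };
        double threshold = calibrate(confidences, 1.0);
        for (double[] row : confidences) {
            System.out.println(java.util.Arrays.toString(applyThreshold(row, threshold)));
        }
    }
}
```

With a trained classifier, the confidence arrays would come from calling distributionForInstance on each unseen row; the same threshold found on the training or validation predictions would then be applied to those rows.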