edmundlth / MBUSA-ML-2022

Material for teaching Machine Learning for Business Analytics (Melbourne Business School, 2022)
Apache License 2.0

Practice Exam 2022 discussion | MBUSA Machine Learning Module 2022 #15

Open utterances-bot opened 1 year ago

utterances-bot commented 1 year ago


https://edmundlth.github.io/MBUSA-ML-2022/extra/machine-learning/2022/09/21/practice-exam-2022-discussion.html

JasonQuanbin commented 1 year ago

I can't find a topic for week 7 so I guess I can ask here.

The Week 7 material on semi-supervised learning says that COP k-means clustering is sensitive to the order of instances. Can you elaborate on this?

edmundlth commented 1 year ago

Hey Jason,

So, imagine you are in the assignment phase of k-means ("assign each point to a cluster, given the current centroids"), and you're processing a point X that appears in the set of constraints. Let's say one of the constraints says that X must be clustered together with Y. That means the cluster X is assigned to may depend on whether Y has already been processed. If it hasn't, then X goes to the cluster with the nearest centroid. If it has, then X goes to the cluster where Y is, which may or may not be the cluster with the nearest centroid. And in the first case, where Y ends up later depends on where X went, too. This is what we meant by COP k-means being sensitive to instance ordering.

I think it would be enlightening to redo the example from the practice session with the initial centroids at the tail end (so, 9 and 10) and see that you get a different clustering. The same phenomenon will occur if the data instances are not processed in the same order.
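To make the order dependence concrete, here is a minimal sketch of a single assignment pass. It is a simplified stand-in for COP k-means, not the full algorithm: it handles only must-link constraints and never updates the centroids. The function name `assign_with_must_link`, the two 1-D points, and the centroid positions are all made up for illustration and are not taken from the practice session.

```python
def assign_with_must_link(points, centroids, must_link, order):
    """One assignment pass of a simplified, must-link-only COP k-means.

    points:    dict mapping a point name to its 1-D coordinate (hypothetical data)
    centroids: list of 1-D centroid positions
    must_link: list of (name, name) pairs that must share a cluster
    order:     the order in which the points are processed
    """
    # Build a symmetric lookup of must-link partners.
    partners = {name: set() for name in points}
    for a, b in must_link:
        partners[a].add(b)
        partners[b].add(a)

    labels = {}
    for name in order:
        # If a must-link partner already has a cluster, join that cluster.
        assigned = [labels[p] for p in sorted(partners[name]) if p in labels]
        if assigned:
            labels[name] = assigned[0]
        else:
            # Otherwise fall back to the nearest centroid, as in plain k-means.
            labels[name] = min(
                range(len(centroids)),
                key=lambda k: abs(points[name] - centroids[k]),
            )
    return labels


# Illustrative data: X is closer to centroid 0, Y is closer to centroid 1.
points = {"X": 4.0, "Y": 6.0}
centroids = [0.0, 10.0]
must_link = [("X", "Y")]

x_first = assign_with_must_link(points, centroids, must_link, ["X", "Y"])
y_first = assign_with_must_link(points, centroids, must_link, ["Y", "X"])

print(x_first)  # {'X': 0, 'Y': 0}
print(y_first)  # {'Y': 1, 'X': 1}
```

Same points, same centroids, same constraint, but whichever of X and Y is processed first pulls the other into its own nearest cluster, so the two orderings give different clusterings.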