JasonSills opened 3 years ago
For example, if you had a "baby boomers" group before, do you still have something that approximates that group?
You should find that some groups are still clearly the same, and others disappeared or were merged.
I am confused about what we are supposed to compare. After clustering I got 3 groups with all of the variables broken down like so:
I then used the code in the lab to produce graphs of the indices.
I then used the lab code to produce finite mixture models for the three indices and the three census variables.
Then there are these three cluster maps which I am having a hard time producing in my own .rmd file. I have another issue opened for that. This picture is from the lab instructions.
And finally there are the plots comparing the indices to each other and the census variables to each other.
Which of these are we supposed to compare for question 1 and how? Did I miss a step?
I realized I also have a model of the original three cluster groups. Are we supposed to compare the results from the finite mixture models? I have no idea how to do that.
No, the comparison should be between the types of groups you identified with the clustering.
After applying labels to each cluster, which groups were present in both sets of clusters? Which disappeared?
You don't have to learn how to interpret different output. It's just an extension of Lab-03 where you created cluster labels.
The take-away is, if you change the input data, do you change the clusters?
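To make that comparison concrete, here is a minimal sketch of cross-tabulating two sets of cluster labels for the same tracts. It is written in Python for illustration only (the lab itself is in R, where base `table()` applied to the two label vectors gives the same counts). The label names and the eight tracts are made up, and `cross_tab` and `best_match` are hypothetical helpers, not part of the lab code.

```python
from collections import Counter


def cross_tab(labels_a, labels_b):
    """Count how many tracts fall in each (label_a, label_b) pair."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both label vectors must cover the same tracts")
    return Counter(zip(labels_a, labels_b))


def best_match(labels_a, labels_b):
    """For each group in A, find the group in B that holds most of its tracts.

    Returns {group_a: (group_b, share of group_a's tracts found in group_b)}.
    """
    counts = cross_tab(labels_a, labels_b)
    size_a = Counter(labels_a)
    matches = {}
    for a in sorted(size_a):
        candidates = [(b, n) for (x, b), n in counts.items() if x == a]
        b, n = max(candidates, key=lambda pair: pair[1])
        matches[a] = (b, n / size_a[a])
    return matches


# Hypothetical labels for the same eight tracts under two models.
full_model = ["retirees", "retirees", "families", "families",
              "students", "students", "families", "retirees"]
index_model = ["older", "older", "mixed", "mixed",
               "mixed", "mixed", "mixed", "older"]

matches = best_match(full_model, index_model)
for group, (match, share) in matches.items():
    print(f"{group}: {share:.0%} of tracts land in '{match}'")
```

In this made-up data the "retirees" group survives intact as "older", while "families" and "students" both land in "mixed", which is the kind of merge worth describing in the write-up.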
So for question one I am just explaining what variables are present or not present among the cluster groups? For example, a lot of female-headed households in one group compared to the others?
Yes, exactly. Did that group remain or disappear?
And visually, did certain areas in the city continue to form a strong cluster even if the label changed?
I think you will find the algorithm is sensitive to the data inputs.
did certain areas in the city continue to form a strong cluster even if the label changed?
I am not sure what you mean by label changes.
On this example, I would highlight how the cluster on the left remained intact, but when more data was used the algorithm was able to distinguish two groups (blue cluster on top, red cluster in the middle).
The cluster on the right might have the label of "white and retired" in both cases (just making it up here), so the cluster remained intact.
How did the clusters change as a result of the data inputs changing? Did the number or the composition of the groups change?
Oh okay. I need to compare the maps of cluster, cluster2 and cluster3 and explain how those groupings changed with different inputs.
Correct. You will use the tables of variables as well to make sense of the clusters.
But yes - it's comparing the consistency of groups identified.
And to clarify for question 2, I need to select my own 3 variables (different from the ones you used in the lab instructions) and compare the original clusters (cluster) and the clusters produced with those three variables (cluster3)?
Do I need to create new tables for cluster2 and cluster3? I only have tables for the original cluster.
Yes, you need new tables for the new clusters. They should be smaller, since there are fewer variables, and so easier to interpret.
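As a rough illustration of what such a table contains, the sketch below computes the mean of each input variable within each cluster. It is in Python for illustration only (the lab is in R). The variable names `pct_female_headed` and `pct_over_60`, the four tracts, and the `cluster_profile` helper are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def cluster_profile(rows, labels, variables):
    """Return {cluster_label: {variable: mean value within that cluster}}."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    return {
        label: {v: mean(r[v] for r in members) for v in variables}
        for label, members in grouped.items()
    }


# Hypothetical tract-level data and cluster assignments.
tracts = [
    {"pct_female_headed": 30.0, "pct_over_60": 10.0},
    {"pct_female_headed": 34.0, "pct_over_60": 12.0},
    {"pct_female_headed": 8.0, "pct_over_60": 40.0},
    {"pct_female_headed": 12.0, "pct_over_60": 44.0},
]
cluster3 = [1, 1, 2, 2]

profile = cluster_profile(tracts, cluster3, ["pct_female_headed", "pct_over_60"])
for label in sorted(profile):
    print(label, profile[label])
```

Reading across the rows is what lets you attach a label: here cluster 1 would be the group with many female-headed households and cluster 2 the older group.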
Hi @lecy,
I'm trying to clarify clustering conceptually.
In the question: Compare that set of groups to the groups identified by the model using only the three indices above. Are they identifying the same groups? Which group is missing?
In my city, Seattle, the 30-variable model generated 5 groups. With the 3 indices it generates 3 groups. But all of these groups are fundamentally different from the 5 in the 30-variable model. They have to be; it's a bit like removing variables from a regression: your model changes. So I'm not sure I understand "which group is missing". I see that groups 4 and 5 are missing, but the groups are all different, so it seems that they are all "missing". How should I answer this?