DS4PS / cpp-529-fall-2020

http://ds4ps.org/cpp-529-fall-2020/

Clarifying question #13

Open JasonSills opened 3 years ago

JasonSills commented 3 years ago

Hi @lecy,

I'm trying to clarify clustering conceptually.

In the question: Compare that set of groups to the groups identified by the model using only the three indices above. Are they identifying the same groups? Which group is missing?

In my city, Seattle, the 30-variable model generated 5 groups. With the 3 indices it generates 3 groups. But all of these groups are fundamentally different from the 5 in the 30-variable model. They have to be: it's a bit like removing variables from a regression, where your model changes. So I'm not sure I understand "which group is missing". I see that groups 4 and 5 are missing, but the remaining groups are all different too, so it seems that they are all "missing". How should I answer this?

lecy commented 3 years ago

For example, if you had a "baby boomers" group before, do you still have something that approximates that group?

You should find that some groups are still clearly the same, and others disappeared or were merged.

ekmcintyre commented 3 years ago

I am confused about what we are supposed to compare. After clustering I got 3 groups with all of the variables broken down like so: image

I then used the code in the lab to produce graphs of the indices.

image

I then used the lab code to produce finite mixture models for the three indices and the three census variables.

image

Then there are these three cluster maps which I am having a hard time producing in my own .rmd file. I have another issue opened for that. This picture is from the lab instructions.

image

And finally there are the plots comparing the indices to each other and the census variables to each other.

image

Which of these are we supposed to compare for question 1 and how? Did I miss a step?

ekmcintyre commented 3 years ago

I realized I also have a model of the original three cluster groups. Are we supposed to compare the results from the finite mixture models? I have no idea how to do that.

image

lecy commented 3 years ago

No, the comparison should be between the types of groups you identified with the clustering.

After applying labels to each cluster, which groups were present in both sets of clusters? Which disappeared?

You don't have to learn how to interpret a new kind of output. It's just an extension of Lab-03, where you created cluster labels.

The take-away is, if you change the input data, do you change the clusters?
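One way to check this beyond comparing labels by eye is to cross-tabulate the two sets of cluster assignments for the same tracts. The sketch below is only an illustration in Python: the label vectors `full_model` and `index_model` are made up, and the helper `crosstab` is hypothetical, not part of the lab code. In R, calling `table()` on two label vectors produces the same kind of cross-tabulation.

```python
from collections import Counter

# Hypothetical cluster labels for the same eight tracts, listed in the same order.
full_model = [1, 1, 2, 2, 3, 3, 4, 5]    # e.g. clusters from the full variable set
index_model = [1, 1, 2, 2, 2, 2, 3, 3]   # e.g. clusters from the three indices


def crosstab(a, b):
    """Count how many units fall into each (label_a, label_b) pair."""
    return Counter(zip(a, b))


table = crosstab(full_model, index_model)

rows = sorted(set(full_model))
cols = sorted(set(index_model))

# Print a small table: rows are full-model clusters, columns are index-model clusters.
print("full/index", *cols, sep="\t")
for r in rows:
    print(r, *(table[(r, c)] for c in cols), sep="\t")
```

A row whose counts sit in a single column suggests that group survived; two rows that share a column suggest those groups were merged in the smaller model.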

ekmcintyre commented 3 years ago

So for question one I am just explaining what variables are present or not present among the cluster groups? For example, a lot of female-headed households in one group compared to the others?

lecy commented 3 years ago

Yes, exactly. Did that group remain or disappear?

And visually, did certain areas in the city continue to form a strong cluster even if the label changed?

I think you will find the algorithm is sensitive to the data inputs.

ekmcintyre commented 3 years ago

> did certain areas in the city continue to form a strong cluster even if the label changed?

I am not sure what you mean by label changes.

lecy commented 3 years ago

In this example, I would highlight how the cluster on the left remained intact, but when more data was used the algorithm was able to distinguish two groups (blue cluster on top, red cluster in the middle).

The cluster on the right might have the label of "white and retired" in both cases (just making it up here), so the cluster remained intact.

image

How did the clusters change as a result of the data inputs changing? Did the number of groups change, and did their composition?
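As a rough sketch of how "remained intact" versus "merged" could be checked with numbers, the Python snippet below finds, for each original cluster, the new cluster that holds most of its members. Everything here is an assumption for illustration: the label vectors `old` and `new` are invented, and `best_match` is a hypothetical helper rather than anything from the lab. The cluster labels still have to be interpreted from the variable tables.

```python
from collections import Counter, defaultdict

# Hypothetical labels for the same eleven tracts under two clusterings.
old = [1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4]
new = [1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 1]


def best_match(old_labels, new_labels):
    """Map each old cluster to (new cluster holding most of its members, share)."""
    members = defaultdict(Counter)
    for o, n in zip(old_labels, new_labels):
        members[o][n] += 1
    result = {}
    for o, counts in sorted(members.items()):
        target, hits = counts.most_common(1)[0]
        result[o] = (target, hits / sum(counts.values()))
    return result


matches = best_match(old, new)

# Group old clusters by the new cluster that absorbed most of their members.
absorbed = defaultdict(list)
for o, (target, share) in matches.items():
    absorbed[target].append(o)

for target, sources in sorted(absorbed.items()):
    status = "merged" if len(sources) > 1 else "kept"
    print(f"new cluster {target}: {status} from old cluster(s) {sources}")
```

A share close to 1 means the old group moved almost wholesale into one new cluster; a low share means its tracts were scattered, which is closer to the group disappearing.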

ekmcintyre commented 3 years ago

Oh okay. I need to compare the maps of cluster, cluster2 and cluster3 and explain how those groupings changed with different inputs.

lecy commented 3 years ago

Correct. You will use the tables of variables as well to make sense of the clusters.

But yes - it's comparing the consistency of groups identified.

ekmcintyre commented 3 years ago

And to clarify for question 2, I need to select my own 3 variables (different from the ones you used in the lab instructions) and compare the original clusters (cluster) and the clusters produced with those three variables (cluster3)?

Do I need to create new tables for cluster2 and cluster3? I only have tables for the original cluster.

lecy commented 3 years ago

Yes, you need new tables for the new clusters. They should be smaller, since there are fewer variables, and so easier to interpret.
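A table like this is just the average of each input variable within each cluster. The sketch below shows the idea in Python under stated assumptions: the tract records, the variable names (`pct_female_head`, `pct_over_60`, `median_income`), and the helper `profile` are all hypothetical and only stand in for whatever three variables are chosen.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical tract records: a cluster label plus three made-up variables.
tracts = [
    {"cluster3": 1, "pct_female_head": 30.0, "pct_over_60": 10.0, "median_income": 40000},
    {"cluster3": 1, "pct_female_head": 34.0, "pct_over_60": 12.0, "median_income": 44000},
    {"cluster3": 2, "pct_female_head": 8.0, "pct_over_60": 40.0, "median_income": 70000},
    {"cluster3": 2, "pct_female_head": 12.0, "pct_over_60": 44.0, "median_income": 74000},
    {"cluster3": 3, "pct_female_head": 15.0, "pct_over_60": 20.0, "median_income": 95000},
]


def profile(records, label_key):
    """Return the mean of every non-label variable within each cluster."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[label_key]].append(rec)
    variables = [k for k in records[0] if k != label_key]
    return {
        label: {v: mean(r[v] for r in members) for v in variables}
        for label, members in sorted(groups.items())
    }


table3 = profile(tracts, "cluster3")

# One row per cluster, one column per variable.
for label, means in table3.items():
    print(label, means)
```

With only three variables per row, it is easier to attach a label such as "many female-headed households" to a cluster and then ask whether a group with that label also existed in the original clustering.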