DS4PS / cpp-529-fall-2020

http://ds4ps.org/cpp-529-fall-2020/

Lab-03-Questions #7

Open gzbib opened 4 years ago

gzbib commented 4 years ago

Hello Sir @lecy

I am confused about the Lab 03 instructions. Should we just go over the code chunks and test them, or should we explain the code chunks?

Also, I finished the lecture videos, and Dr. Anthony mentioned that there is a video in which he explains Lab 03, but I don't think it has been uploaded yet, if it exists at all.

Thank you

lecy commented 4 years ago

Sorry, these instructions were a little confusing this week.

Lab 03 is about interpretation.

In Lab 04 you will select a new city and run the clustering code with data from that city.

So for now the goal is to link the topic of scale creation from Lab 01 with clustering in Lab 03.

If you were creating a scale to identify each cluster, which variables would be in that scale and how reliable do you think it would be?

For example, Group 4 would consist of non-Hispanic retired veterans:

[image]

malmufre commented 4 years ago

Hello Dr Lecy and Ghida,

Does that mean we have to choose certain variables from the above figure and explain how they may be reliable? For instance, in the Group 4 example, we can choose only the variables with the highest percentiles and then repeat the same process for the other groups. Am I understanding this correctly?

lecy commented 4 years ago

You don't have to calculate reliability for this step.

All you have to do is come up with a label for each group. The clustering algorithm returns 8 distinct groups.

I was just making a comment that clustering and the measurement exercise in Lab 01 are linked.

When creating a scale explicitly, you design survey items or select variables that you think will be related (or know to be related if you have a correlation matrix).

In clustering, the algorithm identifies variables that are related. Specifically, the algorithm is minimizing the within-group variance and maximizing the between-group variance. So it's trying to assign tracts to the most similar groups possible in order to create the best fit score (similar to how OLS regression finds the line that minimizes the sum of the squared residuals).
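To make the within-group versus between-group idea concrete, here is a minimal sketch in plain Python (not the lab's R code). The toy "tracts", their two variables, and both label assignments are made up for illustration. It shows that the total sum of squares splits into a within-group part and a between-group part, so a grouping that lowers one necessarily raises the other.

```python
from collections import defaultdict

def mean_vector(rows):
    """Column-wise mean of a list of equal-length numeric tuples."""
    n = len(rows)
    return tuple(sum(col) / n for col in zip(*rows))

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def variance_split(rows, labels):
    """Return (within_ss, between_ss) for one assignment of rows to groups."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    grand = mean_vector(rows)
    within = 0.0
    between = 0.0
    for members in groups.values():
        center = mean_vector(members)
        # Spread of members around their own group center.
        within += sum(sq_dist(m, center) for m in members)
        # Spread of group centers around the overall center, weighted by size.
        between += len(members) * sq_dist(center, grand)
    return within, between

# Hypothetical toy "tracts" with two standardized variables each.
tracts = [(0.1, 0.2), (0.2, 0.1), (0.0, 0.0), (2.0, 2.1), (2.1, 1.9), (1.9, 2.0)]
good_labels = [0, 0, 0, 1, 1, 1]   # similar tracts grouped together
poor_labels = [0, 1, 0, 1, 0, 1]   # similar tracts split apart

total_ss = sum(sq_dist(t, mean_vector(tracts)) for t in tracts)
good_within, good_between = variance_split(tracts, good_labels)
poor_within, poor_between = variance_split(tracts, poor_labels)

print("good grouping:", good_within, good_between)
print("poor grouping:", poor_within, poor_between)
```

The good grouping has the smaller within-group sum of squares and the larger between-group sum of squares, which is the direction a clustering algorithm pushes toward.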

They are slightly different exercises because a clustering algorithm is essentially trying to create multiple scales at once, and it maximizes group differences instead of alpha.
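For comparison, here is a rough sketch of what a scale's alpha measures, assuming "alpha" here means Cronbach's alpha from the Lab 01 reliability exercise. It is plain Python rather than the course's R, and the item scores are invented. The formula used is alpha = k/(k-1) * (1 - sum of item variances / variance of the summed scale), where k is the number of items.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for a list of items, each a list of scores
    with one score per respondent (all items the same length)."""
    k = len(items)
    # Total scale score for each respondent: sum across items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))

# Hypothetical items that mostly move together across five respondents.
related_items = [
    [1, 2, 3, 4, 5],
    [2, 1, 3, 5, 4],
    [1, 3, 2, 4, 5],
]

# Hypothetical items that do not move together.
unrelated_items = [
    [1, 2, 3, 4, 5],
    [5, 1, 4, 2, 3],
    [2, 5, 1, 3, 4],
]

alpha_related = cronbach_alpha(related_items)
alpha_unrelated = cronbach_alpha(unrelated_items)
print(alpha_related, alpha_unrelated)
```

Items that move together give a high alpha; items that do not give a low (here even negative) one. Clustering never computes this number, but the variables that end up defining a group tend to be ones that would score well on it.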

But when you are creating labels you will see that each group is typically defined by a small set of variables that have high or low scores.

Another way to think about it is creating 8 distinct indices, then assigning each tract to a group if it scores "high" on one of the indices (high meaning the top 20 percent, i.e. at or above the 80th percentile).
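As a rough illustration of that "index plus cutoff" view, here is a small plain-Python sketch for a single index. Everything in it is hypothetical: the tract ids, the variable names (loosely modeled on the Group 4 example), and the percentile scores. It is not how the clustering algorithm actually assigns tracts, only the mental model described above.

```python
def index_score(tract, variables):
    """Average a tract's percentile scores over the variables that define an index."""
    return sum(tract[v] for v in variables) / len(variables)

def top_share(scores, share=0.20):
    """Return the ids of the highest-scoring tracts, keeping the top `share` of them."""
    keep = max(1, round(len(scores) * share))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:keep])

# Hypothetical tracts with made-up percentile scores (0-100) on made-up variables.
tracts = {
    "t01": {"pct_veteran": 95, "pct_over_65": 90, "pct_hispanic": 5},
    "t02": {"pct_veteran": 40, "pct_over_65": 35, "pct_hispanic": 60},
    "t03": {"pct_veteran": 10, "pct_over_65": 20, "pct_hispanic": 80},
    "t04": {"pct_veteran": 55, "pct_over_65": 60, "pct_hispanic": 30},
    "t05": {"pct_veteran": 30, "pct_over_65": 25, "pct_hispanic": 70},
    "t06": {"pct_veteran": 70, "pct_over_65": 50, "pct_hispanic": 20},
    "t07": {"pct_veteran": 88, "pct_over_65": 92, "pct_hispanic": 10},
    "t08": {"pct_veteran": 20, "pct_over_65": 45, "pct_hispanic": 50},
    "t09": {"pct_veteran": 50, "pct_over_65": 40, "pct_hispanic": 45},
    "t10": {"pct_veteran": 75, "pct_over_65": 70, "pct_hispanic": 15},
}

# One hypothetical index: "retired veterans", built from two variables.
index_vars = ["pct_veteran", "pct_over_65"]
scores = {tract_id: index_score(values, index_vars) for tract_id, values in tracts.items()}

# Tracts in the top 20 percent on this index would get the group's label.
high = top_share(scores)
print(sorted(high))
```

Repeating this with a different set of defining variables for each of the 8 groups gives the "8 distinct indices" picture.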

I hope this isn't confusing - it's not needed for the lab. Just trying to help you connect the dots.

JasonSills commented 4 years ago

Hi @lecy

My understanding is that the lab portion is simply a code-through: we run the chunks and turn that in. Then on Yellowdig we submit the labels we think fit the 8 clusters. Is my interpretation correct? It makes the lab very easy and not a lot of work, so I guess the mind recoils and I'm just making sure I'm right.

MeghanPaquette commented 4 years ago

Just an FYI: if anyone gets the error below when creating the maps from the code-through, you have to update your packages in R. I updated my packages and it fixed the problem.

"Error in CPL_transform(x, crs, aoi, pipeline, reverse) : OGRCreateCoordinateTransformation() returned NULL: PROJ available?"

lecy commented 3 years ago

@JasonSills sorry - your question got lost in this thread.

Yes, Lab 03 was just introducing clustering, and the emphasis was on interpretation. You don't need to run the code; you can use the existing charts.

Lab 04 has you replicate Lab 03 with a new city. You will need to submit your code and new interpretations of the clusters that emerge in your selected city.

Looking back over the instructions, they were not that clear. Fixing that now!

lecy commented 3 years ago

@MeghanPaquette Just to make sure I understand -

Did you receive this error before or after updating packages?

Error in CPL_transform(x, crs, aoi, pipeline, reverse) : OGRCreateCoordinateTransformation() returned NULL: PROJ available?

What code produced that error? It looks like a map projection step.

MeghanPaquette commented 3 years ago

@lecy - The errors were occurring before updating. Once the packages were updated, it all seemed to work. I don't remember the specific line, and the person I replied to removed their comment. I want to say it was around line 97, but I could be wrong.

lecy commented 3 years ago

@MeghanPaquette Got it, thanks