edquant / edh7916

Course materials and website for EDH7916: Contemporary Research in Higher Education
https://edquant.github.io/edh7916/

Initial Analysis: variables not so great #34

Closed nszekeres closed 4 years ago

nszekeres commented 4 years ago

Hi, Dr. Skinner - what happens if it turns out your independent variable was not so great? I'm working with a few different variables, but basically, the relationship is a little irrelevant. Should that deter my project? Or is the focus more on the process than on a meaningful study/result? Thanks, Naomi

btskinner commented 4 years ago

@nszekeres, sometimes that happens. There are a few approaches one can take:

  1. explore another relationship that's related, but maybe more interesting
  2. change the project altogether
  3. stay the course, but discuss what's happening if the lack of finding is itself of interest

(3) works best (IMO) when you have good data and analysis as well as a strong theory, but the relationship just isn't there. That's interesting in and of itself (I expected a relationship based on theory/domain knowledge, so why isn't the relationship there?).

I'll leave it up to you how best to proceed based on those general options. Yes, the process is important, but I do want the final report to have some relevance to the topic at hand (not just a report of an analysis that didn't work [unless that's the whole point!]).

nszekeres commented 4 years ago

@btskinner - Thanks for the response and the different options... makes sense, and I may contact you offline to further discuss my circumstances. Ty!

nszekeres commented 4 years ago

@btskinner, I want to ask a follow-up question on this... In my draft submission, you commented that I should rescale the data to eliminate outliers and spread out that cluster of dots near zero. When I did that, it did not do better... there was a spread of big spenders, and the majority were 0 or close to 0, so there was no appropriate x-axis to depict them. I explain in the writeup that I originally thought this would be a very important relationship, but either the data was not there or the relationship was not there. I include a Pearson's coefficient to demonstrate the lack of relationship. Is that sufficient? Or, what do you need me to do here? I did pick other variables and build a meaningful analysis off of them, but I think it's more important to clarify that there was no relationship where I expected to find one; it's not that I just didn't think to look for one between tech spending and student outcomes.

(see attached image)

TY, Naomi

image

btskinner commented 4 years ago

@nszekeres, there are a couple of related, but different points here.

First, for rescaling. Sometimes you can rescale your data to change its shape on the figure. For example, if the distribution of your data is skewed, converting to the log scale might help. This changes the interpretation of the figure, of course, but can be clearer in other ways. If you have extreme outliers, you might also drop or top code them (meaning that all values > X are set to == X). Again, this changes the analysis/figure in ways that you want to acknowledge, but sometimes the new limitation is worth the benefit. In your case, just changing the scale from $1 to $100k won't change the overall shape since it's just a linear transformation. The only difference will be the x-axis scale, which is what you see. You either need to perform a non-linear transform to the underlying distribution or drop outliers if you want the plot to look different.
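To make the distinction above concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the `spending` values are simulated from a lognormal distribution (not the IPEDS data), the 95th-percentile cap is an arbitrary choice, and `skewness` and `top_code` are hypothetical helper names. It compares the skewness of the raw values with a linear rescale (dollars to $100k units), a log transform, and a top-coded version.

```python
import math
import random


def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return sum(((v - mean) / sd) ** 3 for v in values) / n


def top_code(values, cap):
    """Set every value greater than cap equal to cap."""
    return [min(v, cap) for v in values]


random.seed(0)

# Simulated, right-skewed "spending": many small values, a few very large ones.
spending = [random.lognormvariate(8, 2) for _ in range(1000)]

# Linear transformation: same shape, only the axis labels change.
in_100k = [v / 100_000 for v in spending]

# Non-linear transformation: log1p(x) = log(1 + x), which stays defined at 0.
logged = [math.log1p(v) for v in spending]

# Top coding at the (arbitrarily chosen) 95th percentile.
cap = sorted(spending)[int(0.95 * len(spending))]
capped = top_code(spending, cap)

for label, values in [
    ("raw dollars", spending),
    ("in $100k units", in_100k),
    ("log1p", logged),
    ("top coded", capped),
]:
    print(f"{label:>15}: skewness = {skewness(values):.3f}")
```

The raw and $100k versions should report the same skewness (up to floating-point error), while the logged and top-coded versions should be noticeably less skewed. Whether either transformation helps with real data that has a large share of exact zeros is a separate question, since a spike at zero remains a spike after `log1p`.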

The second point: it may be the case that the relationship is weak and that's the point you are making. That can be useful, too, as I said before. But the question then is: does the figure help show this? The answer is not really, because the figure is so hard to parse due to the outliers. You may be able to convince me that the figure is useful for making your point, but right now the figure is low information, meaning that it takes up more space than it seems worth. Another version of this figure, or a figure that better shows a relationship of interest, might be a better use of space.

nszekeres commented 4 years ago

Thanks for the feedback, @btskinner -

I tried the log scale - it didn't help. I experimented with other views - they were worse.

Killing the outliers kills most of the dataset. I did condense all small spenders that wouldn't be expected to show a relationship, and there seems to be a relationship among the remaining ones, but then the question remains: should you bother to depict only 5-20% of your dataset for the purpose of getting it to look pretty? I feel like I'm spending too much time on something that just isn't there in the IPEDS.

Below is where I am at... interested in your thoughts.

Thanks, Naomi

Plan is to decide this tonight... not sure whether to keep or discard the images: image

and kill this: image

btskinner commented 4 years ago

@nszekeres, you've definitely been doing your due diligence on this. I'm going to leave it up to you. If you think the figures aren't that informative no matter what you do, then maybe you don't include them (and include some other relationship). If you think that it's useful for the reader to see that there's no relationship, then leave them in and make that point.

nszekeres commented 4 years ago

@btskinner - ok, thanks, and thanks for your $.02!