Final Paper Step 2 Part 3: Does Cronbach's alpha describe measurement error in a variable?

droach7 commented 3 years ago

Hi everyone,

I was attempting to answer this question today, but I am not sure I understand Cronbach's alpha correctly.

I learned a bit more about Cronbach's alpha from this website and concluded that:

Cronbach’s alpha measures internal consistency AKA how well the scale or test items measure a specified concept.
It is a function of the number of test items in an instrument, the average covariance between pairs of items, and total score variance.
It does not guarantee the validity of the instrument since it does not measure systematic error.

So, does this mean that since it is a function of covariance and total score variance it would describe measurement error in the variable the instrument is attempting to measure?

Clarification/discussion on this would be greatly appreciated!

danafuller commented 3 years ago

For our purposes it sounds simpler than that. It doesn't sound like it indicates measurement error. Our class notes explain, "In this context alpha refers to a reliability measure for the instrument on a scale of zero to one, NOT the confidence level in a hypothesis test (also called alpha)... For now note it is a measure of reliability (scale of 0 to 1) and most importantly a signal that the instrument was developed by trained psychometricians using appropriate scientific methods. There are many instruments that have not been properly tested and subjected to peer-review, and thus might not actually measure what it claims to measure."

danafuller commented 3 years ago

@lecy I am wondering if there is a list of usual "approved" instruments that I could reference? I feel like I have jumped down a research rabbit hole and need to dig myself out. I think I may need to work backwards. If I could compare how one of the studies I am looking at did their research to the list of instruments, then I could identify it and search for a reliability score. I am not finding anything that explicitly names a specific instrument. Also, is a Cronbach's alpha score a property of the instrument that applies generally, or is it specific to each research design?

dholford commented 3 years ago

My sense is it's separate. If I'm reasoning through this correctly, variance and measurement error are separate. Variance is the known: though we may not know why the variance is there, we do know it's there. Measurement error is random error, which makes me think it can't be explained and thus isn't connected to variance. Alpha, and Cronbach's alpha specifically, are related to variance, but measurement error and variance are not interchangeable... I think.

lecy commented 3 years ago

@droach7 I would read the section on validity and reliability here:

https://github.com/DS4PS/cpp-529-master/raw/master/articles/measurement/measurement-theory-and-practice.pdf

Validity is a little complicated because you need theory or expertise to determine if the underlying construct makes any sense. In empirical studies we typically use predictive validity - does the measure predict something about the world? For example, if you know someone's IQ you can predict how they will perform on certain tasks.

> Cronbach’s alpha measures internal consistency AKA how well the scale or test items measure a specified concept.

The alphas (like Cronbach's alpha) measure reliability. Reliability is a measure of the ratio of measurement error in an instrument (the higher the reliability, the smaller the share of score variance that is error), or alternatively the stability of the construct over time. For example, if willpower can be measured precisely but varies from day to day, the over-time variance behaves a lot like measurement error. Test-retest reliability is another version of this (how consistently can people perform a specific task - higher variability in performance over trials leads to lower alpha in an instrument constructed from multiple trials).

For alpha consider the latent construct of athleticism that is a measure of performance on three physical tasks (e.g. running a mile, biking 10 miles, swimming 200 yards). 1,000 people complete the tasks and each X represents their relative performance (percentile score) on each task.

We can decompose each score into a component of performance explained by overall athleticism or fitness (a) and an idiosyncratic or task-specific skill component (e for error or residual).

X1 = a1 + e1
X2 = a2 + e2
X3 = a3 + e3

When you combine the three variables X1 to X3 into a common scale you will have a component that represents a stable measure of the construct:

A = ( a1 + a2 + a3 ) / 3

And you will have a component of the three variables that represents random measurement error:

B = ( e1 + e2 + e3 ) / 3

Alpha is the ratio of the variances of these components, the signal share of total variance (think signal to noise):

alpha = var(A) / ( var(A) + var(B) )
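To make the athleticism example concrete, here is a minimal simulation sketch. It rests on assumptions that are not in the original post: the common component is the same for all three tasks (a1 = a2 = a3), scores are standardized rather than percentiles, the ratio is read as a ratio of variances, and the standard deviations and seed are arbitrary. The function names are hypothetical.

```python
import random
import statistics


def cronbach_alpha(items):
    """Textbook Cronbach's alpha; items is a list of k lists, one score per person."""
    k = len(items)
    item_variances = [statistics.variance(column) for column in items]
    totals = [sum(scores) for scores in zip(*items)]  # each person's summed score
    return (k / (k - 1)) * (1 - sum(item_variances) / statistics.variance(totals))


def simulate(n=1000, sd_common=1.0, sd_error=1.0, seed=42):
    """Simulate three task scores X = a + e and return (alpha, variance ratio)."""
    rng = random.Random(seed)
    # Common component: overall fitness, shared across tasks (assumption).
    fitness = [rng.gauss(0, sd_common) for _ in range(n)]
    # Idiosyncratic component: independent task-specific noise for each task.
    errors = [[rng.gauss(0, sd_error) for _ in range(n)] for _ in range(3)]
    items = [[f + e for f, e in zip(fitness, err)] for err in errors]
    # A is the averaged common part (just fitness here), B the averaged errors.
    mean_error = [sum(es) / 3 for es in zip(*errors)]
    var_signal = statistics.variance(fitness)
    var_noise = statistics.variance(mean_error)
    ratio = var_signal / (var_signal + var_noise)
    return cronbach_alpha(items), ratio


alpha, ratio = simulate()
print(round(alpha, 2), round(ratio, 2))
```

With these settings the theoretical value is 1 / (1 + 1/3) = 0.75, and both numbers should land near it. Averaging three tasks shrinks the error variance by a factor of three, which is why adding items tends to raise alpha.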

> So, does this mean that since it is a function of covariance and total score variance it would describe measurement error in the variable the instrument is attempting to measure?

The covariance (correlation between X's) is used to decompose the variance of X into a common component (fitness) and an idiosyncratic component (skill at the specific task, quality of equipment, rest the previous night, breakfast, or purely stochastic/random elements).
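For reference, this is how the covariance enters the usual textbook form of Cronbach's alpha. The notation is introduced here and is not from the course notes: k is the number of items, \bar{c} is the average covariance between pairs of items, \sigma_{X_i}^2 is the variance of item i, and \sigma_T^2 is the variance of the total score T = X_1 + ... + X_k.

$$
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_{X_i}^2}{\sigma_T^2}\right) = \frac{k^2\,\bar{c}}{\sigma_T^2}
$$

The second equality follows because \sigma_T^2 = \sum_i \sigma_{X_i}^2 + k(k-1)\bar{c}. The more the items covary relative to the total variance, the higher alpha.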

It's not unlike the task of decomposing the variance of Y into an explained portion and a residual portion:

The R^2 measure (variance explained) has similar structure:

R-square = RSS / (RSS + ESS)
# RSS = regression sum of squares
# ESS = error sum of squares
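The same decomposition can be checked numerically. This is a small sketch for a one-predictor regression using the naming above (RSS as the regression sum of squares, ESS as the error sum of squares); the function name and the data are made up for illustration.

```python
def r_squared(x, y):
    """R-square = RSS / (RSS + ESS) for a simple least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Ordinary least squares slope and intercept.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    fitted = [intercept + slope * xi for xi in x]
    # RSS: variation explained by the line; ESS: leftover residual variation.
    rss = sum((fi - mean_y) ** 2 for fi in fitted)
    ess = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return rss / (rss + ess)


r2_noisy = r_squared([1, 2, 3, 4], [1, 3, 2, 4])
r2_exact = r_squared([1, 2, 3, 4], [3, 5, 7, 9])
print(r2_noisy, r2_exact)
```

Like alpha, the result is a share of total variation: it reaches 1 only when nothing is left over as residual.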

This is a bit of an over-simplification but hopefully the intuition is there. Does that answer your question?

You get more practice with this in CPP 529 Community Analytics: https://ds4ps.org/cpp-528-spr-2020/labs/lab-02-tutorial.html

lecy commented 3 years ago

> Variance is the known: though we may not know why the variance is there, we do know it's there. Measurement error is random error, which makes me think it can't be explained and thus isn't connected to variance. Alpha, and Cronbach's alpha specifically, are related to variance, but measurement error and variance are not interchangeable... I think.

Total variance is decomposed into common variance and idiosyncratic variance.

The idiosyncratic component is the measurement error. Or the noise in the signal to noise ratio.

lecy commented 3 years ago

@danafuller there are thousands of instruments and they will be specific to different research domains. I am not aware of a database.

Psychometrics, for example, has developed a bunch of instruments to measure personality.

Public health has developed different measures of population health (physical mobility, mental health, toxic load, etc).

Education has lots of measures of academic performance and cognitive ability (they used something like 15 different measures of cognitive ability in the study from Lab 1).

Start with your outcome and search an academic database like Google Scholar for things like "measures of..." or "instrument for ...".

Each instrument will have its own alpha (reliability score). In theory the score should describe the expected performance of the instrument in any study. But there will be subpopulation variance (the score has higher reliability when used with some populations and lower with others).
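A rough way to see how the same instrument can have different reliability in different subpopulations is to simulate it. This sketch assumes, purely for illustration, that the two groups differ only in how noisy their item responses are; the group labels, noise levels, and seeds are hypothetical and not drawn from any study.

```python
import random
import statistics


def cronbach_alpha(items):
    """Textbook Cronbach's alpha; items is a list of k lists, one score per person."""
    k = len(items)
    item_variances = [statistics.variance(column) for column in items]
    totals = [sum(scores) for scores in zip(*items)]
    return (k / (k - 1)) * (1 - sum(item_variances) / statistics.variance(totals))


def simulate_group(n, sd_error, seed):
    """Three-item instrument: each item = shared trait + independent noise."""
    rng = random.Random(seed)
    trait = [rng.gauss(0, 1) for _ in range(n)]
    items = [[t + rng.gauss(0, sd_error) for t in trait] for _ in range(3)]
    return cronbach_alpha(items)


# Same instrument, two hypothetical groups with different amounts of noise.
alpha_group_a = simulate_group(n=1000, sd_error=0.5, seed=1)
alpha_group_b = simulate_group(n=1000, sd_error=2.0, seed=2)
print(round(alpha_group_a, 2), round(alpha_group_b, 2))
```

Under these assumptions the noisier group gets a much lower alpha, even though the underlying trait and the items are identical. That is one reason to look for a reliability estimate from a population similar to yours.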

The subpopulation variance is one of the reasons colleges are moving away from SAT / ACT scores for admissions. Those scores are good at predicting academic potential for homogeneous populations coming out of good school districts. The reliability drops significantly when trying to measure academic potential of diverse populations or those coming out of low-quality school districts. As a result, some colleges have concluded that the tests disadvantage applicants who have the ability to succeed but score low on standardized tests. The scores thus fail to measure merit consistently, in a way that reinforces the status quo and is detrimental to social mobility.

It's a good example of the politics of measurement when used in high-stakes decision-making.

danafuller commented 3 years ago

I am specifically trying to find studies on any psychological effects of masking young children. I think because the research is in its infancy it is really difficult to find what I'm looking for. I have identified a couple of studies in Google Scholar using the search methods that you cited, but the instrument is not clear, which is why I was trying to work backwards. If I can't find anything specific to what I'm looking for, I think I will pivot to the psychological effects of isolation instead, even though it's a little bit off topic for what I am looking to do. I won't be able to dig back in for a couple of hours, so hopefully something will pan out then.

lecy commented 3 years ago

You would not find anything on "psychological effects of masking".

But here is where you need to select a specific outcome for your study. Your hypothesis is generally that masks can negatively impact children. You then need to operationalize "negative impact".

For example, you could measure:

Physical health
Mental health
Strength of social ties (social capital)
Self-esteem / sense of agency and control in life
Academic performance (harder to understand people wearing masks leading to worse comprehension?)

You can find instruments for all of these things that you can use as outcomes in your study.

You will need to convince your audience that the measure you select is a good fit for your research question. Economists are famous for using poor proxy measures. For example, GDP as a measure of national utility, well-being or happiness. Number of patents as a proxy for innovation (maybe in science, but misses a lot of cultural innovation).

droach7 commented 3 years ago

> The alphas (like Cronbach's alpha) measure reliability. Reliability is a measure of the ratio of measurement error in an instrument, or alternatively the stability of the construct over time.

> The covariance (correlation between X's) is used to decompose the variance of X into a common component (fitness) and an idiosyncratic component (skill at the specific task, quality of equipment, rest the previous night, breakfast, or purely stochastic/random elements).

> The idiosyncratic component is the measurement error. Or the noise in the signal to noise ratio.

@lecy So, to (hopefully correctly) sum up your explanation: because Cronbach's alpha measures reliability, which is a measure of the ratio of measurement error in an instrument, it would describe measurement error in a variable that the instrument is designed to measure. For example, for my paper the latent construct I want to measure is student self-efficacy. I have picked a questionnaire with a reported Cronbach's alpha of 0.86. This questionnaire is the only way I am collecting data about student self-efficacy. We want an instrument with a Cronbach's alpha greater than or equal to 0.7 because that indicates a higher reliability score and thus a lower amount of measurement error in the variable (self-efficacy) that is measured by the instrument. The Cronbach's alpha does not describe the entirety of measurement error present in the study, such as for my other DV of interest, student success, only the ratio of measurement error in the variable it is intended to measure (self-efficacy).

Am I understanding this correctly or missing a point in the bigger picture?

dholford commented 3 years ago

@droach7 sounds like we have a very similar evaluation. I'm wanting to look at "Do No Harm Grading" which popped up all over as a response to the quick shift to distance learning and COVID. I'm interested in student growth as the DV and also looking at student self-efficacy as the latent construct.

I've been looking at articles on Bandura's perceived self-efficacy scale and another article that seemed to adapt that same scale. There also seem to be quite a few mentions of expectancy-value theory and how that plays into academic achievement. Is that the measure you are looking at using as well?

droach7 commented 3 years ago

@dholford Oh, how cool! I definitely saw mentions of Bandura's perceived self-efficacy scale. The instrument I am planning to use is Zimmerman and Kulikowich's 2016 Online Learning Self-Efficacy Scale (OLSES), which has three subscales: learning in the online environment, time management, and technology use. The original article was only published in physical form, so I am waiting for my ILLiad request to be processed so I can view it remotely. In the meantime I have used information from this article in which the authors adapted the OLSES to Turkish to be used by Turkish university students.

I picked this one since I am particularly looking at how lecture methodology in online classes impacts student performance, so I wanted an instrument specifically designed for online learning.

danafuller commented 3 years ago

Thank you @lecy. I see that I needed to be more specific in my search. That was helpful. Is a post hoc test power of 0.88 comparable to the 0.7 threshold for Cronbach's alpha?

Watts-College / cpp-524-fall-2021
