Closed fsolt closed 3 years ago
It's clearer to show the differences between Claassen and DCPO with the point estimates.
I compared the DCPO point estimates against the estimates with uncertainty in another plot.
See the code in commit e51a27a2186c0b51135ab0b90e45ce2e7875166d
Oof. So, to summarize:

- Claassen m5, replication data, point estimates: positive and significant
- Claassen m5, replication data, with uncertainty: null result
- DCPO, more data, point estimates: null result
- DCPO, more data, with uncertainty: null result

Well, I think we know the lay of the land now. How best to explain it to others is still debatable, I guess.
👀 @Tyhcass
"DCPO, more data, point estimate—null result"... wow. That is interesting.
"Even if the issue in question is less easily viewed as bounded, bounding is still a good idea because it reduces the uncertainty for the estimates for countries at the extremes, that is, those countries whose values should, in fact, be easier to estimate. As Linzer and Staton (2015, 229) note, “bounding the latent variable may do little harm to the scale and produce more sensible estimates of uncertainty.” DCPO, therefore, uses the logistic function to transform the unbounded estimates, θ̄′_kt, to the unit interval" (Solt 2020, p. 8).
My immediate response is that we may still be able to explain the null result for the DCPO point estimates through measurement uncertainty. DCPO already reduces measurement uncertainty within the measurement model, while, without the bounding, Claassen underestimates the uncertainty and is more certain than he should be; his estimates are only rough approximations. So, again, that also explains why polarization is crucial (responding to the reviewers' questions on @fsolt's PA paper)! Does it make sense? Do I understand the role of bounding correctly? @fsolt @sammo3182
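To illustrate the bounding step in the quote (a minimal base-R sketch of the idea, not the DCPO package's actual internals): the logistic function maps unbounded latent estimates onto the unit interval, and for countries near the extremes it compresses wide posterior intervals into a sensibly narrow range.

```r
# Sketch of logistic bounding (base R); illustrative, not DCPO's internal code.
# Unbounded latent estimates, e.g. posterior means on the real line:
theta_unbounded <- c(-4, -1, 0, 1, 4)

# plogis() is the logistic (inverse-logit) function: 1 / (1 + exp(-x))
theta_bounded <- plogis(theta_unbounded)
round(theta_bounded, 3)  # all values now lie strictly inside (0, 1)

# A wide posterior at the high extreme gets compressed by the transform:
set.seed(1)
draws <- rnorm(1000, mean = 4, sd = 1)    # uncertain, high-end country
quantile(plogis(draws), c(0.025, 0.975))  # a much tighter interval near 1
```

The point of the sketch is only the shape of the transform: intervals near the middle of the scale survive mostly intact, while those at the extremes shrink, which is the "more sensible estimates of uncertainty" Linzer and Staton describe.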
We do know DCPO is superior to Claassen's measurement, as demonstrated by the comparison table. The superiority may come from DCPO's capacity for dealing with ordinal variables, from its handling of polarization, or from better labeling of survey years. We don't know for certain which one is most crucial, but at least polarization jumps out. Does that make sense?
> Oof. So, to summarize: Claassen m5, replication data, point estimate—positive significant; Claassen m5, replication data, w uncertainty—null result; DCPO, more data, point estimate—null result; DCPO, more data, w uncertainty—null result. Well, I think we know the lay of the land now. How best to explain it to others is still debatable, I guess.
Agree! We already have multiple points against Claassen 2019, but how to deliver a convincing K.O. is the primary question now. With the APSR piece later, we should be confident enough to say, "Claassen, we've done our best to rescue your arguments, but ah ya ya ya ya~" Yet, to get this piece into APSR, we need to go deeper than merely showing Claassen is wrong. We need to give some reasons, at least. There are a couple of ways we can go:
All three of these could be true at the same time, and it would be difficult to elaborate them all together, especially in a short article. Fred has already demonstrated the first one methodologically in the PA piece, and we don't have much evidence for the third. So, the second way? One upside is that this path can also make a substantive contribution to democratization theory. What do you think? Or other framing strategies? @fsolt @Tyhcass
I think I have to take back the "we know the lay of the land now" part.
Thinking harder--and I really thought I posted this already but now I can't find it, so forgive any duplication--we have three manipulations, giving 8 conditions, and we've really only tested half of them so far: m5 vs. DCPO, replication data vs. more data, point estimates vs. with uncertainty. We've done the most interesting ones, but there are still:
If all of these are null, that is, if "m5, replication data, point estimates" is the only one of the eight that gives positive and significant results, then I suppose we have a straightforward story to tell about how fragile the published results are. If any of the others have positive, significant results (maybe the third one?), then the story is at least a bit more complicated. I guess we'll cross that bridge when we come to it.
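The three two-level manipulations can be enumerated directly (a trivial base-R sketch; the labels are just the shorthand used in this thread):

```r
# Enumerate the eight conditions implied by the three manipulations.
conditions <- expand.grid(
  measurement = c("Claassen m5", "DCPO"),
  data        = c("replication", "more"),
  estimates   = c("point", "with uncertainty"),
  stringsAsFactors = FALSE
)
nrow(conditions)  # 8 conditions in total
```

Keeping a grid like this around makes it easy to tick off which cells have been estimated and which remain.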
Then the implication is that either (a) the theory is wrong, and democratic backsliding is all about the breakdown of elite consensus, or (b) the theory may be right, but this sort of survey question is poorly suited to measuring the actual amount of public support for democracy, given the concerns raised in the screenshotted tweet I posted over here: while people may value democracy (and say so on Claassen-type questions), many of them may value their partisanship more (and reveal this in their reactions to their co-partisans' anti-democratic behavior).
Sigh. This really would have been easier to write up if the main result had been replicable . . .
To do these we need
Do we have Claassen's raw data ready now, Cassandra @Tyhcass? If so, once Fred @fsolt conducts the IRTs, I can replicate the models thoroughly.
In the meantime, I am wondering whether just replicating m5 with more data would be enough. That would tell us what leads to the null results: the data or the method (I bet on the method, btw). Then we can frame the paper this way:
(Frankly, I don't see the value of running DCPO on the replication data, Fred. What is that for?)
@sammo3182 Claassen's raw data? Do you mean the raw survey data? We don't have it, and he didn't provide raw data either. Based on claassen_replication_raw, we used DCPO to format it into claassen_replication_input. claassen_replication_raw is raw data we created ourselves following Claassen's approach, covering 1988 to 2017 (so not more data), though we did change the survey years for it. claassen_input_raw is our more-data version, but it doesn't include WVS7 data. So, please make your suggestion a little more specific.
[ ] Run format_dcpo on claassen_replication.csv to create dcpo_replication_input, then run dcpo_kfold and dcpo to get dcpo_out.
[ ] Use our more data, mood_dem (which covers 1988 to 2020), to create claassen_moredata, format it into claassen_moredata_input, then run kfold to get claassen_out.
After these two, we can compare Claassen with limited data (done), Claassen with more data, DCPO with limited data, and DCPO with more data (done). @fsolt Any ideas?
What I meant was anything we can use to run DCPO on data similar to Claassen's. Data from the same years would probably be enough. But again, I am still not quite sure why we need DCPO with limited data.
Fred @fsolt, I don't think we have Claassen's thetas for the full data. At least the ones in the repo are only for the limited data. Do you already have Claassen's version for the full data? We need that to complete the replication~
Oh, I thought the plan out of the meeting yesterday was to skip the Claassen-full combination. (I agree, we don't have that combination estimated yet.) Yesterday, in the interest of getting this done, I proposed we:
I think we have all the results we need for this plan, and other than figuring out how to represent the AJPS and APSR together (parallel or serial?), I think it's now just a matter of writing it up. Unless you aren't convinced of the plan? I kind of thought I was just spelling out your thoughts, but I'm open to revisiting it . . .
Hmm, when I double-checked the models, I found a small update that may provide a new potential explanation. @fsolt @sammo3182
Simply put, if we fully replicate Claassen's model, we need to use only the thetas estimated from the first year a country was surveyed onward. Therefore, I reran the models with DCPO point estimates, with and without the thetas before each country's first survey year. The main result: when we fully replicate his approach, theta is statistically significant, although the effect is pretty close to zero.
How could we explain this? Does excluding the data before the first year mean more certainty? But meanwhile, it also means less data, right? BUT, because of my laptop problem, when I merged theta and the control variables, the thetas or control variables for Côte d'Ivoire and São Tomé & Príncipe went missing. So, @sammo3182, could you please rerun the models without the data before countries' first survey years, for both Claassen and DCPO? Claassen excluded before-first-year data in his paper. I have updated the control-variable files with a first-year identifier. I didn't change our paper code, since my code is so ugly and I don't want to ruin our beautiful code... What do you think? Rerun again, or any other ideas? @fsolt @sammo3182
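For concreteness, here is a minimal base-R sketch of the trimming step (the column names and toy values are assumptions for illustration, not the names in our repo): drop every country-year that falls before the country's first observed survey year.

```r
# Toy country-year theta estimates; all names here are hypothetical.
theta <- data.frame(
  country = rep(c("A", "B"), each = 4),
  year    = rep(2001:2004, times = 2),
  theta   = rnorm(8)
)
# Suppose country A was first surveyed in 2002 and country B in 2003.
first_year <- data.frame(country = c("A", "B"), first_yr = c(2002, 2003))

theta <- merge(theta, first_year, by = "country")
theta_trimmed <- theta[theta$year >= theta$first_yr, ]  # keep observed span only

nrow(theta_trimmed)  # 5 rows: A keeps 2002-2004, B keeps 2003-2004
```

The same filter, applied to both the Claassen and DCPO estimates before merging with the control variables, would keep the two analyses on an equal footing.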
Also, the results for model 2 in Claassen's AJPS piece are weird. Claassen's result was that democratic support has positive effects on the level of democracy only in democratic countries, but our results are the opposite: support significantly matters only in authoritarian countries, in both datasets, both with and without the years before the first survey year. @fsolt @sammo3182
This is an important catch, Cassandra. I agree with Claassen that we shouldn't extrapolate beyond the observed years—I don't do that in the SWIID either—but I admit I hadn't thought of the issue for a while. (At one point I had DCPO estimating only the years between the observed years for each country, but since Stan didn't (and maybe still doesn't?) have a really good way to declare ragged arrays (different numbers of years for each country), I eventually gave up on that. What a time-sucking rabbit hole that was, though.) So—right, we shouldn't include extrapolated years in our analyses.
Sigh. That does mean we need to revisit everything.
This data issue doesn't seem to make much substantive change to our findings, though. The significance of the point estimates demonstrates nothing except that the uncertainty matters a lot; that's our argument, right? Here's the result for the AJPS piece with uncertainty:
The same holds for the APSR piece.
Given these results, our conclusions still hold, don't they?
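For anyone reading the thread later: the "with uncertainty" conditions refer to propagating the measurement uncertainty in theta into the second-stage regressions rather than using point estimates alone. A minimal sketch of one standard way to do this, refitting on each posterior draw and pooling (simulated data and a Rubin-style pooling rule for illustration; this is not our actual model code):

```r
# Sketch: propagate measurement uncertainty by refitting per posterior draw.
set.seed(42)
n <- 200                               # toy country-years
theta_true <- rnorm(n)                 # "true" latent support (simulated)
y <- 0.3 * theta_true + rnorm(n)       # toy outcome

n_draws <- 100
fits <- replicate(n_draws, {
  theta_draw <- theta_true + rnorm(n, sd = 1)   # one noisy posterior draw
  s <- summary(lm(y ~ theta_draw))              # refit on that draw
  c(b  = s$coefficients["theta_draw", "Estimate"],
    se = s$coefficients["theta_draw", "Std. Error"])
})

# Pool: mean coefficient; total variance = within-draw + between-draw spread.
b_bar     <- mean(fits["b", ])
total_var <- mean(fits["se", ]^2) + (1 + 1 / n_draws) * var(fits["b", ])
c(estimate = b_bar, se = sqrt(total_var))
```

Because the pooled standard error includes the between-draw spread, coefficients that look significant on point estimates alone can (and here do) become much less certain, which is exactly the pattern in the summary table above.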
Ingredients: purrr
Results: dotwhisker plot