ColeCampton closed this 4 years ago.
The command you've pasted there is not invoking the survey weights. Did you accidentally elide that from this issue, or is it possible that you're running the mean commands on naïve/unweighted data?
Ah, okay, I did forget the weights. Let me reassess.
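This minimal Python sketch shows why the weights matter. The real pipeline is Stata, and the numbers and names below are made up for illustration. A survey-weighted mean, sum(w·x)/sum(w), can differ sharply from the naïve mean of the same values. In Stata, the weighted estimate would come from a command that applies the weights, such as `svy: mean` after `svyset`, rather than a plain `summarize`.

```python
# Hypothetical toy data: ORF scores and survey weights for one subpopulation.
# These numbers are illustrative only, not drawn from the DRC or PRIMR files.
orf = [0.0, 5.0, 12.0, 30.0]
weights = [4.0, 2.0, 1.0, 1.0]


def unweighted_mean(x):
    """Naive mean: every observation counts equally."""
    return sum(x) / len(x)


def weighted_mean(x, w):
    """Survey-weighted point estimate: sum(w * x) / sum(w)."""
    return sum(wi * xi for xi, wi in zip(x, w)) / sum(w)


print(unweighted_mean(orf))         # 11.75
print(weighted_mean(orf, weights))  # 6.5
```

Here the heavily weighted low scorers pull the weighted mean well below the naïve one. This is the kind of gap that appears when the weights are left out.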
```
global DRC_src = "....\unesco_equity\data\d.PUF_DRC_Baseline_Endline Grade 2-4-6 French Sample A\PUF_3.DRC2010_2014-Baseline_Endline_grade2-4-6_EGRA-EGMA_French_SampleA.dta"
use "$DRC_src", clear

summarize orf if treat_phase ==6 & grade ==2

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         orf |          0

summarize orf if treat_phase ==6 & grade ==4

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         orf |      1,725    7.878551    13.91502          0        141
```
The output from running my DRC preprocessing do-file reports 32.465 for the first and an empty value for the second. Admittedly, this is likely an issue with how I am using 00_apply_analysis.do.
What's the count of observations in each of those subpops? (Esp. the first one)
1795 and 1745, respectively.
Which disagrees with the output of the `summarize` command above... That is seriously weird. Maybe we need a screenshare to investigate.
Yes; however, `count if treat_phase==6 & grade==4 & !missing(orf)` returns 1725, which does agree. It seems there are many missing orf values. The confusing part was that the missing values were not where I expected them in the Excel sheet.
I would be happy to screenshare tomorrow and see what we can find.
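The discrepancy comes down to what each command counts. `count if ...` counts every observation in the subpopulation. The Obs column of `summarize` counts only observations where the variable is nonmissing.

The small Python sketch below reproduces the pattern. The records are hypothetical toy data, with `None` standing in for Stata's missing value `.`. The helper names are illustrative only.

```python
# Hypothetical toy records standing in for the dataset; None plays the role of Stata's "."
records = [
    {"treat_phase": 6, "grade": 4, "orf": 10.0},
    {"treat_phase": 6, "grade": 4, "orf": None},
    {"treat_phase": 6, "grade": 4, "orf": 3.0},
    {"treat_phase": 6, "grade": 2, "orf": None},
    {"treat_phase": 1, "grade": 4, "orf": 7.0},
]


def in_subpop(r, phase, grade):
    return r["treat_phase"] == phase and r["grade"] == grade


def count_all(rows, phase, grade):
    """Like `count if treat_phase==phase & grade==grade`: includes missing orf."""
    return sum(1 for r in rows if in_subpop(r, phase, grade))


def count_nonmissing(rows, phase, grade):
    """Like adding `& !missing(orf)`; this matches the Obs column of summarize."""
    return sum(1 for r in rows if in_subpop(r, phase, grade) and r["orf"] is not None)


for grade in (2, 4):
    print(grade, count_all(records, 6, grade), count_nonmissing(records, 6, grade))
# 2 1 0
# 4 3 2
```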
The error turned out to be numerical (positional) indexing into the subpopulation summary-statistic matrix. This produced unexpected results when some subpopulations had missing orf data. The fix was to index by feature name instead, and to populate the output variables as empty when the orf data is missing.
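Here is a minimal Python sketch of the pitfall. It assumes, hypothetically, a summary step that silently omits groups with no nonmissing orf values. The function and group names are illustrative and are not taken from 00_apply_analysis.do.

Looking results up by position shifts every group after the missing one. A name-keyed lookup instead leaves the empty group blank.

```python
# Hypothetical stand-in for a per-subpopulation summary that simply omits
# groups with no nonmissing orf values (None marks a missing value).
def summarize_by_group(data):
    """data maps group name -> list of orf values.
    Returns (row_names, means) for groups with at least one nonmissing value."""
    names, means = [], []
    for name, values in data.items():
        present = [v for v in values if v is not None]
        if present:
            names.append(name)
            means.append(sum(present) / len(present))
    return names, means


data = {
    "grade2": [None, None],  # no orf data at all
    "grade4": [4.0, 8.0],
    "grade6": [20.0, 30.0],
}
expected_groups = ["grade2", "grade4", "grade6"]
names, means = summarize_by_group(data)

# Fragile: assumes row i of the result belongs to expected_groups[i].
positional = {g: (means[i] if i < len(means) else None)
              for i, g in enumerate(expected_groups)}

# Robust: look each group up by name and leave it empty (None) when absent.
by_name = dict(zip(names, means))
keyed = {g: by_name.get(g) for g in expected_groups}

print(positional)  # {'grade2': 6.0, 'grade4': 25.0, 'grade6': None}
print(keyed)       # {'grade2': None, 'grade4': 6.0, 'grade6': 25.0}
```

The shifted positional result resembles the symptom reported above for the DRC file: a value for a subpopulation with no data, and an empty cell for one that has data. The analogous Stata approach would look up matrix rows by name (for example with `rownumb()`) rather than by number.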
I noticed that some outputs with new data sets had missing mean ORF values for certain subpopulations. When I investigated, it seemed that the ORF means in the output spreadsheet don't agree with those computed directly. For example:
```
use "$primr_src", clear
summarize eq_orf if treat_phase==1 & grade==1 & cohort ==1
```
This reports a mean of 5.35. However, the mean listed in the output spreadsheet is 6.778, which is presented as 6.8 in Table 2 of CS. I have yet to identify the cause of this problem. @TSSlade