The study example discussed in the mi_vignette pulls in various data sources, only one of which (school pairings with treatment assignments) is shipped with the package, via `data/michigan_school_pairs.rda`. Let's add some demographic variables to that table to make it more useful for vignette and documentation examples.
Without special permission we can't redistribute the MI student achievement data that we downloaded from CEPI (i.e., `extdataURLs$MME` as defined in `developer_docs/mi_vignette/demo.Rmd`), but there's no impediment to our redistributing Common Core of Data (CCD) data (`extdataURLs$CCD`, as defined in the same file).
Let's merge into `data/michigan_school_pairs.rda` some relevant demographic info from the CCD, including measures of school size by grade, by demographic subgroup (including but not limited to race, sex and English Language Learner status) and perhaps by grade/demographic subgroup combination. If the CCD carries academic achievement information such as historical graduation rates, one or two variables of this type can also be included. Ideally the data frame will wind up with 5-15 columns in total. The precise variables should be selected to balance how easily readers will recognize them against how much visible variation they show, both within and across school pairs. We should add new columns (variables) but not new rows (schools).
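A minimal base R sketch of the "new columns but not new rows" requirement, using toy tables. The column names (`schoolid`, `pair`, `z`, `enroll_g11`, `pct_ell`) are hypothetical placeholders, not the actual names in `michigan_school_pairs` or the CCD files, and the join key in particular would need to be whatever school identifier the two sources actually share.

```r
# Hypothetical toy stand-ins for michigan_school_pairs and a CCD extract;
# real column names and contents will differ.
pairs <- data.frame(
  schoolid = c("A1", "A2", "B1", "B2"),
  pair     = c(1, 1, 2, 2),
  z        = c(1, 0, 1, 0)
)
ccd <- data.frame(
  schoolid   = c("A1", "A2", "B1", "B2", "C1"),
  enroll_g11 = c(210, 185, 320, 298, 150),
  pct_ell    = c(0.04, 0.10, 0.02, 0.07, 0.12)
)

# A duplicated key in the CCD extract would silently add rows on merge
stopifnot(anyDuplicated(ccd$schoolid) == 0)

# all.x = TRUE keeps every paired school; schools found only in the CCD
# extract (here C1) are dropped, so only columns are added
augmented <- merge(pairs, ccd, by = "schoolid", all.x = TRUE)

stopifnot(nrow(augmented) == nrow(pairs))
```

Checking `nrow()` after the merge is a cheap guard; a left join can only add rows if the right-hand key is duplicated, which the first `stopifnot()` rules out up front.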
The `get_and_clean_external_data.R` and `demo.Rmd` scripts in `developer_docs/mi_vignette/` will probably have to be adjusted so that when they merge the short `michigan_school_pairs` table against longer district or statewide tables from the CCD, we don't wind up with duplicate column names. (That merge should probably take from the longer CCD table whatever demographic and grade-size variables were previously added from it to the shorter `michigan_school_pairs` table.)
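One way the adjusted scripts might do this, sketched in base R with hypothetical toy tables and column names: before merging, drop from the short table every non-key column it shares with the CCD table, so those values come from the CCD side and `merge()` never has to disambiguate them with `.x`/`.y` suffixes.

```r
# Hypothetical short table that already carries a CCD-derived column
pairs_aug <- data.frame(
  schoolid   = c("A1", "A2"),
  pair       = c(1, 1),
  z          = c(1, 0),
  enroll_g11 = c(210, 185)   # previously added from the CCD
)
# Hypothetical longer statewide CCD table
ccd_state <- data.frame(
  schoolid   = c("A1", "A2", "B1"),
  enroll_g11 = c(210, 185, 320),
  pct_ell    = c(0.04, 0.10, 0.02)
)

key <- "schoolid"
# Non-key columns present in both tables
overlap <- setdiff(intersect(names(pairs_aug), names(ccd_state)), key)

# Without this step, merge() would return enroll_g11.x and enroll_g11.y
pairs_slim <- pairs_aug[, setdiff(names(pairs_aug), overlap), drop = FALSE]

# Keep all statewide schools; unpaired ones get NA for pair and z
merged <- merge(ccd_state, pairs_slim, by = key, all.x = TRUE)

stopifnot(anyDuplicated(names(merged)) == 0,
          !any(grepl("\\.(x|y)$", names(merged))))
```

Computing `overlap` from the column names, rather than hard-coding it, means the scripts keep working if more CCD variables are later added to `michigan_school_pairs`.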
I have it from a National Center for Education Statistics (NCES) official that it's OK to redistribute CCD data. But I recall from the same conversation that there's a blurb of some type that one is supposed to include with the redistribution, acknowledging the source. We need to track down the suggested text and make sure it gets added to the Details and/or References sections of `man/michigan_school_pairs.Rd`.
I'm suggesting this for Cassandra as it aligns with her work on the mi_vignette (although the benefits will be mostly to other vignettes and documentation). @jwasserman2 is available to consult, as I am.