Review & Corrections to Readme.md

jcwhitmer commented 1 year ago

[x] The summary item information table at the top of the Readme.md document contains double-scored, train/test, and year entries. I believe that we had previously consolidated this into a simpler table? @cb-air may have mentioned this in a prior issue, but I think that this table should be replaced.
[x] The "item information" and "data application" forms need to be uploaded and linked within the document. I'd suggest putting them in the /resources folder. @mathgie could you please address?
[x] The list of "variables common to all items" is outdated and doesn't reflect what is in the dataset or its order. @cb-air
[x] Correct the inter-rater reliability score types, which currently list component and atomic scores. I believe the counts are incorrect; @mathgie could you please review and confirm?

pdbailey0 commented 1 year ago

@jcwhitmer, I'm looking at the last pull and @cb-air seems to have entirely deleted the table from the readme.

[edited by PB, wrong link]

pdbailey0 commented 1 year ago

@jcwhitmer, I reviewed the files and the order, and they agreed. Could your version be behind? Or maybe the book isn't picking up the edits?

pdbailey0 commented 1 year ago

Looking at the first, it seems @cb-air's pull was closed by Charles. The text seems to say it was merged. The symbol indicates it was not merged. I have to admit, I don't have a lot of experience rejecting merges on GitHub, so I don't know what it looks like.

pdbailey0 commented 1 year ago

@mathgie it seems @cb-air's actually merged PR removed the table in book.md but, nevertheless, there it is, not fixed.

cb-air commented 1 year ago

I'm not sure what's going on, but maybe GitHub is buggy, or it has something to do with the locked repo?

In any case, here's the book.md in my only active branch (updated earlier today):

cb-air commented 1 year ago

To add info, in case it's helpful: when I did the most recent PR today (#9), I closed the PR from yesterday (#8) to prevent any confusion. But both of these should have had the variable order correct and the "item information" (double-scored) table removed.

It looks like the book.html currently in the main branch reflects the book.md from my most recent PR.

pdbailey0 commented 1 year ago

What's odd is that I started to make a pull request from that branch Charles linked from, and it showed nothing to merge. So the file is different, but the same. GitHub is confusing me today.

jcwhitmer commented 1 year ago

Are you using the "/cb-duplicate_PR_same_files" branch?

I assume that we're working from the main branch unless told otherwise, and that any substantive changes would be merged there upon completion.

pdbailey0 commented 1 year ago

The fact that @cb-air's PR shows no changes means they're synced.

GitHub doesn't work quite how I first imagined. It doesn't work by editing files; it works by implementing commits. Because of that, a merge doesn't always do what you expect. As I recall, it doesn't always do what is in the change log. I think I wrote to GH and they explained to me this was a feature somehow.

[edit by PB to make first two sentences into a thought.]

cb-air commented 1 year ago

I've copied my section from my PR and pasted it below. (What you see below is what you get, except that the links for the word count plots are broken and the line breaks aren't set right. Line lengths are set in the markdown header, which I have not copied.)

Data File Information

Data for the competition has been aggregated into a single file from multiple test items. For this challenge you will be using items from the grade 4 and grade 8 NAEP Math Assessments that were administered in 2017 and 2019. Information about the aggregated file and how it was prepared, along with general instructions for the challenge and data-handling rules, is provided below. Questions about the challenge should be posted to the GitHub "issues" page for the challenge: https://github.com/naep-as-challenge

Variables Common to All Items

Some variables about the item, responses, and respondent were available for all items in the source data. Those variables are described in the table below.

Variable	Description	Type	Values (if constrained)
student_id	pseudonymous student ID -- not linkable across item-years	string	e.g. "xYzq4StVaC"
accession	Item number	string	e.g. "VH139087"
score_to_predict	Outcome to predict	integer	e.g. 1, 2, 3
predict_from	Text related to "score_to_predict"	string	"Because A>B"
year	Year the assessment was administered	integer	2017 or 2019
srace10	Student's race reported by the school	string	(1='White, not Hispanic', 2='Afric Amer, not Hisp', 3='Hispanic of any race', 4='Asian, not Hispanic', 5='Amer Ind/Alaska Nat', 6='Native Ha/Pac Island', 7='>1 race, not Hispanic')
dsex	Student's sex	integer	1=male, 2=female
accom2	Student accommodations. Note: Item VH304954 did not have accom2, so for this item accom2 is entirely NA.	integer	1='Accommodated', 2='Not accommodated'
iep	Individualized Education Program (IEP) status	integer	1=SD, 2=Not SD
lep	English learner status	integer	1=English Learner, 2=Not English Learner
rater_1	Score given by human rater (component-scored items only)	string	e.g. 1A, 2B, 3A …
pta_rtr1	Part A human rater score (composite items only)	string	e.g. 1, 2A, 2, 3A …
ptb_rtr1	Part B human rater score (composite items only)	string	e.g. 1, 2A, 2, 3A …
ptc_rtr1	Part C human rater score (composite items only)	string	e.g. 1, 2A, 2, 3A …
composite	Composite score (atomic-scored items only)	integer	e.g. 1, 2, 3
score	Score (containing partial credit codes)	string	e.g. 1A, 2B, 3A …
assigned_score	Simplified numeric score total for item (1, 2, 3...) from either "rater_1" or "composite"	integer	1, 2, 3 …
ee_use	Item used equation editor	integer	0=no EE use, 1=EE use
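The coded demographic variables can be turned into readable labels with a simple lookup. The sketch below copies the value codes from the table above; it assumes rows arrive as Python dicts (for example from `csv.DictReader`) and that suppressed values appear as missing, empty, or `"NA"`. The names `CODEBOOK` and `decode_demographics` are hypothetical, not part of the dataset.

```python
# Value codes copied from the "Variables Common to All Items" table.
# CODEBOOK and decode_demographics are hypothetical helper names.
CODEBOOK = {
    "srace10": {
        "1": "White, not Hispanic",
        "2": "Afric Amer, not Hisp",
        "3": "Hispanic of any race",
        "4": "Asian, not Hispanic",
        "5": "Amer Ind/Alaska Nat",
        "6": "Native Ha/Pac Island",
        "7": ">1 race, not Hispanic",
    },
    "dsex": {"1": "male", "2": "female"},
    "accom2": {"1": "Accommodated", "2": "Not accommodated"},
    "iep": {"1": "SD", "2": "Not SD"},
    "lep": {"1": "English Learner", "2": "Not English Learner"},
}


def decode_demographics(row):
    """Return a copy of `row` with coded demographic fields replaced by labels.

    Assumes suppressed or missing values appear as None, "" or "NA";
    those become None. Unrecognized codes are passed through unchanged.
    """
    decoded = dict(row)
    for var, codes in CODEBOOK.items():
        value = row.get(var)
        raw = "" if value is None else str(value).strip()
        decoded[var] = None if raw in ("", "NA") else codes.get(raw, raw)
    return decoded
```

Integer codes (e.g. `1` rather than `"1"`) are converted to strings before lookup, so the same helper works whether or not the reader has already cast the columns to numbers.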

Data Processing Information

There are four "Type II" items, which are composed of multiple sub-items or parts that each have their own set of scores and response fields. For the purpose of the challenge, participants are asked to use NLP to predict the combined overall score (score_to_predict) from the constructed response component that we believe is the most salient (predict_from). The six other items, called "Type I" items here, also contain multiple parts; however, these parts are considered dependently linked portions of the item and, as such, were assigned a single score that encompasses the responses contained within both parts.

For the "Type II" items, the sub-item scores have been combined into a single "assigned_score" variable, which is described in the common variables table above. The original part scores are also included and can be decoded using the item scoring guides in Item information.zip, which will be provided to participants along with the responses once the data application is approved.

Note that this composite variable is not always the outcome that contestants should predict. To make the outcome explicit, we've created a variable, "score_to_predict", which will be used as the outcome variable for predicted scores. We've also created a variable named "predict_from" that identifies the most relevant constructed response text to use when creating predicted scores.
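Because the outcome and the input text are named explicitly, assembling training pairs is mostly a matter of selecting those two columns. The sketch below groups `(predict_from, score_to_predict)` pairs by `accession`, on the assumption that a separate model may be fit per item. It assumes the data arrive as CSV with these column headers and that missing outcomes are written as `"NA"`; the function name `training_pairs` is hypothetical.

```python
import csv
from collections import defaultdict


def training_pairs(lines):
    """Group (predict_from, score_to_predict) pairs by item accession.

    `lines` is any iterable of CSV text lines with a header row, such as an
    open file. Rows whose outcome is empty or "NA" are skipped.
    """
    pairs = defaultdict(list)
    for row in csv.DictReader(lines):
        score = (row.get("score_to_predict") or "").strip()
        if score in ("", "NA"):
            continue  # no outcome to learn from
        text = row.get("predict_from") or ""
        pairs[row["accession"]].append((text, int(score)))
    return dict(pairs)
```

For example, `training_pairs(open(path, newline=""))` would return a dict keyed by accession, where `path` stands for wherever the approved data file is stored.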

The original item data contained extended constructed response and short constructed response (ECR and CR) text, item selections for multiple choice, and some process data (such as response "eliminations" for MC items) embedded within a JSON data structure, with MathML (XML) equation editor codes nested inside. The original test item data had different XML structures for each item, and within an item there are differences in the XML coding between years of administration. These differences may affect how predictive models perform across years.

These data have been parsed to make them easier to process. Unlike the common variables listed above, the parsed XML data differ for each item. The item-specific variables are described under each item name in the list that follows. Note that the format of the data values for the process data (e.g. eliminations) may differ by year for the same item. For example, eliminations may be recorded as "(1, 2, 5)" in 2017 and "1, 2, 5" in 2019.
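One way to smooth over the year-to-year difference in the elimination format is to pull out the digit runs and ignore the surrounding punctuation. This is a minimal sketch that handles only the two textual forms quoted above; encodings stored as logical vectors (as for some items in 2019) need separate handling. `parse_eliminations` is a hypothetical helper name.

```python
import re


def parse_eliminations(raw):
    """Return elimination option numbers, in the order they appear, as ints.

    Handles the two textual forms quoted in the README, "(1, 2, 5)" and
    "1, 2, 5", by extracting every run of digits. Missing values give [].
    """
    if raw is None:
        return []
    return [int(tok) for tok in re.findall(r"\d+", str(raw))]
```

Both example strings map to `[1, 2, 5]`, so downstream code can treat the two years the same way.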

Also note that the CR text has been parsed but not completely cleaned. The data were screened for sensitive information (e.g. personally identifiable information, profanity, toxic language), and some responses were removed as a result. However, no spellcheck has been applied, so obvious spelling errors remain uncorrected.

Variables with different meanings for each item

Please consult the scoring guides included in Item information.zip to map the fields below to the question areas.

For item VH134067

parsed_xml_v1-- Text for ECR item response.

For item VH139380

parsed_xml_v1-- SCR text
parsed_xml_v2-- ECR text

For item VH266015

source1-- drag and drop tile "from"
source2-- drag and drop tile "from"
source3-- drag and drop tile "from"
source4-- drag and drop tile "from"
target1-- drag and drop tile "to"
target2-- drag and drop tile "to"
target3-- drag and drop tile "to"
target4-- drag and drop tile "to"
parsed_xml_v1-- CR text

For item VH266510

parsed_xml_v1-- ECR text
selected-- MC radio button choices as a logical vector (e.g. "FALSE FALSE TRUE FALSE") for 2019 only.
eliminations-- MC item eliminations as a variable length numeric vector (e.g., c(1,3,4)) for 2017 only.
eliminated-- MC item eliminations as a length 4 logical vector (e.g., TRUE FALSE FALSE TRUE) for 2019 only.
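For VH266510 the 2017 and 2019 elimination fields use different encodings, so comparing them across years requires mapping both onto one representation. The sketch below converts each to 1-based option numbers. It assumes that position k in the 2019 logical vector corresponds to option k, and that the 2017 numeric vector may be stored as text such as `c(1,3,4)`; both helper names are hypothetical.

```python
import re


def true_positions(raw):
    """1-based positions of TRUE in a logical vector string like "FALSE FALSE TRUE FALSE"."""
    if raw is None:
        return []
    tokens = re.findall(r"[A-Za-z]+", str(raw))  # tolerate spaces or commas between values
    return [i for i, tok in enumerate(tokens, start=1) if tok.upper() == "TRUE"]


def vh266510_eliminations(raw, year):
    """Eliminated MC options for item VH266510 as 1-based option numbers.

    2017 "eliminations": variable-length numeric vector, e.g. c(1,3,4).
    2019 "eliminated": length-4 logical vector, e.g. TRUE FALSE FALSE TRUE.
    """
    if year == 2019:
        return true_positions(raw)
    if raw is None:
        return []
    return [int(tok) for tok in re.findall(r"\d+", str(raw))]
```

The same `true_positions` helper can also decode the 2019 `selected` field, e.g. `"FALSE FALSE TRUE FALSE"` gives `[3]`.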

For item VH269384

selected1-- 1st MC item option radio button 1
selected2-- 1st MC item option radio button 2
selected3-- 1st MC item option radio button 3
selected4-- 1st MC item option radio button 4
selected1.1-- 2nd MC item option radio button 1
selected2.1-- 2nd MC item option radio button 2
eliminated1-- 1st MC item elimination option radio button 1
eliminated2-- 1st MC item elimination option radio button 2
eliminated3-- 1st MC item elimination option radio button 3
eliminated4-- 1st MC item elimination option radio button 4
eliminated1.1-- 2nd MC item elimination option radio button 1
eliminated2.1-- 2nd MC item elimination option radio button 2
parsed_xml_v1-- ECR text

For item VH271613

partA_response_val-- 1st MC item drop-down menu selections as a numeric vector (e.g. c("1","1")) in 2017, and a fixed-length logical vector in 2019.
partB_response_val-- 2nd MC item radio button selections as a vector (e.g. c("1","")) in 2017, and a fixed-length logical vector in 2019.
partB_eliminations-- MC item eliminations for part B; format differs by year.
parsed_xml_v1-- ECR text
Note-- For both the response values and the eliminations, the format of the data changes between 2017 and 2019. In 2017, eliminations are stored as a list of numbers, perhaps in chronological order (e.g., "1", "2", but also "2--1" and "1--2"). In 2019, the responses and eliminations are stored as fixed-length logical vectors (e.g., "TRUE TRUE").

For item VH302907

parsed_xml_v1-- ECR text
parsed_xml_v2-- CR text
parsed_xml_v3-- CR text

For item VH304954

parsed_xml_v1-- CR text
parsed_xml_v2-- CR text

For item VH507804

source1-- drag and drop tile "from"
source2-- drag and drop tile "from"
source3-- drag and drop tile "from"
target1-- drag and drop tile "to"
target2-- drag and drop tile "to"
target3-- drag and drop tile "to"
parsed_xml_v1-- CR text

For item VH525628

source1-- drag and drop tile "from"
source2-- drag and drop tile "from"
source3-- drag and drop tile "from"
source4-- drag and drop tile "from"
target1-- drag and drop tile "to"
target2-- drag and drop tile "to"
target3-- drag and drop tile "to"
target4-- drag and drop tile "to"
parsed_xml_v1-- CR text

Information about the constructed response field

The following plots provide information about the distribution of word counts for the predict_from constructed response field.
[Plot: Word count (excluding numbers and symbols)]
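The exact tokenization behind the "excluding numbers and symbols" count is not stated here, so the following is only one plausible reading: count runs of letters, allowing an internal apostrophe, and ignore digits and symbols. `word_count` is a hypothetical name, and its results may differ somewhat from the plotted distribution.

```python
import re

# Runs of ASCII letters, optionally joined by internal apostrophes ("don't").
# Digits, operators, and punctuation are never counted as words.
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def word_count(text):
    """Number of letter-only words in a constructed response (None counts as 0)."""
    if text is None:
        return 0
    return len(WORD_RE.findall(text))
```

For instance, `"Because A>B, 3 + 4 = 7"` counts as 3 words ("Because", "A", "B"), since the numbers and symbols are skipped.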

Inter-rater Reliability

Approximately 5% of the NAEP item responses were double-scored. Quadratic Weighted Kappa (QWK) was calculated to estimate the inter-rater reliability for the double-scored responses. The inter-rater reliability estimates for all items are presented below.
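For reference, a minimal implementation of the standard quadratic weighted kappa definition is sketched below. The README does not say which software produced the QWK values in the table, so this is not necessarily the computation that was used; it assumes integer scores within a known range (such as the min and max columns in the test/train split table). As a cross-check, scikit-learn's `cohen_kappa_score(y1, y2, labels=..., weights="quadratic")` should agree when `labels` is set to the full score range.

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_score, max_score):
    """Quadratic weighted kappa between two lists of integer scores.

    Disagreement between categories i and j is weighted by (i - j)^2 / (N - 1)^2,
    where N is the number of score categories in [min_score, max_score].
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty score lists of equal length")
    n = max_score - min_score + 1
    if n < 2:
        raise ValueError("need at least two score categories")

    # observed[i][j] counts responses scored i by rater A and j by rater B.
    observed = [[0] * n for _ in range(n)]
    for a, b in zip(rater_a, rater_b):
        if not (min_score <= a <= max_score and min_score <= b <= max_score):
            raise ValueError(f"score out of range: {a}, {b}")
        observed[a - min_score][b - min_score] += 1

    total = len(rater_a)
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n)) for j in range(n)]

    weighted_obs = 0.0
    weighted_exp = 0.0
    for i in range(n):
        for j in range(n):
            w = (i - j) ** 2 / (n - 1) ** 2
            weighted_obs += w * observed[i][j]
            # Expected count if the raters' marginal distributions were independent.
            weighted_exp += w * hist_a[i] * hist_b[j] / total
    if weighted_exp == 0:
        return 1.0  # both raters used one identical category throughout
    return 1.0 - weighted_obs / weighted_exp
```

With `min_score=1` and `max_score=3`, identical score lists give 1.0 and fully reversed lists (`[1, 2, 3]` vs `[3, 2, 1]`) give -1.0.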

Table: Inter-rater Reliability (QWK) by Item

item	QWK	score type
VH134067	0.966	Type I
VH139380	0.981	Type I
VH266015	0.963	Type II
VH266510	0.933	Type I
VH269384	0.970	Type II
VH271613	0.975	Type II
VH302907	0.980	Type I
VH304954	0.984	Type I
VH507804	0.991	Type II
VH525628	0.956	Type I

Suppression

To minimize the risk of statistical disclosure, suppression was applied to demographic variables. To minimize the impact of suppression, an algorithm was developed that prioritized which of the suppression variables were set to missing (NA). The suppression variables, listed in order of priority, were: "dsex", "iep", "accom2", "lep", and "srace10". The variable "year" was not included in the suppression.

Item Splits

The table that follows shows the N counts for the test and training data sets.

Table: N Counts for Test/Train Split

item	QWK	min score	max score	test N	train N	score type
VH134067	0.966	1	2	4,483	40,343	Type I
VH139380	0.981	1	3	2,018	18,157	Type I
VH266015	0.963	1	4	1,776	15,987	Type II
VH266510	0.933	1	3	4,296	38,667	Type I
VH269384	0.970	1	4	1,758	15,826	Type II
VH271613	0.975	1	4	3,096	27,858	Type II
VH302907	0.980	1	2	4,241	38,173	Type I
VH304954	0.984	1	3	2,743	24,686	Type I
VH507804	0.991	1	4	1,827	16,443	Type II
VH525628	0.956	1	3	1,808	16,275	Type I
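The test and train counts imply the same split proportion for every item. The short check below recomputes the test share from the table's N counts; the counts are copied from the table, and `SPLITS` and `holdout_share` are illustrative names only.

```python
# (test N, train N) per item, copied from the "N Counts for Test/Train Split" table.
SPLITS = {
    "VH134067": (4483, 40343),
    "VH139380": (2018, 18157),
    "VH266015": (1776, 15987),
    "VH266510": (4296, 38667),
    "VH269384": (1758, 15826),
    "VH271613": (3096, 27858),
    "VH302907": (4241, 38173),
    "VH304954": (2743, 24686),
    "VH507804": (1827, 16443),
    "VH525628": (1808, 16275),
}


def holdout_share(test_n, train_n):
    """Fraction of an item's responses that were placed in the test set."""
    return test_n / (test_n + train_n)


for item, (test_n, train_n) in SPLITS.items():
    print(f"{item}: {holdout_share(test_n, train_n):.1%} test of {test_n + train_n:,} responses")
```

Every item comes out at 10.0% when rounded to one decimal place, i.e. a 90/10 train/test split.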

pdbailey0 commented 1 year ago

@jcwhitmer I've been using GitHub for a while now. I've never seen a repo in this state.

I wonder if @mathgie needs to rebuild the book.

jcwhitmer commented 1 year ago

Bizarre; @mathgie could you see if you can resolve this? I'd like to keep this repo, but we could always nuke it and start afresh if needed. Would be worth a post-mortem discussion once we are past the launch.

mathgie commented 1 year ago

So I've made a separate branch and PR that includes my new resources folder and a fixed version of book.md. Once my branch is approved, I need to go through my copy of book.md one more time to make sure that any stray conflicting parts of the doc are cleaned up and any lingering formatting/links work, but I've resolved most of the conflicts.

pdbailey0 commented 1 year ago

Thank you, @mathgie! I think this type of situation is the genesis of git's name.

pdbailey0 commented 1 year ago

@mathgie, the table is still there. I'm a bit confused, BTW: did @jcwhitmer want the table to simply drop the double-scored column, or to be removed entirely?

pdbailey0 commented 1 year ago

I see the table

I think we want that gone, because we later have this

jcwhitmer commented 1 year ago

@mathgie @pdbailey0 remove that table entirely; as you note, it's redundant.

jcwhitmer commented 1 year ago

These issues are complete, and I've integrated the changes into my review; we will use PRs for changes from now on.

NAEP-AS-Challenge / math-prediction