Open Jesse-Klein opened 5 years ago
Hey @jrkleinfsulib - sorry this fell through the cracks. I take it this is language you want to suggest... do you feel comfortable opening a PR with these changes?
Also, in the metadata section of the second episode, I suggest the following changes to the paragraph preceding the one above:
However, except for column headers, metadata should not be contained in the data file itself. Unlike a table in a paper or a supplemental file, when metadata is included in data files in the form of legends, it can disrupt how computer programs interpret your data file.
Instead, record and store metadata in a separate file in the same directory as your data file. The file should be in plain text format when possible, with a name that clearly associates it with your data file. Because metadata files are free-text, they also let you record comments, units, information about how null values are encoded, and other details that are important to document but would disrupt the formatting of your data file.
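To make the suggestion concrete, here is a minimal Python sketch of keeping metadata in a companion plain-text file next to the data file. The file name `survey.csv`, the `_metadata.txt` suffix, and the helper names are assumptions for illustration, not a convention from the lesson.

```python
import tempfile
from pathlib import Path


def metadata_path_for(data_file: Path) -> Path:
    """Return a companion plain-text path in the same directory as the data file."""
    # Assumed naming convention: survey.csv -> survey_metadata.txt
    return data_file.with_name(data_file.stem + "_metadata.txt")


def write_metadata(data_file: Path, notes: dict) -> Path:
    """Write one 'key: value' line per note into the companion metadata file."""
    target = metadata_path_for(data_file)
    lines = [f"{key}: {value}" for key, value in notes.items()]
    target.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return target


# Demo in a temporary directory so nothing on disk is overwritten.
workdir = Path(tempfile.mkdtemp())
data_file = workdir / "survey.csv"  # hypothetical data file
data_file.write_text("id,age\n1,34\n2,-999\n", encoding="utf-8")

meta_file = write_metadata(
    data_file,
    {
        "age": "respondent age in years",
        "null values": "-999 means no answer recorded",
    },
)
```

The data file keeps only column headers and values, while units and null-value notes live in the text file beside it.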
Suggestions for edits/additional content: Data Organization in Spreadsheets for Social Scientists, "Formatting data tables in Spreadsheets", Metadata section
Some of this information may be familiar to learners who collect or analyze survey data, or who work with data sets accompanied by additional documentation such as codebooks. Codebooks often describe the original survey or interview questions associated with particular variables, how variables were constructed, the response categories and their associated values, and the notation used for missing values throughout the data. For example, the General Social Survey maintains its entire codebook online. The entry for a particular variable, such as SEX, provides valuable information about the original question wording, the scales or response categories, the years covered for that variable, the sample or sub-samples surveyed, and the meaning of particular values. Descriptions of missing values are important when cleaning survey data because they explain the various reasons why respondents did not answer a question (e.g., not applicable, didn't know, refused to answer), which leaves blank cells in the data. For example, in the General Social Survey, missing values are coded as 8, 9, 0, and sometimes other numbers. If these codes are later interpreted as ordinary integers, they can interfere with accurate queries and analyses.
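As an illustration of why those codes matter, here is a small Python sketch that recodes missing-value codes before computing a mean. The variable name `example_item`, the sample rows, and the choice of 0, 8, and 9 as its missing codes are assumptions for the example; the real codes differ by variable and should come from the codebook.

```python
# Hypothetical per-variable missing codes; take the real ones from the codebook.
MISSING_CODES = {"example_item": {0, 8, 9}}


def recode_missing(rows, missing_codes):
    """Return copies of the rows with coded missing values replaced by None."""
    cleaned = []
    for row in rows:
        new_row = dict(row)  # copy so the original rows are left untouched
        for column, codes in missing_codes.items():
            if new_row.get(column) in codes:
                new_row[column] = None
        cleaned.append(new_row)
    return cleaned


# Made-up responses: two valid answers (3 and 1) and two missing codes (8 and 0).
rows = [
    {"id": 1, "example_item": 3},
    {"id": 2, "example_item": 8},
    {"id": 3, "example_item": 0},
    {"id": 4, "example_item": 1},
]

cleaned = recode_missing(rows, MISSING_CODES)

# Treating the codes as ordinary integers distorts the result.
naive_mean = sum(r["example_item"] for r in rows) / len(rows)

# Only valid responses contribute once the codes are recoded.
valid = [r["example_item"] for r in cleaned if r["example_item"] is not None]
valid_mean = sum(valid) / len(valid)
```

With these made-up rows the naive mean is 3.0, while the mean of the valid responses is 2.0.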