Closed carriewright11 closed 1 month ago
No spelling errors! :tada: Comment updated at 2024-07-08-15:33:33 with changes from b0ffec1cfce20faebc6dc6ffc01214db5b2502ef
No broken url errors! :tada: Comment updated at 2024-07-08-15:33:33 with changes from b0ffec1cfce20faebc6dc6ffc01214db5b2502ef
ok... want to check with @ehumph about the source for the data - can we add anything here?
Also I don't like how the gender column isn't more inclusive...typically I would point something out about this or try to avoid that data. There is only one question currently that uses it to get to one answer which isn't so bad... but want to think about this.
@avahoffman what are your thoughts? In all the examples in the lecture and all the other questions in the lab we ignore that column, which I think helps, but normally I would want to say that two genders is not inclusive. I know we have also worried about making people feel called out as well by pointing that out.
is there a better clean_names example than this?
CO2 <- read_csv("https://daseh.org/data/Yearly_CO2_Emissions_1000_tonnes.csv")
head(CO2, n = 2)
clean_names(CO2)
Previously used opioid data that had upper and lower case variables
Re-rendered previews from the latest commit: See preview of website here
Updated at 2024-07-08 with changes from b0ffec1cfce20faebc6dc6ffc01214db5b2502ef
@carriewright11 @avahoffman Regarding the gender variable: this is historical data, and they will likely work with historical data in their job. It seems like this could be a good opportunity to demonstrate how sometimes you get variables with non-inclusive categories, or just variables that aren't great. You could discuss how to approach data analysis with such variables, as well as point out how to identify historical variables that are problematic. My guess is a decent number of them won't even notice that the gender variable is non-inclusive until it is pointed out to them, because it's so commonly used with these categories.
Regarding a better clean_names example: all of the datasets had really clean variable names from the start.
As for data provenance: it is heat-related ER visits between 2011 and 2022, as reported by the state of Colorado, specifically made available by the Colorado Environmental Public Health Tracking program website. Full dataset available at https://coepht.colorado.gov/heat-related-illness
I agree, I think a disclaimer is appropriate for gender. There are pretty progressive folks who use a disclaimer when working with historical data (as Elizabeth mentioned). Could point to/use the conclusion of this paper's abstract: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6526522/
Regarding clean names - maybe just pull a random csv from the data folder? If it's brief, no need to use one of the key datasets.
ok - the one I used turns all the year column names into x1891-style names - do we like that?
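For reference, a minimal sketch of the renaming in question, assuming the janitor package is installed. The small data frame is made up for illustration and is not the CO2 file; it only mimics the pattern of a text column followed by columns named with bare years.

```r
library(janitor)

# Hypothetical stand-in data: two columns are named with bare years.
# check.names = FALSE stops data.frame() from altering the names itself.
toy <- data.frame(
  Country = c("A", "B"),
  `1891` = c(1.2, 3.4),
  `1892` = c(5.6, 7.8),
  check.names = FALSE
)

# Names before cleaning
names(toy)

# clean_names() converts names to snake_case and prefixes names
# that start with a digit so they become valid R names
cleaned <- clean_names(toy)
names(cleaned)
```

Here the year columns come back as x1891 and x1892 and Country becomes country, which is the behavior the question above is about.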
Yes!!! love this example. We might need to use it in future labs :)
Gonna merge this!
updating the data to be more environmental