
fhdsl / DaSEH

🌼 Collection of materials to accompany the R25 funded "Data Science for Environmental Health" short course

https://daseh.org

MIT License


Subsetting #69

Closed by carriewright11 1 month ago

carriewright11 commented 1 month ago

updating the data to be more environmental

github-actions[bot] commented 1 month ago

No spelling errors! :tada: Comment updated at 2024-07-08-15:33:33 with changes from b0ffec1cfce20faebc6dc6ffc01214db5b2502ef

github-actions[bot] commented 1 month ago

No broken url errors! :tada: Comment updated at 2024-07-08-15:33:33 with changes from b0ffec1cfce20faebc6dc6ffc01214db5b2502ef

carriewright11 commented 1 month ago

ok... want to check with @ehumph about the source for the data - can we add anything here?

Also I don't like how the gender column isn't more inclusive...typically I would point something out about this or try to avoid that data. There is only one question currently that uses it to get to one answer which isn't so bad... but want to think about this.

@avahoffman what are your thoughts? In all the examples in the lecture and all the other questions in the lab we ignore that column, which I think helps, but normally I would want to say that two genders is not inclusive. I know we have also worried about making people feel called out as well by pointing that out.

carriewright11 commented 1 month ago

is there a better clean_names example than this?

CO2 <- read_csv("https://daseh.org/data/Yearly_CO2_Emissions_1000_tonnes.csv")
head(CO2, n = 2)
clean_names(CO2)

Previously used opioid data that had upper and lower case variables
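For comparison, here is a minimal sketch of what `clean_names()` does with mixed upper and lower case names. The data frame `messy` and its column names are invented for illustration (they are not from the opioid data or any DaSEH dataset), and the sketch assumes the janitor package is installed.

```r
library(janitor)

# Hypothetical data frame; column names are made up to mix cases and spaces
messy <- data.frame(
  `State Name` = c("CO", "MD"),
  `totalVisits` = c(10, 20),
  `YEAR` = c(2011, 2012),
  check.names = FALSE  # keep the messy names exactly as written
)

# clean_names() converts the names to lower snake_case by default
cleaned <- clean_names(messy)
names(cleaned)
```

A frame like this makes the effect visible at a glance, which is harder when the names are already tidy.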

github-actions[bot] commented 1 month ago

Re-rendered previews from the latest commit: See preview of website here

Updated at 2024-07-08 with changes from b0ffec1cfce20faebc6dc6ffc01214db5b2502ef

ehumph commented 1 month ago

@carriewright11 @avahoffman Regarding the gender variable: this is historical data, and they will likely work with historical data in their job. It seems like this could be a good opportunity to demonstrate how sometimes you get variables with non-inclusive categories, or just variables that aren't great. You could discuss how to approach data analysis with such variables, as well as point out how to identify historical variables that are problematic. My guess is a decent number of them won't even notice that the gender variable is non-inclusive until it is pointed out to them, because it's so commonly used with these categories.
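One way this could look in a subsetting lab is sketched below: inspect the categories the variable records, then drop the column when the analysis does not need it. The data frame `er_visits`, its column names, and its values are all hypothetical stand-ins, since the thread does not list the real column names. The sketch assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical stand-in data; names and values are invented for illustration
er_visits <- tibble(
  year = c(2011, 2011, 2012),
  gender = c("Female", "Male", "Female"),
  visits = c(12, 15, 9)
)

# Look at which categories the historical variable actually contains
count(er_visits, gender)

# Subset without the column when it is not needed for the question at hand
no_gender <- select(er_visits, -gender)
names(no_gender)
```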

ehumph commented 1 month ago

> is there a better clean_names example than this?
>
> CO2 <- read_csv("https://daseh.org/data/Yearly_CO2_Emissions_1000_tonnes.csv")
> head(CO2, n = 2)
> clean_names(CO2)
>
> Previously used opioid data that had upper and lower case variables

All of the datasets had really clean variable names from the start

ehumph commented 1 month ago

As for data provenance: it is heat-related ER visits between 2011 and 2022, as reported by the state of Colorado, specifically made available by the Colorado Environmental Public Health Tracking program website. Full dataset available at https://coepht.colorado.gov/heat-related-illness

avahoffman commented 1 month ago

I agree, I think a disclaimer is appropriate for gender. There are pretty progressive folks who use a disclaimer when working with data like this (as Elizabeth mentioned). Could point to/use the conclusion of this paper's abstract: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6526522/

Regarding clean names: maybe just pull a random csv from the data folder? If it's brief, no need to use one of the key datasets.

carriewright11 commented 1 month ago

> I agree, I think a disclaimer is appropriate for gender. There are pretty progressive folks who use a disclaimer when working with data like this (as Elizabeth mentioned). Could point to/use the conclusion of this paper's abstract: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6526522/
>
> Regarding clean names: maybe just pull a random csv from the data folder? If it's brief, no need to use one of the key datasets.

ok - the one I used turns all the year column names into things like x1891 - do we like that?

avahoffman commented 1 month ago

> ok - the one I used makes all the year names into x1891 kind of thing - do we like that?

Yes!!! love this example. We might need to use it in future labs :)
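The x1891 behaviour can be reproduced without downloading the CO2 file. This is a minimal sketch: `co2_like` and its values are invented stand-ins for the real dataset, and it assumes the janitor package is installed.

```r
library(janitor)

# Hypothetical miniature of a wide table whose column names are years
co2_like <- data.frame(
  country = c("A", "B"),
  `1891` = c(1, 2),
  `1892` = c(3, 4),
  check.names = FALSE  # keep the numeric names as-is
)

# Names that start with a digit get an "x" prefix so they are valid R names
year_cleaned <- clean_names(co2_like)
names(year_cleaned)
```

After cleaning, the year columns can be referred to without backticks, for example `year_cleaned$x1891`.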

carriewright11 commented 1 month ago

Gonna merge this!