bootstrapworld / curriculum


Dataset Ideas to Expand Ability to meet Equity Goals #702

Closed flannery-denny closed 2 years ago

flannery-denny commented 2 years ago


Current datasets that could maybe be used in a more targeted lesson:

We are about to have a group of undergrads support us in expanding our datasets. They will start with the same prompt, but if they don't come up with an idea of their own, we will need to assign some.

Brainstorming here about relevant issues that there may be good data on.

Perhaps no statistic better illustrates the enduring legacy of our country’s shameful history of treating black people as sub-citizens, sub-Americans, and sub-humans than the wealth gap.

flannery-denny commented 2 years ago

Three categories of data dominated the responses. Topics related to food and health came first at 20.8%. Two were about New York City restaurants, but all the rest had a health dimension: nutrition in fast foods, sodas, (breakfast) cereals, snacking, and cancer rates. Equally popular was a cluster on crime and policing, mostly about stop-and-frisk and marijuana arrest rates (comparing Black versus white, rural versus urban, etc.): in short, politically important and sensitive topics (some of which may have been especially salient in light of 2020’s Black Lives Matter protests). Close behind was the category of sports and entertainment (16.4%), but spread between baseball, basketball, track-and-field, and video game reviews. A dataset of the “Top 100 Movies” attracted 7% of students, followed by a long tail: data about specific states, the environment, animals, college majors, and charter schools each had more than one user. In short, we see a broad spread of topics that could reasonably be described as personally meaningful.

retabak commented 2 years ago

Maybe something on college admissions pre / post required standardized testing? Here's an article and a study and also a pretty interesting This American Life episode

Diversity in Hollywood

Bechdel Test data

I'll likely think of more!

flannery-denny commented 2 years ago

Rec from Shriram: https://www.datacommons.org/

flannery-denny commented 2 years ago

@schanzer @retabak I have 13 undergrads signed up to work on datasets over winter break, and 12 of them want to be assigned a focus. All input appreciated! I'm trying to get back to them tomorrow.

Particularly useful would be to suggest columns that you would like to see included. And any sources for data on your radar.

Data Set Ideas

flannery-denny commented 2 years ago

@schanzer Several of the datasets we'll be building will use ZIP codes as rows.

There are 41,692 ZIP codes in the United States. Any ideas on how to select a manageable number of them to include? Should the selection be random? If so, do you know of a simple way to randomly select rows from a CSV file?

schanzer commented 2 years ago

@flannery-denny to randomize rows...

  1. Select all rows
  2. Right-click and select "randomize range"

Then select the top n rows, which represent n random rows pulled from the population.
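If the file is too large to handle comfortably in a spreadsheet, the same idea can be sketched in a few lines of Python. This is only an illustration, not something the project uses: the function name `sample_rows`, the `seed` parameter, and the tiny demo table are all assumptions made up for the example.

```python
import csv
import io
import random


def sample_rows(csv_file, n, seed=None):
    """Return the header plus n data rows chosen uniformly at random.

    Rows are drawn without replacement, which matches shuffling the
    whole range and keeping the top n rows.
    """
    reader = csv.reader(csv_file)
    header = next(reader)          # keep the header out of the shuffle
    rows = list(reader)
    rng = random.Random(seed)      # a fixed seed makes the sample repeatable
    return header, rng.sample(rows, min(n, len(rows)))


# Small illustrative table standing in for a real ZIP code file.
demo = io.StringIO(
    "zip,state\n02139,MA\n10001,NY\n60601,IL\n94103,CA\n73301,TX\n"
)
header, picked = sample_rows(demo, 3, seed=0)
print(header)
for row in picked:
    print(row)
```

For a real file, `open("zipcodes.csv", newline="")` would replace the `io.StringIO` object (the filename here is hypothetical). Passing a seed is optional, but it lets everyone building a dataset reproduce the same sample.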

flannery-denny commented 2 years ago

awesome. thanks!


flannery-denny commented 2 years ago

@schanzer I want to make an organized list of the ideas that were generated for future datasets and close this issue. Would you prefer the list in a GitHub issue or as a Google Doc in the datasets folder?

schanzer commented 2 years ago

@flannery-denny Let's open a new issue, and label it "not urgent".