dgmiller / RPworkshop

Determine workshop examples #1

Open dgmiller opened 5 years ago

dgmiller commented 5 years ago

Which real-world data projects can we use to demonstrate the skills in the workshop?

dgmiller commented 5 years ago

I made a questionnaire and handed it out to the RPs to gauge what kind of examples would be directly useful to their work. Here are the questions and responses:

As an RP, I use the following languages/coding environments (please select all that apply):

Q1

What is your primary language/coding environment?

Q2

As an RP, my work includes (please select all that apply):

Q3

The tasks that are the most time consuming are (please select all that apply):

Q4

The tasks that are the most complicated or prone to error are (please select all that apply):

Q5

dgmiller commented 5 years ago

The responses above suggest that we will get the most traction from examples that deal with data cleaning and basic modeling. I'll come up with something less modeling-related and more focused on proper data management, such as writing a script that processes the raw data rather than saving several versions of it.
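To make the "script instead of several saved versions" idea concrete, here is a minimal sketch. It assumes a small CSV with hypothetical `School` and `Stipend` columns; the column names, the sample data, and the `clean_rows` helper are illustrative only and do not come from the workshop materials. The point is that the raw input is read but never edited, and the cleaned data can be regenerated at any time by rerunning the script.

```python
import csv
import io


def clean_rows(raw_text):
    """Return cleaned records from raw CSV text; the raw input is never modified."""
    reader = csv.DictReader(io.StringIO(raw_text))
    cleaned = []
    for row in reader:
        # Normalize header names and strip stray whitespace from every value.
        record = {key.strip().lower(): (value or "").strip() for key, value in row.items()}
        # Drop incomplete rows in code instead of deleting them from the raw file.
        if any(value == "" for value in record.values()):
            continue
        # Convert the (hypothetical) stipend column to a number.
        record["stipend"] = float(record["stipend"])
        cleaned.append(record)
    return cleaned


# Hypothetical raw data with messy spacing and one incomplete row.
RAW = "School, Stipend\nA, 30000\nB,\n C ,28000.5\n"
CLEANED = clean_rows(RAW)
```

In a real project, `RAW` would be read from a file in a read-only `raw/` folder and `CLEANED` written to a separate output location, so every cleaning decision is recorded in the script rather than in a chain of manually edited copies.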

dgmiller commented 5 years ago

A good example project might be to run our own simplified conjoint study about something that RPs care about (for example, potential PhD/grad school options). I like this because it involves data collection and cleaning, and the data is personally relevant to the RPs, so they are likely to be interested and invested in the outcome. Attributes would include school rank, avg time to graduate, stipend offer, avg placement rank, etc.
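As a rough sketch of what the survey design could look like in code, the snippet below enumerates every combination of attribute levels and draws random pairs of profiles as choice tasks. The attribute names follow the list above, but the levels, the `full_factorial` and `make_choice_tasks` helpers, and the random pairing are all assumptions made for illustration. A real conjoint study would more likely use a fractional or efficient design than random draws from the full factorial.

```python
import itertools
import random

# Attribute names come from the discussion; the levels are hypothetical placeholders.
ATTRIBUTES = {
    "school_rank": ["top 10", "11-25", "26-50"],
    "avg_years_to_graduate": [5, 6],
    "stipend_offer": [25000, 30000, 35000],
    "avg_placement_rank": ["top 25", "26-100"],
}


def full_factorial(attributes):
    """Return one profile (a dict) for every combination of attribute levels."""
    names = list(attributes)
    return [dict(zip(names, combo)) for combo in itertools.product(*attributes.values())]


def make_choice_tasks(profiles, n_tasks, n_alternatives=2, seed=0):
    """Draw n_tasks choice sets, each with n_alternatives distinct profiles."""
    rng = random.Random(seed)  # fixed seed so the survey can be regenerated exactly
    return [rng.sample(profiles, n_alternatives) for _ in range(n_tasks)]


PROFILES = full_factorial(ATTRIBUTES)
TASKS = make_choice_tasks(PROFILES, n_tasks=8)
```

Each element of `TASKS` is one survey question: a respondent sees the alternatives side by side and picks the program they would prefer.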

This repo has code that makes a conjoint survey within the R ecosystem and would be useful in this scenario.

dgmiller commented 5 years ago

After some thought, I think it's best to have two example problems that we will work through. The first is a comparison of simple linear regressions run on simulated data. This is conceptually easy and will allow us to introduce software engineering tools and concepts without getting fussy about methodology. The second example is a fully mature discrete choice modeling exercise that will offer plenty of opportunity to practice coding, communicating, documenting, and collaborating. It will also include an actual survey, which should raise issues of data collection and formatting as well. Ideally there will be two main groups, each trying to develop a fully functional repository over two days. This may involve a brief lecture by me about discrete choice experiments in the context of conjoint analysis.
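For the first example, a minimal sketch might look like the following. It takes one possible reading of "comparison": the same simple linear regression is fit to two simulated datasets that differ only in noise level. The true intercept and slope, the noise levels, the sample size, and the helper names `simulate` and `fit_simple_ols` are all assumptions chosen for illustration, and only the standard library is used so the example runs anywhere.

```python
import random


def simulate(n, intercept, slope, noise_sd, seed):
    """Simulate y = intercept + slope * x + Gaussian noise for n points."""
    rng = random.Random(seed)  # seeded so results are reproducible
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [intercept + slope * x + rng.gauss(0, noise_sd) for x in xs]
    return xs, ys


def fit_simple_ols(xs, ys):
    """Fit y = a + b * x by ordinary least squares using the closed-form solution."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # R^2 = 1 - residual sum of squares / total sum of squares.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return {"intercept": intercept, "slope": slope, "r_squared": 1 - ss_res / ss_tot}


# Same true relationship, two hypothetical noise levels.
LOW_NOISE = fit_simple_ols(*simulate(200, intercept=1.0, slope=2.0, noise_sd=0.5, seed=1))
HIGH_NOISE = fit_simple_ols(*simulate(200, intercept=1.0, slope=2.0, noise_sd=5.0, seed=1))

print("low noise: ", LOW_NOISE)
print("high noise:", HIGH_NOISE)
```

Because the methodology is this simple, the workshop time can go to the surrounding practices: putting the simulation and the fit in separate functions, fixing seeds, and tracking the script in version control.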