Nonprofit-Open-Data-Collective / r-crash-course

A collection of resources and notes that serve as an introduction to the R data programming language.
https://nonprofit-open-data-collective.github.io/r-crash-course/

R for coding - qualitative analysis #1

Open drdoan opened 3 years ago

drdoan commented 3 years ago

Hello!
There are many resources to learn R, and I'm wondering if people could point me in the direction of packages and trainings that support coding qualitative data using R. The courses I've taken so far all focused on quantitative analysis in R, but I understand R has qualitative analysis capabilities as well. Would love to know more about that!
Sincerely, Dana

lecy commented 3 years ago

Hi Dana - it depends on what type of qualitative analysis you want, but you can see a lot of examples that use text as data under the Natural Language Processing task view:

https://cran.r-project.org/web/views/NaturalLanguageProcessing.html

That is more to give a sense of options once you are comfortable with R. I would start with RQDA (R qualitative data analysis) or quanteda (text analysis).

lecy commented 3 years ago

https://martinctc.github.io/blog/a-short-r-package-review-rqda/

https://quanteda.io/articles/quickstart.html

drdoan commented 3 years ago

Wow! Thank you for your speedy reply and links to suggested resources. I will check them out.

(I would like to learn how to use R, as opposed to Atlas.ti or NVivo, to undertake an analysis of open-ended responses to reflection questions. I am wondering about the pros/cons since none of my qualitative analysis friends and mentors are using R for this purpose.)

lecy commented 3 years ago

It's a good question, there are a few things to consider.

Fundamentally, you need to consider your goals in learning a tool. Atlas.ti and NVivo are like a single-purpose kitchen gadget, such as a rice cooker. R is like a chef's kitchen (a collection of knives, pots, pans, and a stove). A rice cooker will make a perfect pot of rice every time and is easy to learn how to use. A chef's kitchen, however, takes some practice. Making rice is harder in regular pots and pans, and it might not come out quite as nicely as it would in the rice cooker.

However, it's hard to make anything else in the rice cooker. The chef's kitchen is a blank slate: you have the tools to make any meal you can imagine. You just need to develop the skills to make each meal, which takes practice and a commitment to the joy of cooking.

So when you are deciding on software, reflect on whether you can live with rice or whether you will need methodological diversity. If you think that coding interviews will be your bread and butter for the next phase of your career and you will not need other tools, then you might be better off with single-purpose software: it is optimized for the one thing it does, so it does that thing really well. R and the commercial tools will have comparable functionality for this task, but the graphical user interface of the commercial software will be more elegant.

If you are a multi-method researcher, or if you know that your skills will need to grow and evolve as your career progresses, then a more flexible analytics environment is a good investment, because once you understand the basics it's easier to pivot. I personally grew to like R because I found myself using a chain of software products: Excel to clean data, a network analysis program for the analysis, then back to Excel and a different data visualization program to create tables and graphics. In R you can do all of it in one place.


Cost

Another consideration is cost. A single license for a scholar is affordable. If you need multiple licenses within an organization, or a server license for a collaborative project, the cost can run into the tens of thousands. If you are teaching, either your university needs to buy licenses for its computer labs or your students all need to buy temporary licenses. Those costs add up quickly, whereas R is free.

Reproducible Workflow

How important is it for your work to be extensible? Specifically, do you plan to mentor graduate students by working together on projects? Or will you be overseeing a team of analysts in an organizational context? Scripted languages have advantages here because you create data recipes that can form the basis of libraries of tasks that others can learn from and extend. You can also audit the work of your students or employees, whereas work done in Excel or point-and-click software is very challenging to audit because the workflow is not logged.

If you plan to teach methods at some point, platforms like R have additional advantages: the scripting of tasks mentioned above, and data-driven documents that make it easy to create lecture notes containing reproducible steps.

Networks

Your networks matter when you need help. Most people default to whatever the smart people around them are doing. But be aware that your qualitative friends may be using Atlas.ti and NVivo because of path dependency, not because of functionality. They are also great rice cookers: easy to learn, with some built-in quality control. R will have a steeper learning curve but a lot more functionality.

You might reflect on how you define your research network. I started learning R after seeing that people like Gary King and Carter Butts loved it, since they are a lot smarter and better informed than I am. There were no courses when I started learning it, so I relied heavily on virtual user communities. I found them extremely helpful when you don't have colleagues who can answer your questions. Additionally, these communities tend to be more innovative and forward-thinking on the methods front, so you learn about new trends early and are constantly exposed to new tools. For some people that is a pro; for others it feels overwhelming. I personally enjoy the creativity a platform like R affords (I get bored with rice quickly and enjoy new recipes, even if they don't always turn out).

Scale

R scales well. If you need to expand your methods toolkit for a project, there is almost certainly a package for whatever you need (there are over 20,000 packages currently). If you need to quickly train a team of analysts on a method, you can use scripts to guide them. If you want to move from coding transcribed interviews to coding newspaper archives and need to automate the coding, R will have advantages. If you need to work with larger and larger datasets, R has a lot of flexibility.

Maintenance Costs

Frequency matters: there is a maintenance cost. If you only do analysis a few times a year, then it's much easier to use a simple tool with a user interface, because you don't have to remember the commands; user interfaces are designed to guide you. With R it will take a minute to get back into the analysis if you haven't worked with data for a couple of months. It's kind of like baking something you have made a hundred times before without having the recipe memorized: you have the muscle memory but need to refresh it by glancing at the ingredients and oven settings.

If you do analysis frequently for your job or you teach methods, then R is great: it's like a maintenance plan for the high level of fitness you achieved while training for a race. If you are an MPA who wants to work as a policy analyst, for example, it's a perfect fit. If you want to run a nonprofit and you need to crunch some numbers on occasion for grant applications or reports, then you are probably better off using SPSS, because the user interface serves as your external memory, so you don't have to remember all of the commands.

Commitment Mechanisms

Lastly, I would say: don't dabble. Learning R is like learning a foreign language. If you join a French club every year and attend for a couple of weeks before getting busy, you will find that you need to relearn the same material over and over, and your skills don't progress. It's not a good use of your time. You need to reach a basic level of competency before you start to enjoy and appreciate it. For my students this usually takes two courses over two semesters. For PhD students, I always recommend starting to use R in regression classes to get a feel for it (regression tools in R are pretty simple), then picking one big class project or research deliverable to do in R as a commitment mechanism. I used course projects and my dissertation as excuses to learn R, then started teaching methods courses so I could continue to deepen my knowledge.

A seminar like this is a great way to explore a new topic. Stop in for an hour for a walk-through of a project and a behind-the-scenes discussion of the process (the final research product may differ from where the project started). Hopefully these seminars can help people determine whether it's a skill they want to develop.

drdoan commented 3 years ago

Dear Dr. Lecy,

Thank you for this helpful reply! I appreciate the rice cooker analogy.

I am going to try to attend some of your live trainings; however, I am currently in Vietnam, and the crash course sessions fall at 1:00 a.m. my time. I did review the slides from Module 1, but I realize attending the live sessions would be better for me, if I can stay up.

Thanks again!

lecy commented 3 years ago

We are posting videos of previous sessions, if that works better for you.