I think we can thread a good story through all those concepts: acquiring data (import file and/or use an API) -> making it tidy -> do some stats -> visualise it -> create an rmarkdown report -> publish with shiny.
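That story could be sketched end to end in a few lines of tidyverse code (file name and column names here are hypothetical placeholders, just to show the shape of the pipeline):

```r
library(tidyverse)

# Acquire: read a (hypothetical) CSV export
raw <- read_csv("checkouts.csv")

# Tidy: pivot hypothetical per-year columns into long form,
# so there is one row per observation
tidy <- raw %>%
  pivot_longer(cols = starts_with("y"),
               names_to = "year", values_to = "checkouts")

# Stats: summarise by group
stats <- tidy %>%
  group_by(year) %>%
  summarise(total = sum(checkouts, na.rm = TRUE))

# Visualise
ggplot(stats, aes(x = year, y = total)) +
  geom_col()
```

The reporting and publishing steps would then wrap code like this in an R Markdown document, and optionally a Shiny app.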
This makes very good sense to me. I think your summary flowchart is an excellent model.
Some quick thoughts....
One comment on Shiny. While I am a big fan, I also think your concern about complexity (as you call it, cognitive load) is worth thinking/talking through. One potential alternative (just thinking out loud) is flexdashboard/HTML widgets. The primary reason is there's less scaffolding (than Shiny) and no need for a Shiny server. There are drawbacks to this as well, but it's quick to market and simple. My example: I teach about dynamic dashboards with flexdashboard and HTML widgets, including crosstalk (for linked brushing), leaflet (maps), DT (for the linked data tables), and Plotly (because the ggplotly function turns any ggplot object into an interactive vis). Anyway, my thinking is that I can often live without Shiny, but the goal of presenting interactive charts and maps via dashboards is still important, and I think easier via HTML widgets. There is a question of scale and optimization that can be brought into this discussion (or ignored ;-). Hope that makes sense.
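For a sense of how small the htmlwidgets route can be, here is a minimal sketch using the built-in mtcars data (no Shiny server involved; embedded in a flexdashboard chunk, the widget renders entirely client-side):

```r
library(ggplot2)
library(plotly)

# An ordinary static ggplot...
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point()

# ...becomes an interactive htmlwidget with one call
ggplotly(p)
```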
On Tue, Jun 18, 2019 at 3:09 AM Stéphane Guillou notifications@github.com wrote:
- Creating a webapp with Shiny to publish an interactive (e.g. plotly) visualisation of stats (too much cognitive load maybe?)
- Deal with text: use regular expressions (with stringr for example).
- Play with APIs, which is a common use-case of R for accessing reference databases. For example with Web of Science: https://github.com/juba/rwos
I think we can thread a good story through all those concepts: acquiring data (import file and/or use an API) -> making it tidy -> do some stats -> visualise it -> create an rmarkdown report -> publish with shiny.
-- John Little (John.Little@Duke.edu), Data Science Librarian, Data Visualization Services, Duke Libraries -- https://library.duke.edu/data/ | https://johnlittle.info/
@libjohn @stragu all this sounds good and ambitious :). Should we try to set up a call to chunk it up and come up with a plan? @libjohn my experience with the CrossRef data is that it doesn't lend itself well to basic viz in the form we have it in the OpenRefine lesson, but I would love to see what you did with it in R -- can you share? I've used LA public library circ data for a ggplot2 & shiny lesson: https://ucla-data-archive.github.io/elag2018-shiny/. I've wanted to refactor it into Carpentries style, but haven't had time. I'm open to using whatever dataset as long as we can accomplish our goals and teach basic data & stats literacy.
There might be some additional ideas here: https://libraryassessment.org/program/
@jt14den : I'm happy to share the repo I developed to document my exploration...
Let me preface by noting my repo is a set of notes to myself, based on my initial exploration and a conversation with Chris. (So all the bad grammar, poor spelling, and off-the-cuff thinking is there to jog my memory only.) When @stragu's post came through my email, I was riffing off of that repo/personal thinking from a few weeks ago, although I'm sure I put in too many ideas for one workshop. All this is a long-winded way of saying that if you have a process in mind, yes, I'm happy to join a call that moves towards next steps and developing a workshop. Let's see what fits.
In direct answer to your question about how I visualized the CrossRef data: the PDF document in that repo will probably give the quickest glimpse. Scroll way down to see the charts and graphs.
I should say, I'm rather agnostic on the value of the CrossRef data for learning R/tidyverse. My purpose in using that data was only to familiarize myself with a successful Library Carpentry module; OpenRefine seemed like a good handhold for that. Alternatively, see my README (again, notes to myself) as perhaps a better example of how I was trying to think through the issue of an R workshop for librarians, and how to organize my thinking. Maybe there is something transferable in all of this.
But, bottom line, like you I don't feel all that strongly about the CrossRef data. In fact, your idea of using circ data potentially generates broader interest and is maybe more in line with the link @libcce sent about the upcoming assessment conference. Looking through that conference program, I see those folks intend to analyze circ data as well -- only using Tableau.
Hey @stragu & @jt14den .
I'm writing to find out if you remain interested in jump-starting this effort on intro R. @jt14den previously suggested a call, and maybe there is still some enthusiasm for this?
If you are interested, I can launch a Zoom next Wednesday (US, July 17) / Thursday (AU, July 18): 6pm my time, 8am Brisbane, and 3pm LA.
Please give me a quick reply if you're interested or want to offer an alternative time/approach. Totally open to other options. Very interested to chat more with either of you. @libcce also.
Since my last post I've been slowly plugging away -- exploring datacarpentry/r-socialsci, took a closer look at Tim's work visualizing LA Public Library circ data (very nice), looked at some San Francisco Public Library circ data available on data.gov, and am also recently inspired by Mine Cetinkaya-Rundel's course design pedagogy, Data Science in a Box. All my notes and scribbles are in my repo.
Hi!
I don't know if there has been much movement on the introduction to R lesson for Library Carpentry in the past months, but I think one package that should be addressed here is bibliometrix. The activities developed with this package can go together with other points mentioned before in this thread. A possible path:
1. Acquire a dataset: you have already mentioned the Crossref API; perhaps also the DataCite API and the Dimensions API. These are all APIs of interest to the library world. Otherwise, Web of Science can be an alternative source (not a big fan of it though, since it's private and has a coverage bias).
2. Make sure that the dataset is in .bib format so bibliometrix can read it correctly.
3. Prepare bibliometrix to read the dataset(s).
4. Make some visualisations / use bibliometrix for co-citation analysis.
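A sketch of those bibliometrix steps might look like this (the file name is a hypothetical placeholder, and the argument values are worth double-checking against the current bibliometrix documentation):

```r
library(bibliometrix)

# Read a Web of Science export saved as BibTeX into a data frame
M <- convert2df("savedrecs.bib", dbsource = "wos", format = "bibtex")

# Descriptive bibliometric analysis
results <- biblioAnalysis(M)
summary(results, k = 10)

# Co-citation analysis on cited references, plotted as a network
net <- biblioNetwork(M, analysis = "co-citation", network = "references")
networkPlot(net, n = 30, Title = "Co-citation network")
```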
I hope this makes sense :-)
Best,
Paloma
I’ve been hoping for a lesson around bibliometrics @pmarrai 😀 We definitely need a lead on lesson dev. I think it’s been difficult for anyone to find the time to move it forward.
Great ideas here @pmarrai. I have some content I've been getting cleaned up based on this workshop: https://ciakovx.github.io/fsci_syllabus.html. I'll try and get it posted here in the next couple of weeks and we can dig into it.
I haven't used bibliometrix. Another useful package is citecorp: https://ropensci.org/technotes/2019/09/17/citecorp/
I also have been using rromeo lately with researcher CVs to look up what they can deposit in the IR: https://ropensci.github.io/rromeo/articles/rromeo.html
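That CV workflow can be as simple as the following sketch (journal title and ISSN are hypothetical examples; the function names are as I recall them from the rromeo vignette linked above, so verify against the current docs):

```r
library(rromeo)

# Look up a journal's self-archiving policy by (partial) title...
rr_journal_name("Journal of Librarianship", qtype = "contains")

# ...or, more precisely, by ISSN
rr_journal_issn("0961-0006")
```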
Lots of potential here!
Clarke
Great! Then I'd be glad to participate if you all move it forward :-)
Hi all, I just ran across the official "Master the Tidyverse" repo (includes slides and Rmd files) that RStudio makes available as part of their certified training -- elsewhere in their certification notes they reference contributions by the Carpentries. Maybe this repo will inspire you as it is inspiring me. Always great to see the work of an accomplished instructor like Garrett Grolemund. Check it out: https://github.com/rstudio/master-the-tidyverse
Good work and good luck
John
Closing this issue.
An opening brainstorming issue on what this lesson might contain. Please pitch in!
Here are some things I've thought about it.
- tidyverse (dplyr & tidyr)
- ggplot2
- rmarkdown