QCBSRworkshops / workshop01

Workshop 1 - Introduction to R
https://r.qcbs.ca/workshop01/

Issues related to instruction clarity and presentation structure #3

Open pedrohbraga opened 3 years ago

pedrohbraga commented 3 years ago

General comments

Specific

beausoleilmo commented 3 years ago

One item in the list is unclear: "Snapshots of RStudio and R should be updated to earlier versions, which include". Could you update the list to make it clearer?

beausoleilmo commented 3 years ago

Objectives of the workshop

As @pedrohbraga mentioned to me, here are the objectives of the workshop:

In general, the presentation should be better aligned with the objectives of the workshop, so that participants come away feeling confident with:

Basically, what I would do is: prioritize


Pedagogical restructuring of the workshop:

Overall presentation of the workshops (not limited to workshop 1)

Statistics

Why R

Content

```r
# Normal distribution
# `sequ` is not defined in the original comment; this grid of x values is an assumption
sequ <- seq(-4, 4, by = 0.01)
plot(sequ, y = dnorm(sequ), type = "l", ylim = c(0, 1),
     main = "Normal and t-distributions", ylab = "Density")

# Area of shade (from https://www.r-bloggers.com/2012/06/shading-regions-of-the-normal-the-stanine-scale/)
alpha <- 0.10
from.z.low <- -100
to.z.low <- qnorm(alpha / 2)
from.z.up <- qnorm(1 - (alpha / 2))
to.z.up <- 100

# Lower tail
s.x.low <- c(from.z.low, seq(from.z.low, to.z.low, 0.01), to.z.low)
s.y.low <- c(0, dnorm(seq(from.z.low, to.z.low, 0.01)), 0)
polygon(x = s.x.low, y = s.y.low, col = "red")

# Upper tail
s.x.up <- c(from.z.up, seq(from.z.up, to.z.up, 0.01), to.z.up)
s.y.up <- c(0, dnorm(seq(from.z.up, to.z.up, 0.01)), 0)
polygon(x = s.x.up, y = s.y.up, col = "red")

# T-distribution
df <- 1 # Setting the number of degrees of freedom for the t-distribution
lines(x = sequ, y = dt(sequ, df), type = "l", col = "red")

# Adding a legend to the plot
legend("topright", legend = c("Normal", "t-distribution"), col = c(1, 2), lty = 1)
```
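
As a numeric companion to the plot, the shaded tail area can be checked with `pnorm()`. This is only a sketch: it reuses `alpha = 0.10` from the code above, and the variable names `z.low`, `z.up`, `shaded`, and `t.tail` are made up for illustration.

```r
alpha <- 0.10

# Cutoffs of the two shaded tails under the standard normal
z.low <- qnorm(alpha / 2)
z.up  <- qnorm(1 - alpha / 2)

# Total shaded area under the normal curve: should equal alpha
shaded <- pnorm(z.low) + (1 - pnorm(z.up))

# Area beyond the same cutoffs under a t-distribution with 1 degree of freedom;
# the heavier tails make it larger than alpha
t.tail <- pt(z.low, df = 1) + (1 - pt(z.up, df = 1))

print(c(normal = shaded, t = t.tail))
```

This could help participants connect the red polygons to a probability, and see why the t curve sits above the normal curve in the tails.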


# Wiki
For all the challenges, see the section **Content** 
- [ ] In the wiki, it says "please contact the current series coordinators, listed on the main wiki page". It would be nice if there was a link to send an email or a link to state where to contact them!
- [ ] Correct:
  - [ ] "R Studio" should be written "RStudio" 
  - [ ] "OS X" should be written "macOS" 
  - [ ] Update the figures to have a consistent style and make them pretty 
  - [ ] Update logos (Apple and Windows logo are archaic) 
- [ ] Object oriented: I'd make the message simpler: "We are going to learn how to store our calculations or our values / output into what we call 'objects'."
- [ ] Perhaps see the structure of [this book](https://adv-r.hadley.nz/introduction.html) and especially the "why R" section to make it more interesting. 
- [ ] **R TIP**: when talking about variable names, point to the "Environment" pane in RStudio so that students see where the objects are "stored". Also, warn students against giving variables the same name as a function: e.g. `data()` is a function, so calling `data <- "mydataset"` is not a good idea. It won't overwrite the function, but it doesn't help when using "tab" to autocomplete!
- [ ] Perhaps state, at the first code example, that we like code that "breathes": code is clearer to read when there are spaces in the appropriate places (this is optional).
- [x] For the challenges (especially challenge 6) make sure that what's in the R script as an example, is within the wiki. 
- [ ] **Type of data structure**: Make a **table** that shows the differences (in terms of dimensions, content, modes, etc.) between a vector (1 dimension which is `length()`, 1 mode [not "list of related values" like it is said in the wiki, this is not super clear what "related" means]), a matrix (nXm-dimension (`dim()`), etc.) This would make it clearer what are the differences. Maybe talk about attributes that can be associated with datasets (`??base::attributes`). Also, make another table of the different types of modes in R and explain how R interprets them (Numeric e.g. `8`, not `"8"`, character e.g. `"potato"` not `potato`, logical e.g. `TRUE`, not `true`, not `"TRUE"` (I would advise to NOT USE `T` or `F`. It causes too many problems and errors), factor... etc.). Talk about the function `numeric()` which is handy to create an empty vector of numbers with `numeric(0)`.
  - [ ] It would be cool to add sub-section titles "vector" "matrix" "data frames". Now it's all in "Types of data structures in R". It would make it clearer. 
- [ ] Not sure why there is something about the function `structure()`. I'd remove it. 
- [ ] Explain what `:` means in `1:5`: it gives the same values as `seq(from = 1, to = 5, by = 1)`. It is explained inside the script, but I would move it up into the text or explain how the `seq()` function does something similar.
- [ ] Again, the table "my_df" is TERRIBLE! There is a column named "# of species" which is a big no no in a course about creation of databases without weird characters in it!!! Also, see my comment that I added in challenge 10. Where it says "# Visualise it!" I'd advocate to have the function `head()`. Perhaps this would force people to learn straight away that head() is much preferred over just typing the data. 
- [ ] **Indexing objects in R**: 
  - [ ] Add "square" brackets. 
  - [ ] State that the `c()` function is needed in order to index several values at once.
  - [ ] There is a line saying "# There is no sixth value in this vector so R returns a null value (i.e. NA)" This should be corrected as a `NULL` (The Null Object) is not the same as an `NA` (‘Not Available’ / Missing Values). 
  - [ ] In the `odd_n[odd_n > 4]`, I'd make the students explore what `odd_n > 4` first means. I would not put it as a challenge. 
  - [ ] In the `char_vecteur[char_vecteur == "blue"]` example, I'd translate the example, and then add a new line saying `color.vector[color.vector %in% c("blue","red")]`
  - [ ] Remove the \\ from the sentence "\\Here are a few examples of data"...
  - [ ] For matrices, you can specify `my.matrix[1, 2, drop = FALSE]` so that when you subset, you don't lose the matrix structure. A very neat trick!
  - [ ] State that the preferred way to subset a data frame, in my opinion, is to use the column name as the "ID". E.g. DON'T use `my.mat[,1]`, but `my.mat[,"species"]`, or if you want to remove a column, something like this: `mtcars[,-which(colnames(mtcars)=="drat")]`. It makes the code much more readable (for the moment, since we haven't seen `tidyverse`).
  - [ ] In "A quick note on logical statements", add `%in%`, which is the "Value Matching" operator, or simply `match()`. VERY useful, e.g. `row.names(mtcars) %in% c("Toyota Corolla","Toyota Corona")`
  - [ ] remove the period in `> y2 <- c(1, 2, -7, 4, 5).`
  - [ ] Challenge 10 in wiki: store the statement in an object!! `res <- my_df$num.sp * num.vector[c(1:4)]` then `res > 25`!
  - [ ] Challenge 11 : reword the challenge to make it simpler to understand. 
  - [ ] challenge 12: why not add `type = "l"` inside the plot function? 
  - [ ] I'd advocate for the creation of a table after the sentence "As a reference, here is a list of some of the most common R functions:". that would make it more useful. Something like this: 
![Screen Shot 2020-10-26 at 19 57 18](https://user-images.githubusercontent.com/15717151/97240753-7e4e7800-17c5-11eb-8cfc-7198cd68481c.png)
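
Several of the points in the checklist above can be tried directly in the console. The sketch below is only illustrative: the contents of `odd_n`, `color.vector`, and `my.matrix` are assumed here (the wiki's actual objects may differ), while `mtcars` ships with base R.

```r
# Modes: R treats 8, "8", and TRUE differently
class(8)      # "numeric"
class("8")    # "character"
class(TRUE)   # "logical"

# `:` gives the same values as seq() with a step of 1
same.values <- all(1:5 == seq(from = 1, to = 5, by = 1))

# Assigning to a name that is also a function does not overwrite the function
sum <- 5
still.works <- sum(1, 2)   # R looks for a function named sum, so this is 3
rm(sum)

# Out-of-range index on a vector returns NA, not NULL
odd_n <- c(1, 3, 5, 7, 9)   # assumed values
sixth <- odd_n[6]
is.na(sixth)     # TRUE
is.null(sixth)   # FALSE

# Look at the logical vector first, then use it to index
odd_n > 4                   # FALSE FALSE TRUE TRUE TRUE
big.odd <- odd_n[odd_n > 4]

# %in% matches against several values at once
color.vector <- c("blue", "red", "green", "blue")   # assumed values
blue.red <- color.vector[color.vector %in% c("blue", "red")]

# drop = FALSE keeps the matrix structure when subsetting
my.matrix <- matrix(1:6, nrow = 2)   # assumed values
dim(my.matrix[1, 2])                 # NULL: the result is a plain vector
one.cell <- my.matrix[1, 2, drop = FALSE]
dim(one.cell)                        # 1 1

# Subsetting by column name instead of by position
head(mtcars[, "mpg"])
no.drat <- mtcars[, -which(colnames(mtcars) == "drat")]
toyotas <- row.names(mtcars)[row.names(mtcars) %in%
                               c("Toyota Corolla", "Toyota Corona")]
```

Storing intermediate results in objects (as suggested for challenge 10) also makes each step easy to inspect afterwards.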
**packages**
- [ ] Add to the sentence "They are usually available through the Comprehensive R Archive Network" this: "but there are now many packages also available on GitHub or other platforms".
- [ ] Number of packages: Change the information. "As of October 2020, _only from the CRAN_, there are more than 16000 packages!" [Source here](https://cran.r-project.org/web/packages/)
- [ ] explain what is an R session (to understand why you need to call libraries each time you reopen R): It is because the library is loaded into _memory_. 
- [ ] To see the conflicts, use `conflicts(, detail = TRUE)`
- [ ] If you absolutely need to use 2 packages and 1 function is conflicting, you can use the `package::function` trick. VERY HANDY!! Search `::` in the help. 
- [ ] when showing the example of a help menu, encourage students to copy-paste the `Examples` section to see how it works in action. See `seq` for example 

```r
seq(0, 1, length.out = 11)
seq(stats::rnorm(20)) # effectively 'along'
seq(1, 9, by = 2)     # matches 'end'
seq(1, 9, by = pi)    # stays below 'end'
seq(1, 6, by = 3)
seq(1.575, 5.125, by = 0.05)
seq(17) # same as 1:17, or even better seq_len(17)
```
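
The points about sessions, conflicts, and `::` in the packages list could be illustrated along these lines. This is a minimal sketch using only packages that come with a standard R installation; the object names `attached`, `conf`, and `spread` are made up for the example.

```r
# Packages attached in the current session; this list is rebuilt
# every time R is restarted, which is why library() must be called again
attached <- search()
print(attached)

# Names defined in more than one attached environment, grouped by environment
conf <- conflicts(detail = TRUE)
str(conf)

# package::function calls a function from an explicit package,
# whatever other attached packages may define under the same name
spread <- stats::sd(c(2, 4, 4, 4, 5, 5, 7, 9))
print(spread)
```

With two attached packages that export the same function name, the `package::function` form makes it explicit which one is used.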


- [x] In "Getting help on the Web", send the students not to Stack Exchange, but to [Stack Overflow](https://stackoverflow.com) for coding-related questions or [Cross Validated](http://stats.stackexchange.com) for statistical questions.
- [ ] challenge 14 is really boring... 

- [ ] "Some useful books on R" please cite properly and add links to the books: 
e.g. https://www.springer.com/gp/book/9780387790534 
"Crawley, M. - The R Book." is more Crawley, M. J. 2013. The R book. Pages 1–1051. Second edition. John Wiley & Sons, Ltd, United Kingdom.
- [ ] explain why the websites in "Some useful websites" are interesting. 

### Final verification
- [x] Use the R script, the wiki, and the presentation side by side to see what needs to be harmonized
- [x] Make sure that the sections in the presentation, wiki and R script match (each header should be easy to find in each of the different tools)