Objectives of the workshop

As @pedrohbraga mentioned to me, here are the objectives of the workshop:

Generally, this presentation should be better aligned with the objectives of the workshop, so that participants feel confident with:

[ ] (i) independently beginning to use R and RStudio;
[ ] (ii) identifying, inspecting, creating and being able to distinguish between vectors, matrices, data frames, arrays and lists;
[ ] (iii) performing basic data manipulation tasks (with square brackets, etc.) and data operations;
[ ] (iv) creating graphics;
[ ] (v) using the environment, the console and script windows;
[ ] (vi) reading and writing files to the disk (not necessarily in this order and not necessarily limited to this). As you can see, the current format of the workshop needs to be reworked to better fit these learning objectives (as of 19 October 2020).

Basically, what I would do is: prioritize

[x] (i) spending a short amount of time planning on what each section of the workshop should be and what changes should be made, what should be completely reframed and what should be kept;
[ ] (ii) begin by adapting and fixing the major issues of this workshop, so it becomes presentable; and
[ ] (iii) only after this, add minor changes.

Pedagogical restructuring of the workshop:

Overall presentation of the workshops (not limited to workshop 1)

[ ] QCBS R workshop table of contents: Explain how the workshops (all 10 of them) were designed to be easy and pragmatic for non-programmers and were specifically designed for biologically oriented data analysis (but see disclaimer)
- [ ] Add a table of contents of the workshops so that the workshops integrate with one another.
[ ] Learning objectives: State the learning objectives at the beginning and at the end of the workshop (or what attendees should get out of the workshop by the end).
[ ] Active learning strategies: Add a slide with a series of questions to ask students:
- [ ] Questions could be about:
- [ ] 1. their name,
- [ ] 2. their university and their project (maybe supervisor),
- [ ] 3. what they are looking to learn in the workshop series, and
- [ ] 4. encourage them in their project and try to find what, in the workshop series, fits their project.
- [ ] Maybe add a note in the presenter's notes to say that the teacher should try to direct the student to the relevant resources (big maybe).
- [ ] Perhaps, encourage the teachers to look ahead and take a quick glance at all the other workshops to encourage the students to take the right workshop, tailored to what they are looking for.
- [ ] For active learning, I'd trash a lot of the challenges and make them more interesting. For example, I'd be more willing to encourage students to take a look at the datasets that are already present in R, such as the iris dataset: perhaps use `head(iris)`, subset it with `iris[, "Species"]`, and plot something with `plot(Sepal.Length ~ Sepal.Width, data = iris, pch = 20, xlab = "Sepal Width", ylab = "Sepal Length", main = "Iris data plot", cex = Petal.Width, col = Species)`, or something more "statistics like" such as `hist(rnorm(1000))`. Using more concrete examples, and something that would make them want to save data, you could show how to save one object only with `saveRDS()` or several objects with `save()`. Show them how to read this back into R. Explain how the saving process will enable them to run their analysis in steps: 1. run a bunch of things, save; 2. start a new script and run the things that depend on the first script, but this time just reusing the saved object. I would also show how to `source()` another script that contains a function they would want to reuse later. For example, create a small function that you put into a script called QCBS.funny.R. Then source the script to load the function and use it once in the workshop. It could be a "mode" function that calculates the mode of a vector.
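```
# Return the most frequent value of a vector (first one in case of ties)
Mode <- function(x) {
  ux <- unique(x)                         # distinct values
  ux[which.max(tabulate(match(x, ux)))]   # value with the highest count
}
```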
[ ] Go through the RStudio panels to make sure they understand what they have in front of them (1. Source (Script) 2. Console 3. Environment, History, Connections 4. Files, Plots, Packages, Help, Viewer). You could even show them how to make it look dark (in the Appearance tab).
[ ] (Maybe) Add at the beginning (or at the end like it is at the moment) that there are many other resources if people are interested in going further. In particular, An Introduction to R by W. N. Venables, D. M. Smith and the R Core Team is a deeper look into the programming environment that is R.
[ ] Design:
- [ ] Please remove the image at the beginning of the workshop (the image with the people and the swirly things; it refers to the old website design). Make the title slide cleaner and use only the QCBS logo with the title of the workshop.
- [ ] Clean the "About this workshop" slide. If you want to keep it, only put the "wiki", "Slides", "Slides" and "Script". Also, rename the "Slides", "Slides" badges so that they make more sense to someone less familiar with the icons.
- [ ] Add a survey at the end of the workshop so that students can input their ideas on how to improve the workshops. This was in the previous workshop and was helpful for future revamping sessions. Also, there might be a survey in the R coordinator drive that could be put there (https://drive.google.com/drive/folders/1Jt314Xb617Yw-Jkrrjlhl4MKhnKVVbeQ?usp=sharing).
[ ] Other tutorials: Invite students that want to learn stats and R to see resources such as the swirl (see here) package and its website.
[ ] Invite students that want to use their data at the end of the workshop, to be prepared to use them effectively. Perhaps, the workshop of the introduction should be shorter and the teachers should spend some time with people who want to look at their data and see how they can prepare better for the future workshops.
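
The "save, then reuse" workflow suggested in the active learning item above could look roughly like the sketch below. It is only an illustration: the object names and file names (other than QCBS.funny.R and `Mode`, which come from the suggestion itself) are hypothetical, and everything is written to a temporary directory so that nothing is left on disk.

```r
# Hypothetical working folder: a temporary directory
work_dir <- tempdir()

# Step 1: run an analysis and save ONE object with saveRDS()
sepal_means <- tapply(iris$Sepal.Length, iris$Species, mean)
rds_file <- file.path(work_dir, "sepal_means.rds")
saveRDS(sepal_means, rds_file)

# save() stores SEVERAL named objects in a single file
species_counts <- table(iris$Species)
rda_file <- file.path(work_dir, "iris_summaries.RData")
save(sepal_means, species_counts, file = rda_file)

# Step 2: in a new script, read the saved results back
sepal_means_reloaded <- readRDS(rds_file)  # returns the object; you pick its name
rm(sepal_means, species_counts)            # pretend we started a fresh session
load(rda_file)                             # restores objects under their saved names

# Step 3: keep a reusable function in its own script and source() it
script_file <- file.path(work_dir, "QCBS.funny.R")
writeLines(c(
  "Mode <- function(x) {",
  "  ux <- unique(x)",
  "  ux[which.max(tabulate(match(x, ux)))]",
  "}"
), script_file)
source(script_file)

Mode(c(2, 3, 3, 5, 3, 2))
```

The point worth making to students is the difference in behaviour: `readRDS()` returns a single object that you assign to any name, whereas `load()` silently recreates the objects under the names they had when saved.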

Statistics

[ ] Global objectives of the workshops: Question to @pedrohbraga: Are the workshops geared toward learning statistics or only learn R? What is the assumption or the prior requirements that students should have before coming to the QCBS R workshops?
- [ ] Add a "Disclaimer" that would/could sound like this: The QCBS R workshops don't necessarily assume that you have a statistical background. But keep in mind that to make better decisions about the analyses you are planning to do, a background in statistics or experimental design is highly recommended. We do not pretend to solve all of your questions about how to analyse your particular dataset, but guide you through the process of analysis, interpretation and presentation of your data. We therefore assume that the data collection is going to be made or was made with a thoughtful process of data analysis, interpretation and presentation in mind. These workshops are designed to help you do the analysis, interpretation and presentation in R, but your experimental design is key to how the data will be analyzed, interpreted and presented. We highly recommend that you take a look at books about experimental design in your particular field of study and how the data collected is usually analyzed. (Maybe 1. Experimental Design and Data Analysis for Biologists, 2. An Introduction To Experimental Design And Statistics For Biology, or 3. Design and Analysis of Experiments with R or even take a look at this: https://stats.stackexchange.com/questions/1815/recommended-books-on-experiment-design)
[ ] Purpose of statistics: State that statistics is roughly about 1. collecting (or simulating), 2. analyzing, 3. interpreting, and 4. effectively presenting your study and data. R is a tool to help you gain insights about the data you collect and to test ideas with data that you can simulate! And it makes this available to you. Probability theory provides a crucial foundation for statistical thinking and reasoning (just look at \alpha, P-values, etc.).
[ ] What R changed and responsibilities: Explain how, before easy access to personal computers and free and accessible programs like R, scientists would need a trained statistician to test the assumptions of their models, to know the preferred parameter values for drawing certain types of graphs, or to explore the data to check whether they respect the model assumptions. R makes it possible for many more scientists and other institutions (governments, municipalities, NGOs and more) to do the basic tests needed to carry out their research, and allows them to do much more complex analyses that were out of reach before. This also comes with greater responsibilities in understanding, on your part, for example what statistical assumptions underlie a t-test, a linear model, a PCA, a particular type of GLMM, etc.

Why R

[ ] Limits of R: I don't agree that R is good for everything. Perhaps, after stating why R is so cool, state what the limits of R are (e.g., R can do spatial analysis, but if someone wants to do heavy work on map making and GIS analysis, QGIS is much better for that purpose [and there is a training on that at QCBS https://wiki.qcbs.ca/introgis]). Data literacy should also be encouraged, as well as research integrity. R will not solve that in itself. Not sure if this is clear, but this could be added in the presenter's notes: "R is an open-source set of tools that has now become one of the world's leading statistical programming environments, with a well-developed community".
[ ] R integration with other software: Add a note that R can easily be integrated with other software (e.g. JAGS, Stan (see reference here), QGIS (see reference here)). You'll probably still need to use other software (like Excel) to enter your data. Make sure that you follow proper data entry principles. R also works best when you prepare your work environment beforehand (create the proper folder structure). Mention the integration with Python. You can also interact with software that stores data (SQL, Excel, etc.).
[ ] State that R makes it easier to document the steps of your analysis and even write dynamic reports (perhaps your thesis!). Also, you can correct and rerun your work (without going through menus and clicking around to make your analysis work). It allows you to write your own functions and build up a portfolio of functions that you can use to ease your analysis.

Content

[ ] Script header: Update the header of the R script (especially the date of modification)
[x] Script sections Add sections using "CMD+SHIFT+R" (on Mac) and [I think...] CTRL + SHIFT + R (on Windows). This will allow students to see the outline of the R script with the "Show Document Outline" option in RStudio.
[ ] Comments Explain within the R script, how to add a comment (with #).
[ ] Teaching breaks: Add slides that mark a break in the presentation. Add a note in the slide that the teacher should use the break time.
[ ] Inspiration: take a look at the "What is the most useful R trick?" to inform the functions that should be learnt in the 1st workshop.
[x] Built-in in R: Add a note that R comes with built in constants (such as pi , LETTERS, letters, month.abb, month.name, see this reference)
[ ] Resources: Encourage students to go to Stack Overflow. Especially, encourage them to ask questions with reproducible examples and ask better questions in general on Stack Overflow. Perhaps not the best example out there, but here is one question that I asked.
- [ ] state that in your lab somewhere should be a copy of Crawley, M. J. 2013. The R book. Pages 1–1051. Second edition. John Wiley & Sons, Ltd, United Kingdom.!
[ ] Revision of the Challenges:
- [ ] The pace of the challenges should be revised.
- [ ] It is not clear in the R script when the challenge ends. There could be a "end of challenge" tag or division to make this clear.
- [ ] Add a title to the challenges in the R script
- [ ] In general, challenges should include more comments about what is happening.
- [ ] In the script, add all the challenges (e.g., challenge 1 is not there, but state in the script that they were able to read the message in the challenge)
- [ ] Don't spend too much time on the "R as a calculator". This should take less than a minute. Perhaps, show people how to type the appropriate symbols (there are always moments where people don't know how to make a +; -; *; /; ^; <; ~; []; (), {}, #, etc., see this page or simply type ??base::Syntax). Perhaps to fix this, tell them to make sure their keyboard is configured in English if possible. Otherwise, spend some time showing them how to find the characters on their keyboards. Also, explain what 1:10 means.
- [ ] Make sure that they know that there is a help to find keyboard shortcut in RStudio: (On Mac) "Tools" > "Keyboard Shortcuts Help" OR "ALT + SHIFT + K".
- [ ] Combine challenge 2 and 3
- [ ] Challenge 4: tell students before the exercise that there are preloaded values in R. Perhaps, change the challenge (or provide the formula for the area of a circle beforehand and state that there are preloaded values in R like pi) to something like: calculate the mean of a vector of numbers with sum() and mean()
- [ ] Challenge 5: use a more interesting number (in biology, nobody cares about Euler's number). Perhaps, just result = 2+2
- [ ] Challenge 6: Add a line stating that the number can be put elsewhere in the name (e.g. result.8 or result_8_times or result8)
- [ ] Challenge 7: Why is there a structure(c(1, 3, 5, 7)) in challenge 7? Remove that.
- [ ] Challenge 8: Make sure to explain (probably in presenter's notes) that the type of subsetting (using [] or [,]) depends on the number of dimensions
- [ ] Challenge 9: It would be nice if the examples vectors (such as char_vector) are put back into the challenges. E.g., rewrite char_vector <- c("blue", "red", "green") into challenge 9 so that students who haven't run the line assigning the character values can still do the challenge. In my teacher's experience, it often happens that students were distracted or forgot to run the line of code 3 challenges before the one they are doing and now can't do the challenge because they have to find back the information needed to do the challenge.
- [ ] Challenge 10: Another (preferred) possibility is my_df[, "num_sp"]. The naming system for the object is pretty terrible. The workshop should use a naming system that students are encouraged to keep using. my_df should probably be something more like fertilizer.data (just thought about that, but perhaps explain that the .data is not a file extension like .pdf, it is just part of the name of the object). Also, make num_sp more interesting by putting species names like c("Statisticus revolusus", "Biodivertidus canadensis", "Statisticus revolusus", "Biodivertidus canadensis") and naming the vector species_names. Why would one want to multiply a "species number"? Perhaps make a new column called abundance <- c(17, 23, 15, 7) and rename num_vector to plot_weights or sampling_effort or something that could be biologically relevant.
- [ ] Challenge 11: I don't see that the operator : was introduced. This should be added.
- [ ] Challenge 12: The title is not present into the script. Add that.
- [ ] Challenge 13: The challenge and the output don't match. Explain that the challenge is about creating a sequence (not with :, but with seq(), that increments by 2)
- [ ] Challenge 14: This challenge seems boring. Perhaps show them a function where a default value is provided that is not the one that we want. Students would then have to dig into the help page to find how to set the appropriate value. For example, TukeyHSD() has a default value of 95% for the confidence level (conf.level = 0.95). Let's say we want 97.5% or 99%. Something like that.
[x] R History: use the keyboard arrows to navigate the history of the commands that were run. If you use the shortcut "CMD + arrow-up" in RStudio on Mac, you'll see more of that history at a glance.
[x] Cool graphs Add a "show off how R is versatile" image:
[ ] Simulating data: Perhaps state that R is nice for simulating data and testing statistical ideas or experimental designs, or for playing with data from the literature (it would also be nice to add that to another future workshop or create one anew). Maybe show the t-distribution compared to the normal distribution. Example (perhaps too complicated, but you get the idea):
```
# Sequence of numbers to draw the distributions
sequ <- seq(-4, 4, length.out = 100)

# Normal distribution
plot(sequ, y = dnorm(sequ), type = "l", ylim = c(0, 1),
     main = "Normal and t-distributions", ylab = "Density")

# Area of shade (from https://www.r-bloggers.com/2012/06/shading-regions-of-the-normal-the-stanine-scale/)
alpha <- 0.10
from.z.low <- -100
to.z.low <- qnorm(alpha / 2)
from.z.up <- qnorm(1 - (alpha / 2))
to.z.up <- 100

s.x.low <- c(from.z.low, seq(from.z.low, to.z.low, 0.01), to.z.low)
s.y.low <- c(0, dnorm(seq(from.z.low, to.z.low, 0.01)), 0)
polygon(x = s.x.low, y = s.y.low, col = "red")

s.x.up <- c(from.z.up, seq(from.z.up, to.z.up, 0.01), to.z.up)
s.y.up <- c(0, dnorm(seq(from.z.up, to.z.up, 0.01)), 0)
polygon(x = s.x.up, y = s.y.up, col = "red")

# T-distribution
df <- 1 # Setting the number of degrees of freedom for the t-distribution
lines(x = sequ, y = dt(sequ, df), type = "l", col = "red")

# Adding a legend to the plot
legend("topright", legend = c("Normal", "t-distribution"), col = c(1, 2), lty = 1)
```
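
For the Challenge 14 idea above (a function whose default is not the value we want), a possible sketch is given below. It assumes the built-in iris data and a one-way ANOVA purely for illustration; the object names are hypothetical, and the real challenge could use any dataset from the workshop.

```r
# Fit a one-way ANOVA on a dataset that ships with R
fit <- aov(Sepal.Length ~ Species, data = iris)

# Default call: 95% family-wise confidence intervals (conf.level = 0.95)
tukey_default <- TukeyHSD(fit)

# The help page (?TukeyHSD) documents the conf.level argument;
# here we ask for 99% intervals instead of the default
tukey_99 <- TukeyHSD(fit, conf.level = 0.99)

# Compare the interval widths: a higher confidence level gives wider intervals
width_default <- tukey_default$Species[, "upr"] - tukey_default$Species[, "lwr"]
width_99      <- tukey_99$Species[, "upr"] - tukey_99$Species[, "lwr"]
width_99 > width_default
```

Students only discover `conf.level` by opening the help page, which is the skill the challenge is meant to exercise.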


# Wiki
For all the challenges, see the section **Content** 
- [ ] In the wiki, it says "please contact the current series coordinators, listed on the main wiki page". It would be nice if there was a link to send an email or a link to state where to contact them!
- [ ] Correct:
  - [ ] "R Studio" should be written "RStudio" 
  - [ ] "OS X" should be written "macOS" 
  - [ ] Update the figures to have a consistent style and make them pretty 
  - [ ] Update logos (Apple and Windows logo are archaic) 
- [ ] Object oriented: I'd make the message simpler: "We are going to learn how to store our calculations or our values / output into what we call 'objects'."
- [ ] Perhaps see the structure of [this book](https://adv-r.hadley.nz/introduction.html) and especially the "why R" section to make it more interesting. 
- [ ] **R TIP**: when talking about variable names, point to the "Environment" pane in RStudio to make the students realize where the objects are "stored". Also, warn students against giving variables the same name as functions: e.g. `data()` is a function, so calling `data <- "mydataset"` is not a good idea. It won't overwrite the function, it's just not helping when using "tab" to autocomplete! 
- [ ] Perhaps, state at the beginning of the first code that we like the code to breathe: meaning that the code is clearer to read when there are spaces in the appropriate places (this is optional)
- [x] For the challenges (especially challenge 6) make sure that what's in the R script as an example, is within the wiki. 
- [ ] **Type of data structure**: Make a **table** that shows the differences (in terms of dimensions, content, modes, etc.) between a vector (1 dimension which is `length()`, 1 mode [not "list of related values" like it is said in the wiki, this is not super clear what "related" means]), a matrix (nXm-dimension (`dim()`), etc.) This would make it clearer what are the differences. Maybe talk about attributes that can be associated with datasets (`??base::attributes`). Also, make another table of the different types of modes in R and explain how R interprets them (Numeric e.g. `8`, not `"8"`, character e.g. `"potato"` not `potato`, logical e.g. `TRUE`, not `true`, not `"TRUE"` (I would advise to NOT USE `T` or `F`. It causes too many problems and errors), factor... etc.). Talk about the function `numeric()` which is handy to create an empty vector of numbers with `numeric(0)`.
  - [ ] It would be cool to add sub-section titles "vector" "matrix" "data frames". Now it's all in "Types of data structures in R". It would make it clearer. 
- [ ] Not sure why there is something about the function `structure()`. I'd remove it. 
- [ ] explain what `:` means in `1:5`. Same as `seq(from = 1, to = 5, by =1)`. It is explained inside the script but I would move it up into the text or explain how the `seq` function is doing something similar.
- [ ] Again, the table "my_df" is TERRIBLE! There is a column named "# of species" which is a big no no in a course about creation of databases without weird characters in it!!! Also, see my comment that I added in challenge 10. Where it says "# Visualise it!" I'd advocate to have the function `head()`. Perhaps this would force people to learn straight away that head() is much preferred over just typing the data. 
- [ ] **Indexing objects in R**: 
  - [ ] Add "square" brackets. 
  - [ ] State that the `c()` function is necessary in order to index multiple values at once.   
  - [ ] There is a line saying "# There is no sixth value in this vector so R returns a null value (i.e. NA)" This should be corrected as a `NULL` (The Null Object) is not the same as an `NA` (‘Not Available’ / Missing Values). 
  - [ ] In the `odd_n[odd_n > 4]`, I'd make the students explore what `odd_n > 4` first means. I would not put it as a challenge. 
  - [ ] In the `char_vecteur[char_vecteur == "blue"]` example, I'd translate the example, and then add a new line saying `color.vector[color.vector %in% c("blue","red")]`
  - [ ] Remove the \\ from the sentence "\\Here are a few examples of data"...
  - [ ] For matrices, you can specify `my.matrix[1, 2, drop = FALSE]` so that when you subset, you don't lose the matrix structure. A very neat trick! 
  - [ ] State that the preferred way to subset a data frame, in my opinion, is to use the column name as the "ID" to subset. E.g. DON'T use `my.mat[,1]`, but `my.mat[,"species"]` or, if you want to remove a column, something like this: `mtcars[,-which(colnames(mtcars)=="drat")]`. This makes the code much more readable (for the moment, since we haven't seen `tidyverse`).
  - [ ] In the "A quick note on logical statements" add the `%in%` which is the "Value Matching" operator or simply `match()`. VERY useful. `row.names(mtcars) %in% c("Toyota Corolla","Toyota Corona")`
  - [ ] remove the period in `> y2 <- c(1, 2, -7, 4, 5).`
  - [ ] Challenge 10 in wiki: store the statement in an object!! `res <- my_df$num.sp * num.vector[c(1:4)]` then `res > 25`!
  - [ ] Challenge 11 : reword the challenge to make it simpler to understand. 
  - [ ] challenge 12: why not add `type = "l"` inside the plot function? 
  - [ ] I'd advocate for the creation of a table after the sentence "As a reference, here is a list of some of the most common R functions:". that would make it more useful. Something like this: 
![Screen Shot 2020-10-26 at 19 57 18](https://user-images.githubusercontent.com/15717151/97240753-7e4e7800-17c5-11eb-8cfc-7198cd68481c.png)
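
Several of the indexing points above (NA versus NULL, exploring a logical test before subsetting, `%in%`, `drop = FALSE`, subsetting by column name, storing a result before testing it) can be shown in one short script. This is only a sketch: the object names follow the suggestions made in this issue, and the values of `plot_weights` and `odd_n` are made up for illustration.

```r
# Hypothetical example data, using the names suggested in this issue
fertilizer.data <- data.frame(
  species_names = c("Statisticus revolusus", "Biodivertidus canadensis",
                    "Statisticus revolusus", "Biodivertidus canadensis"),
  abundance     = c(17, 23, 15, 7)
)
plot_weights <- c(1, 2, 3, 4, 5)   # made-up values
odd_n        <- c(1, 3, 5, 7, 9)   # made-up values
color.vector <- c("blue", "red", "green")

head(fertilizer.data)              # inspect the first rows instead of printing everything

odd_n[6]                           # there is no sixth value: R returns NA
is.na(odd_n[6])                    # TRUE
is.null(odd_n[6])                  # FALSE: NA is not the same thing as NULL

odd_n > 4                          # look at the logical vector first...
odd_n[odd_n > 4]                   # ...then use it to subset

color.vector[color.vector %in% c("blue", "red")]   # value matching

fertilizer.data[, "abundance"]     # subset by column name rather than position

my.matrix <- matrix(1:6, nrow = 2)
my.matrix[1, 2]                    # drops to a plain vector of length 1
my.matrix[1, 2, drop = FALSE]      # stays a 1 x 1 matrix

# Store the result in an object, then test it
res <- fertilizer.data[, "abundance"] * plot_weights[1:4]
res > 25
```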
**packages**
- [ ] add to the sentence "They are usually available through the Comprehensive R Archive Network", this: "but there are now many packages also available on GitHub or other platforms". 
- [ ] Number of packages: Change the information. "As of October 2020, _only from the CRAN_, there are more than 16000 packages!" [Source here](https://cran.r-project.org/web/packages/)
- [ ] explain what is an R session (to understand why you need to call libraries each time you reopen R): It is because the library is loaded into _memory_. 
- [ ] To see the conflicts, use `conflicts(detail = TRUE)`
- [ ] If you absolutely need to use 2 packages and 1 function is conflicting, you can use the `package::function` trick. VERY HANDY!! Search `::` in the help. 
- [ ] when showing the example of a help menu, encourage students to copy-paste the `Examples` section to see how it works in action. See `seq` for example

      seq(0, 1, length.out = 11)
      seq(stats::rnorm(20)) # effectively 'along'
      seq(1, 9, by = 2)     # matches 'end'
      seq(1, 9, by = pi)    # stays below 'end'
      seq(1, 6, by = 3)
      seq(1.575, 5.125, by = 0.05)
      seq(17) # same as 1:17, or even better seq_len(17)


- [x] in "Getting help on the Web", send the students not to Stack Exchange, but to [Stack Overflow](https://stackoverflow.com) for coding related questions or [Cross Validated](http://stats.stackexchange.com) for statistical questions. 
- [ ] challenge 14 is really boring... 
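
The points above about sessions, conflicts and the `package::function` form could be demonstrated with packages that ship with R, so that nothing needs to be installed. This is a minimal sketch; the vector `x` and the deliberately silly `sd` replacement are made up for illustration.

```r
# Names that are defined in more than one place on the search path
masked <- conflicts(detail = TRUE)
str(masked, max.level = 1, list.len = 5)

# library() attaches a package for the current R session only
library(stats)                 # already attached by default; shown for illustration
"package:stats" %in% search()

# package::function always calls the version from that package
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
stats::sd(x)

# A user-defined object can mask a function name...
sd <- function(x) "not the standard deviation"
sd(x)
stats::sd(x)                   # ...but the :: form still reaches the original
rm(sd)                         # remove the masking object again
```

Because attached packages live in the memory of the running session, restarting R means calling `library()` again, while `package::function` works without attaching the package at all (as long as it is installed).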

- [ ] "Some useful books on R": please cite properly and add links to the books, 
e.g. https://www.springer.com/gp/book/9780387790534 
"Crawley, M. - The R Book." should rather be cited as: Crawley, M. J. 2013. The R book. Pages 1–1051. Second edition. John Wiley & Sons, Ltd, United Kingdom.
- [ ] explain why the websites in "Some useful websites" are interesting. 

### Final verification
- [x] Use the R script the wiki and the presentation side-by-side to see what needs to be harmonized
- [x] Make sure that the sections in the presentation, wiki and R script match (each header should be easy to find in each of the different tools)

QCBSRworkshops / workshop01

Issues related to instruction clarity and presentation structure #3

General comments