Open pedrohbraga opened 3 years ago
There is an item in the list that is not clear "Snapshots of RStudio and R should be updated to earlier versions, which include". Could you update the list to make it clearer?
As @pedrohbraga mentioned to me, here are the objectives of the workshop:
Generally, this presentation would ideally have to be more aligned with the objectives of the workshop so participants can feel that they are confident with:
Basically, what I would do is: prioritize
Overall presentation of the workshops (not limited to workshop 1)
[ ] QCBS R workshop table of contents: Explain how the workshops (all 10 of them) were designed to be easy and pragmatic for non-programmers, and specifically for biologically oriented data analysis (but see disclaimer)
[ ] Learning objectives: State the learning objectives at the beginning and at the end of the workshop (or what attendees of the workshop should get out at the end of the workshop).
[ ] Active learning strategies Add a slide with a series of questions to ask students:
`head(iris)`, subset it `iris[,"Species"]`, plot something `plot(Sepal.Length ~ Sepal.Width, data = iris, pch = 20, xlab = "Sepal Width", ylab = "Sepal Length", main = "Iris data plot", cex = Petal.Width, col = Species)` or something more "statistics like" `hist(rnorm(1000))`). Using more concrete examples, and something that would make them want to save data, you could show how to save a single object with `saveRDS()` or several objects with `save()`. Show them how to read these back into R. Explain how saving lets them run their analysis in steps: 1. Run a bunch of things, save. 2. Start a new script that needs the results of the first script, but this time just reuse the saved object. I would also show how to "source" another script containing a function that they would want to reuse later. For example, create a small function and put it in a script called `QCBS.funny.R`. Then source the script to load the `QCBS.funny.R` function and use it once in the workshop. It could be a "mode" function that calculates the mode of a vector.
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
[ ] Go through the RStudio panels to make sure they understand what they have in front of them (1. Source (Script) 2. Console 3. Environment, history, connections 4. Files, plot, packages, Help, viewer). You could even show them how to switch to a dark theme (in the Appearance tab).
[ ] (Maybe) Add at the beginning (or at the end, as it is at the moment) that there are many other resources for people interested in going further. In particular, An Introduction to R by W. N. Venables, D. M. Smith and the R Core Team is a deeper look into the programming environment that is R.
[ ] Design:
[ ] Other tutorials: Invite students who want to learn stats and R to see resources such as the `swirl` package (see here) and its website.
[ ] Invite students who want to use their own data at the end of the workshop to come prepared to use them effectively. Perhaps the introduction workshop should be shorter, and the teachers should spend some time with people who want to look at their data and see how they can prepare better for future workshops.
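The save-and-reuse workflow suggested above (save an object, read it back in a later script, and source a script that holds a reusable function) could be demonstrated with something like the following. This is only a sketch: the file names, the summary objects and the use of `tempdir()` are illustrative assumptions, and `Mode()` is the small function proposed in this issue.

```r
# Step 1: run part of the analysis and save the results.
# tempdir() is used here only so the sketch leaves no files behind.
out_dir <- tempdir()

sepal_means <- tapply(iris$Sepal.Length, iris$Species, mean)
species_counts <- table(iris$Species)

# saveRDS() stores exactly one object; you choose its name when reading it back.
rds_file <- file.path(out_dir, "sepal_means.rds")
saveRDS(sepal_means, file = rds_file)

# save() stores one or more objects together with their names.
rdata_file <- file.path(out_dir, "iris_summary.RData")
save(sepal_means, species_counts, file = rdata_file)

# Put a reusable function in its own script (hypothetical file name).
script_file <- file.path(out_dir, "QCBS.funny.R")
writeLines(c(
  "Mode <- function(x) {",
  "  ux <- unique(x)",
  "  ux[which.max(tabulate(match(x, ux)))]",
  "}"
), con = script_file)

# Step 2: in a new script or session, reuse what was saved.
rm(sepal_means, species_counts)

restored_means <- readRDS(rds_file)  # assign to any name you like
load(rdata_file)                     # recreates sepal_means and species_counts

source(script_file)                  # defines Mode()
most_common <- Mode(c(2, 3, 3, 5, 3, 2))
print(most_common)
```

In a live session, the two steps would normally sit in two separate scripts, so students see that the second script runs without repeating the first.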
Statistics
Why R
`JAGS`, `Stan` (see reference here), `QGIS` (see reference here)). You'll probably still need other software (like Excel) to enter your data. Make sure that you follow proper data entry principles. R also works best when you prepare your work environment beforehand (create the proper folder structure). Integration with Python. You can interact with software containing data (SQL, Excel, etc.)
Content
[ ] Script header: Update the header of the R script (especially the date of modification)
[x] Script sections Add sections using "`CMD+SHIFT+R`" (on Mac) and [I think...] `CTRL + SHIFT + R` (on Windows). This will allow students to see the outline of the R script with the "Show Document Outline" option in RStudio.
[ ] Comments Explain within the R script how to add a comment (with `#`).
[ ] Teaching breaks Add slides to mark breaks in the presentation. Add a note in the slide that the teacher should use the break time.
[ ] Inspiration: take a look at the "What is the most useful R trick?" to inform the functions that should be learnt in the 1st workshop.
[x] Built-in in R: Add a note that R comes with built-in constants (such as `pi`, `LETTERS`, `letters`, `month.abb`, `month.name`, see this reference)
[ ] Resources: Encourage students to go to Stack Overflow. Especially, encourage them to ask questions with reproducible examples and ask better questions in general on Stack Overflow. Perhaps not the best example out there, but here is one question that I asked.
[ ] Revision of the Challenges:
`comments` about what is happening.
`+`; `-`; `*`; `/`; `^`; `<`; `~`; `[]`; `()`, `{}`, `#`, etc., see this page or simply type `??base::Syntax`) Perhaps to fix this, tell them to try to make sure their keyboard is configured to be in English if possible. Otherwise, spend some time showing them how to find the characters on their keyboards. Also, explain what `1:10` means.
`ALT + SHIFT + K`".
`sum()` and `mean()`
`result = 2+2`
`result.8` or `result_8_times` or `result8`)
`structure(c(1, 3, 5, 7))` in challenge 7? Remove that.
`[]` or `[,]`) depends on the number of dimensions
`char_vector`) are put back into the challenges. E.g., rewrite `char_vector <- c("blue", "red", "green")` into challenge 9 so that students who haven't run the line assigning the character values can still do the challenge. In my teaching experience, it often happens that students were distracted or forgot to run a line of code 3 challenges before the one they are doing, and now can't do the challenge because they have to track down the information it needs.
`my_df[, "num_sp"]`. The naming system for the objects is pretty terrible. The workshop should use a naming system that students are encouraged to keep using. `my_df` should probably be more like `fertilizer.data` (just thought about that, but perhaps explain that the `.data` is not like a `.pdf`, it is just part of the name of the object). Also, make `num_sp` more interesting by putting species names like `c("Statisticus revolusus", "Biodivertidus canadensis", "Statisticus revolusus", "Biodivertidus canadensis")` and naming the vector `species_names`. Why would one want to multiply "species number"? Perhaps make a new column called `abundance <- c(17, 23, 15, 7)` and make the `num_vector` called `plot_weights` or `sampling effort` or something that could be biologically relevant.
`:` was introduced. This should be added.
`:`, but with `seq()`, that increments by 2)
`TukeyHSD()` has a default value for the confidence interval of 95% `conf.level = 0.95`. Let's say we want 97.5% or 99%. Something like that.
[x] R History: use the keyboard arrows to navigate the history of the commands that were run. In RStudio on Mac, the shortcut "`CMD + arrow-up`" shows more of that history at a glance.
[x] Cool graphs Add a "show off how R is versatile" image:
[ ] Simulating data: Perhaps state that R is nice for simulating data and testing statistical ideas or experimental designs, or for playing with data from the literature (it would also be nice to add that to a future workshop, or to create a new one). Maybe show the t-distribution compared to the normal distribution. Example (perhaps too complicated, but you get the idea):
# Sequence of numbers to draw the distributions
sequ = seq(-4, 4, length.out = 100)
plot(sequ, y = dnorm(sequ), type = "l", ylim = c(0, 1), main = "Normal and t-distributions", ylab = "Density")
alpha = 0.10
from.z.low <- -100
to.z.low <- qnorm(alpha/2)
from.z.up <- qnorm(1-(alpha/2))
to.z.up <- 100
s.x.low <- c(from.z.low, seq(from.z.low, to.z.low, 0.01), to.z.low)
s.y.low <- c(0, dnorm(seq(from.z.low, to.z.low, 0.01)), 0)
polygon(x = s.x.low, y = s.y.low, col = "red")
s.x.up <- c(from.z.up, seq(from.z.up, to.z.up, 0.01), to.z.up)
s.y.up <- c(0, dnorm(seq(from.z.up, to.z.up, 0.01)), 0)
polygon(x = s.x.up, y = s.y.up, col = "red")
df = 1 # Setting the number of degrees of freedom for the t-distribution
lines(x = sequ, y = dt(sequ, df), type = "l", col = "red")
legend("topright", legend = c("Normal", "t-distribution"), col = c(1, 2), lty = 1)
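Several of the Content items above (what `:` does compared with `seq()`, the built-in constants, and overriding a default argument such as `conf.level` in `TukeyHSD()`) could be shown together in a few lines. The sketch below is one possible illustration, not the workshop's actual challenge code; the built-in `warpbreaks` data set and the object names are assumptions chosen for the example.

```r
# `:` builds a sequence that increments by 1; seq() lets you choose the step.
one_to_ten <- 1:10
odd_numbers <- seq(from = 1, to = 9, by = 2)
print(odd_numbers)

# Built-in constants can be indexed like any other vector.
first_three_months <- month.abb[1:3]
print(first_three_months)

# TukeyHSD() uses conf.level = 0.95 unless told otherwise.
# warpbreaks ships with R and is used here only as an example.
fit <- aov(breaks ~ tension, data = warpbreaks)
tukey_default <- TukeyHSD(fit)
tukey_99 <- TukeyHSD(fit, conf.level = 0.99)

# A higher confidence level gives wider intervals.
width_default <- tukey_default$tension[, "upr"] - tukey_default$tension[, "lwr"]
width_99 <- tukey_99$tension[, "upr"] - tukey_99$tension[, "lwr"]
print(rbind(width_default, width_99))
```

Comparing the interval widths side by side makes the effect of the argument visible without students having to read the full `TukeyHSD()` printout.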
# Wiki
For all the challenges, see the section **Content**
- [ ] In the wiki, it says "please contact the current series coordinators, listed on the main wiki page". It would be nice if there was a link to send an email or a link to state where to contact them!
- [ ] Correct:
- [ ] "R Studio" should be written "RStudio"
- [ ] "OS X" should be written "macOS"
- [ ] Update the figures to have a consistent style and make them pretty
- [ ] Update logos (Apple and Windows logo are archaic)
- [ ] Object oriented: I'd make the message simpler: "We are going to learn how to store our calculations or our values / output into what we call 'objects'."
- [ ] Perhaps see the structure of [this book](https://adv-r.hadley.nz/introduction.html) and especially the "why R" section to make it more interesting.
- [ ] **R TIP**: when talking about variable names, point to the "Environment" pane in RStudio so that students see where the objects are "stored". Also, warn students against giving variables the same name as functions: e.g. `data()` is a function, so calling `data <- "mydataset"` is not a good idea. It won't overwrite the function, but it doesn't help when using "tab" to autocomplete!
- [ ] Perhaps state at the beginning of the first code that we like the code to breathe, meaning that the code is clearer to read when there are spaces in the appropriate places (this is optional)
- [x] For the challenges (especially challenge 6) make sure that what's in the R script as an example, is within the wiki.
- [ ] **Type of data structure**: Make a **table** that shows the differences (in terms of dimensions, content, modes, etc.) between a vector (1 dimension which is `length()`, 1 mode [not "list of related values" like it is said in the wiki, this is not super clear what "related" means]), a matrix (nXm-dimension (`dim()`), etc.) This would make it clearer what are the differences. Maybe talk about attributes that can be associated with datasets (`??base::attributes`). Also, make another table of the different types of modes in R and explain how R interprets them (Numeric e.g. `8`, not `"8"`, character e.g. `"potato"` not `potato`, logical e.g. `TRUE`, not `true`, not `"TRUE"` (I would advise to NOT USE `T` or `F`. It causes too many problems and errors), factor... etc.). Talk about the function `numeric()` which is handy to create an empty vector of numbers with `numeric(0)`.
- [ ] It would be cool to add sub-section titles "vector" "matrix" "data frames". Now it's all in "Types of data structures in R". It would make it clearer.
- [ ] Not sure why there is something about the function `structure()`. I'd remove it.
- [ ] explain what `:` means in `1:5`. Same as `seq(from = 1, to = 5, by =1)`. It is explained inside the script but I would move it up into the text or explain how the `seq` function is doing something similar.
- [ ] Again, the table "my_df" is TERRIBLE! There is a column named "# of species" which is a big no no in a course about creation of databases without weird characters in it!!! Also, see my comment that I added in challenge 10. Where it says "# Visualise it!" I'd advocate to have the function `head()`. Perhaps this would force people to learn straight away that head() is much preferred over just typing the data.
- [ ] **Indexing objects in R**:
- [ ] Add "square" brackets.
- [ ] State that the `c()` function is necessary in order to index multiple values at once.
- [ ] There is a line saying "# There is no sixth value in this vector so R returns a null value (i.e. NA)" This should be corrected as a `NULL` (The Null Object) is not the same as an `NA` (‘Not Available’ / Missing Values).
- [ ] In the `odd_n[odd_n > 4]`, I'd make the students explore what `odd_n > 4` first means. I would not put it as a challenge.
- [ ] In the `char_vecteur[char_vecteur == "blue"]` example, I'd translate the example, and then add a new line saying `color.vector[color.vector %in% c("blue","red")]`
- [ ] Remove the \\ from the sentence "\\Here are a few examples of data"...
- [ ] For matrices, you can specify `my.matrix[1,2,drop=FALSE]` so that when you subset, you don't lose the matrix structure. A very neat trick!
- [ ] state that the preferred way to subset a data frame, in my opinion, is to use the column name as the "ID". E.g. DON'T use `my.mat[,1]`, but `my.mat[,"species"]`, or if you want to remove a column, something like this: `mtcars[,-which(colnames(mtcars)=="drat")]`. It makes the code much more readable (for the moment, since we haven't seen `tidyverse`).
- [ ] In "A quick note on logical statements", add `%in%`, which is the "Value Matching" operator, or simply `match()`. VERY useful: `row.names(mtcars) %in% c("Toyota Corolla","Toyota Corona")`
- [ ] remove the period in `> y2 <- c(1, 2, -7, 4, 5).`
- [ ] Challenge 10 in wiki: store the statement in an object!! `res <- my_df$num.sp * num.vector[c(1:4)]` then `res > 25`!
- [ ] Challenge 11 : reword the challenge to make it simpler to understand.
- [ ] challenge 12: why not add `type = "l"` inside the plot function?
- [ ] I'd advocate for the creation of a table after the sentence "As a reference, here is a list of some of the most common R functions:". that would make it more useful. Something like this:
![Screen Shot 2020-10-26 at 19 57 18](https://user-images.githubusercontent.com/15717151/97240753-7e4e7800-17c5-11eb-8cfc-7198cd68481c.png)
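The points in the list above about modes, `NA` versus `NULL`, logical indexing, `%in%`, `drop = FALSE` and subsetting by name could be demonstrated with a short script like this one. It is a sketch only: `color_vector`, `my_matrix` and the other object names are placeholders rather than the names used in the wiki, and the comments show the values R returns for these exact inputs.

```r
# Modes: the same-looking value can be a different type.
class(8)        # "numeric"
class("8")      # "character"
class(TRUE)     # "logical"

# T and F are ordinary variables that can be reassigned; TRUE and FALSE cannot.
T <- 0
isTRUE(T)       # FALSE
rm(T)           # remove the reassigned copy again

# Indexing past the end of a vector gives NA, which is not NULL.
odd_n <- c(1, 3, 5, 7, 9)
sixth <- odd_n[6]
is.na(sixth)    # TRUE
is.null(sixth)  # FALSE

# Look at the logical vector before using it as an index.
odd_n > 4
odd_n[odd_n > 4]

# %in% matches against several values at once.
color_vector <- c("blue", "red", "green")
color_vector[color_vector %in% c("blue", "red")]
toyotas <- mtcars[row.names(mtcars) %in% c("Toyota Corolla", "Toyota Corona"), ]

# drop = FALSE keeps the matrix structure when subsetting.
my_matrix <- matrix(1:6, nrow = 2)
dim(my_matrix[1, 2, drop = FALSE])  # 1 1
is.null(dim(my_matrix[1, 2]))       # TRUE: the result is a plain vector

# Subset columns by name rather than by position.
mpg_by_name <- mtcars[, "mpg"]
no_drat <- mtcars[, -which(colnames(mtcars) == "drat")]
```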
**packages**
- [ ] add to the sentence "They are usually available through the Comprehensive R Archive Network" this: "but there are now many packages also available on GitHub or other platforms".
- [ ] Number of packages: Change the information. "As of October 2020, _only from the CRAN_, there are more than 16000 packages!" [Source here](https://cran.r-project.org/web/packages/)
- [ ] explain what is an R session (to understand why you need to call libraries each time you reopen R): It is because the library is loaded into _memory_.
- [ ] To see the conflicts, use `conflicts(, detail = TRUE)`
- [ ] If you absolutely need to use 2 packages and 1 function is conflicting, you can use the `package::function` trick. VERY HANDY!! Search `::` in the help.
- [ ] when showing the example of a help menu, encourage students to copy-paste the `Examples` section to see how it works in action. See `seq` for example
  `seq(0, 1, length.out = 11)`; `seq(stats::rnorm(20))` (effectively 'along'); `seq(1, 9, by = 2)` (matches 'end'); `seq(1, 9, by = pi)` (stays below 'end'); `seq(1, 6, by = 3)`; `seq(1.575, 5.125, by = 0.05)`; `seq(17)` (same as `1:17`, or even better `seq_len(17)`)
- [x] in "Getting help on the Web", send the students not to stack exchange, but to [stack overflow](https://stackoverflow.com) for coding related question or [Cross validated](http://stats.stackexchange.com) for statistical questions.
- [ ] challenge 14 is really boring...
- [ ] "Some useful books on R" please cite properly and add links to the books:
e.g. https://www.springer.com/gp/book/9780387790534
"Crawley, M. - The R Book." is more Crawley, M. J. 2013. The R book. Pages 1–1051. Second edition. John Wiley & Sons, Ltd, United Kingdom.
- [ ] explain why the websites in "Some useful websites" are interesting.
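For the package items above (sessions, conflicts and the `package::function` form), a small demonstration could look like the following. It is a sketch that uses only packages shipped with R, so nothing needs to be installed; the object names are placeholders.

```r
# library() attaches a package for the current session only,
# so it has to be called again each time R is restarted.
library(stats)

# package::function calls a function from a specific package without ambiguity.
set.seed(42)
x <- stats::rnorm(20)
m <- base::mean(x)

# conflicts() lists names that are defined in more than one attached package.
masked <- conflicts(detail = TRUE)
str(masked, max.level = 1)

# Show which version of a package is installed.
packageVersion("stats")
```

With a non-base package attached, for example one that masks `filter()`, the `conflicts()` output becomes more interesting, which may be a better moment to introduce `::`.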
### Final verification
- [x] Use the R script, the wiki and the presentation side-by-side to see what needs to be harmonized
- [x] Make sure that the sections in the presentation, wiki and R script match (each header should be easy to find in each of the different tools)
General comments
Generally, many slides of this workshop refer to features in RStudio and R, without pictures or snapshots of the window. This would require the instructor to constantly shift between the presentation and RStudio to demonstrate the topics being discussed - which can be confusing, especially in a remote setting. Special attention should be given to fix these issues, either by including more snapshots (static or even GIFs), by using a different set of tools, and/or reframing and updating the content of the slides.
There are a few parts that lack explanation: e.g., the slide about error messages does not mention that error messages can be informative; the slide about objects tells us to avoid using `=` when assigning objects, but it does not say why;
Examples should be more demonstrative: e.g. it is more helpful for participants to use code chunks and output to show what happens when you write objects that begin with letters or special characters or what happens when you call objects without paying attention to cases, rather than giving them a list of rules (slide 32);
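As an illustration of the kind of demonstrative chunk meant here, the naming rules and case sensitivity could be shown by running code and reading the output rather than listing rules. The sketch below is an assumption about what such a chunk might contain and is not taken from the slides; `tryCatch()` is used only so the errors are captured as values instead of stopping the script.

```r
# Names are case-sensitive: `result` and `Result` are different objects.
result <- 2 + 2
msg <- tryCatch(Result, error = function(e) conditionMessage(e))
print(msg)

# Names can contain letters, digits, dots and underscores...
result8 <- result * 8
result.8 <- result * 8
result_8_times <- result * 8

# ...but cannot start with a digit: this line fails to parse.
parsed <- tryCatch(parse(text = "8result <- 1"),
                   error = function(e) "syntax error")
print(parsed)

# Inside a function call, `=` names an argument and assigns nothing,
# whereas `<-` creates an object and then passes it along.
median(x = c(1, 2, 3))
median(vals <- c(1, 2, 3))
exists("vals")
```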
Specific