carpentries-incubator / high-dimensional-stats-r

High-dimensional statistics with R
https://carpentries-incubator.github.io/high-dimensional-stats-r

Third delivery suggested changes #64

Closed by ailithewing 1 year ago

ailithewing commented 2 years ago

A list of proposed changes following the May delivery of HDS

These are in addition to the changes in the pull request ailith_delivery3 and to the changes that Hannes made that have yet to be pushed to the main course materials.

Throughout

Intro

Regression with many features (many outcomes)

Regularisation

PCA

FA

K means

Hierarchical clusters

Other

ailithewing commented 2 years ago

@catavallejos @nathansam @hwarden162 @Alanocallaghan Please add any additional things that I've missed.

nathansam commented 2 years ago

K-means: set a seed in the heatmap code chunk that starts with library("pheatmap") (this might already be covered by the coloured-blocks to-do).
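
For reference, a minimal sketch of what setting the seed could look like, assuming the chunk contains a random step such as k-means. The toy matrix, the seed value 42 and the use of pheatmap's kmeans_k argument are placeholders for illustration, not the lesson's actual data or code.

```r
# Hypothetical toy data; the lesson's own dataset is not reproduced here.
set.seed(42)
toy <- matrix(rnorm(200), nrow = 20)

# Resetting the same seed right before each call makes the random
# starting centres of kmeans() repeatable.
set.seed(42)
fit1 <- kmeans(toy, centers = 3, nstart = 5)
set.seed(42)
fit2 <- kmeans(toy, centers = 3, nstart = 5)

# Both runs assign every row to the same cluster.
same_clusters <- identical(fit1$cluster, fit2$cluster)
print(same_clusters)

# The same idea applied to a heatmap chunk (only runs if pheatmap is installed).
if (requireNamespace("pheatmap", quietly = TRUE)) {
  set.seed(42)
  pheatmap::pheatmap(toy, kmeans_k = 3)
}
```

Placing set.seed() inside the chunk itself, rather than once at the top of the episode, keeps the figure stable even if earlier chunks are skipped or reordered.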

hannesbecher commented 2 years ago

Challenge 1 in episode 1: I'm not sure about question 4. Is this a good example of high-dimensional data, given that it is one observation with so many features?

  1. Predicting probability of a patient's cancer progressing using gene expression data from 20,000 genes, as well as data associated with general patient health (age, weight, BMI, blood pressure) and cancer growth (tumour size, localised spread, blood test results).

alanocallaghan commented 2 years ago

Changing that challenge from a singular patient to plural patients would also be good, to avoid implying high precision from generic prediction models (i.e. precision-medicine hype).

hannesbecher commented 2 years ago

The current uniqueness/communality explanations contradict Wikipedia, I think: https://en.wikipedia.org/wiki/Factor_analysis#Terminology
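
To make the terminology concrete, here is a small sketch under the usual convention for standardised variables: a variable's communality is the sum of its squared loadings, and its uniqueness is one minus that. The loadings matrix below is made up for illustration and does not come from the lesson.

```r
# Hypothetical loadings for three standardised variables on two factors.
loadings <- matrix(
  c(0.8, 0.6, 0.1,   # loadings on factor 1
    0.3, 0.2, 0.7),  # loadings on factor 2
  ncol = 2,
  dimnames = list(c("x1", "x2", "x3"), c("F1", "F2"))
)

# Communality: share of each variable's variance explained by the factors.
communality <- rowSums(loadings^2)

# Uniqueness: the remaining share, specific to each variable.
uniqueness <- 1 - communality

print(round(cbind(communality, uniqueness), 2))
```

With a fitted model from factanal(), the same row sums of squared loadings could be compared against the uniquenesses component it returns.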

alanocallaghan commented 2 years ago

One way of reducing the number of dependency packages is to move all the data-wrangling code to a data package and then just install that with remotes::install_github.
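
A rough sketch of how the learner setup might then look. The repository slug and the helper name install_lesson_data are hypothetical; no data package is named in this thread.

```r
# Hypothetical repository slug for the proposed data package (placeholder only).
data_pkg_repo <- "example-org/hds-data"

# Hypothetical helper: installs the data package from GitHub.
install_lesson_data <- function(repo = data_pkg_repo) {
  # remotes would be the only extra package learners need up front.
  if (!requireNamespace("remotes", quietly = TRUE)) {
    install.packages("remotes")
  }
  remotes::install_github(repo)
}

# Not run here, since the repository above is only a placeholder:
# install_lesson_data()
```

The wrangling dependencies would then be needed only by whoever builds the data package, not by learners installing it.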

hannesbecher commented 1 year ago

Glossary still open, but covered by issue #89