carpentries-incubator / high-dimensional-stats-r

High-dimensional statistics with R
https://carpentries-incubator.github.io/high-dimensional-stats-r

Feedback from September 2022 delivery #88

ewallace closed this issue 11 months ago

ewallace commented 1 year ago

DRAFT TO BE UPDATED AFTER DAY 4 - saved here to get started, currently updated to day 3.

EdCarp delivery 2022-09-27 to 2022-09-30, with instructors @hannesbecher, @luciewoellenstein44, @ewallace. https://edcarp.github.io/2022-09-27_ed-dash_high-dim-stats/

Collaborative document: https://pad.carpentries.org/2022-09-27_ed-dash_high-dim-stats

Overall it went very well: good material, and happy, engaged students.

Day 1 - Introduction, Regression with many features

Learner feedback

Please list 1 thing that you liked or found particularly useful

Please list another thing that you found less useful, or that could be improved

Instructor feedback

Day 2 - Regularised regression

Learner feedback

Please list 1 thing that you liked or found particularly useful

Please list another thing that you found less useful, or that could be improved

Instructor feedback

Learners had several questions about the extra arguments in calls to lm(), glmnet(), and so on (see the etherpad for day 2). These questions point to places where the code could be simplified.

Day 3 - Principal component analyses, Factor analysis

Learner feedback

Please list 1 thing that you liked or found particularly useful

Please list another thing that you found less useful, or that could be improved

Instructor feedback

PCA (Episode 4)

Factor analysis (Episode 5)

Day 4 - K-means clustering, Hierarchical clustering

Learner feedback

Instructor feedback

alanocallaghan commented 1 year ago

I don't know if these are rhetorical, but

Why as.data.frame? Comparing simplerfit_horvath <- lm(train_age ~ train_mat) to the example fit_horvath <- lm(train_age ~ ., data = as.data.frame(train_mat))

The second example preserves the variable names as they are, so when you use predict() with newdata it doesn't throw a warning. We should probably work with a data frame from the start there.
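To make the naming difference concrete, here is a minimal sketch in base R. The data are simulated stand-ins, not the lesson's methylation data: the column names cpg1 to cpg3, the object test_mat, and the sample sizes are all assumptions made for illustration.

```r
# Simulated stand-ins for the lesson's objects (hypothetical data)
set.seed(42)
train_mat <- matrix(rnorm(20 * 3), nrow = 20,
                    dimnames = list(NULL, c("cpg1", "cpg2", "cpg3")))
train_age <- rnorm(20, mean = 40, sd = 10)
test_mat <- matrix(rnorm(5 * 3), nrow = 5,
                   dimnames = list(NULL, colnames(train_mat)))

# Matrix on the right-hand side: coefficient names get the matrix name
# as a prefix, e.g. "train_matcpg1"
simplerfit <- lm(train_age ~ train_mat)
names(coef(simplerfit))

# Data frame passed via `data =`: coefficient names are the column names
fit <- lm(train_age ~ ., data = as.data.frame(train_mat))
names(coef(fit))

# predict() can now match the columns of newdata by name,
# giving one prediction per row of the test set
pred <- predict(fit, newdata = as.data.frame(test_mat))
length(pred)
```

With the data frame form, the columns of newdata line up with the terms of the model by name, which is presumably why the lesson's example goes through as.data.frame().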

What does the -1 do to the methyl_mat matrix in k-fold cross validation? (in lasso <- cv.glmnet(methyl_mat[, -1], age, alpha = 1))

I'm not 100% sure, but presumably this removes the intercept column, as glmnet automatically adds an intercept itself. Again, it would probably be better to set the data up so the code is similar across the lm and glmnet calls, although I think that's actually rather difficult.
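As a rough illustration of what the [, -1] indexing does, here is a base R sketch. It assumes the matrix was built with model.matrix(), which is only a guess about how methyl_mat is constructed in the lesson; the data frame features and its columns are hypothetical. Whatever the first column holds, negative indexing drops it.

```r
# Hypothetical small data frame standing in for the methylation features
set.seed(1)
features <- data.frame(cpg1 = rnorm(4), cpg2 = rnorm(4))

# model.matrix() prepends an "(Intercept)" column of ones
design <- model.matrix(~ ., data = features)
colnames(design)

# Negative indexing drops the first column, leaving only the features
no_intercept <- design[, -1]
colnames(no_intercept)
```

If methyl_mat has no intercept column, the same indexing would silently drop the first feature instead, so it is worth checking colnames(methyl_mat) before relying on this pattern.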

ewallace commented 1 year ago

@alanocallaghan thanks, it wasn't rhetorical, and sorry for being unclear. I agree that it would be helpful either to set up the code to be more similar or to explain the details.

alanocallaghan commented 1 year ago

The first is discussed in this issue, which gives a fuller explanation: https://github.com/carpentries-incubator/high-dimensional-stats-r/issues/52

hannesbecher commented 11 months ago

Many of these are now implemented. Others have become obsolete due to restructuring.