carpentries-incubator / high-dimensional-stats-r

High-dimensional statistics with R
https://carpentries-incubator.github.io/high-dimensional-stats-r

Changes to episodes 1 and 2 #99

Closed catavallejos closed 1 year ago

catavallejos commented 1 year ago

Changes as agreed in HackMD.

Notes:

1) I simplified the proposed overview provided at the end of episode 1. This is to avoid introducing too many concepts that are not yet explained. Instead, I tried to focus on concrete biological examples rather than on technical details (e.g. robustness of clusters).

2) Reference for the methylation data is TBC.

alanocallaghan commented 1 year ago

If you want general refs for the methylation data you'll probably find them here: https://github.com/immunomethylomics/FlowSorted.Blood.EPIC

catavallejos commented 1 year ago

Thanks for the comments, @Alanocallaghan!

> If you want general refs for the methylation data you'll probably find them here: https://github.com/immunomethylomics/FlowSorted.Blood.EPIC

Rather than a general reference, we were looking for the exact source from which the methylation data were downloaded. Any chance you have that?

alanocallaghan commented 1 year ago

All the data download scripts are here, with the same names as the original files: https://github.com/carpentries-incubator/high-dimensional-stats-r/tree/main/data. I don't think any are too arcane, but do let me know if any should be clearer. I can add some details to the README or the course materials, maybe?

I found the download instructions here: https://bioconductor.org/packages/release/data/experiment/vignettes/FlowSorted.Blood.EPIC/inst/doc/FlowSorted.Blood.EPIC.html

alanocallaghan commented 1 year ago

On the topic of data, I think there may be some unused files, e.g. data/small_methylation.rds

catavallejos commented 1 year ago

Thanks @Alanocallaghan.

I edited the episode to add a link to the download code (which I updated to include a link to the download instructions).

catavallejos commented 1 year ago

I also deleted small_methylation.R and small_methylation.Rds as they are currently unused (and, if needed, we can always subset methylation as it was done here).

catavallejos commented 1 year ago

PS: I will leave the other datasets for now, just in case they are useful for future course development.

catavallejos commented 1 year ago

@hannesbecher that's all my edits pushed. Unless you have further comments, please merge whenever you have the time.

catavallejos commented 1 year ago

one minor extra edit: for consistency, renamed exercises as challenges and added numbers.

@hannesbecher could you please check the PCA etc episodes to make sure it's consistent there too?

hannesbecher commented 1 year ago

> one minor extra edit: for consistency, renamed exercises as challenges and added numbers.
>
> @hannesbecher could you please check the PCA etc episodes to make sure it's consistent there too?

Going to check this!

catavallejos commented 1 year ago

Thanks @hannesbecher.

Changes to episode 3 to come soon.