Open GavinHuttley opened 11 months ago
We don't want students competing with each other (bandwidth-wise) on a wifi network to download large volumes of data. So we will need example "download" configs that allow them to download a small amount of data. We will also need already-downloaded larger data sets, and already "installed" larger data sets, which they can grab. (Noting here that the "installed" data sets are much smaller than the original downloads.)
We could reframe this as:
Technology items all transferred to individual issues and assigned to @khiron
Know
Technology
- computer setup page(s) (participants to do it before attending)
- installing / updating cogent3 (PyPI or GitHub). (Advice from Peter M was to do it via conda, but this won't work on macOS if users don't have Homebrew + Xcode tools.) @ pre-meeting
- intro to PyPI and GitHub
- intro to pip
- intro to conda
- explanation of virtual environments (what and why)
- how to ask for help (GitHub Discussions)
- Raise issues, contribute (Issues, c3dev)
- backup: get jupyterhub working

LO Understanding experimental design issue
What they need to consider in choosing sequences for study.

Reproducible computation -- `scitrack`
- different sequence types and relevance for experimental design
- different sequence relationship types

LO - Getting data
- Published GenBank IDs (e.g. REFSOIL)
- Published (already aligned) data set (Duchene et al example)
- Ensembl downloading and installing

LO - Sampling ensembl
- Downloading
- Installing
- Data summaries

LO - Identifying and dealing with data issues
- inconsistent meta-data (data wrangling REFSOIL GenBank files)
- demonstrate using `annotation_db`
- explore using dotplots
- File format issues
- Duchene phylip formats, solving using `bad_phylip` app
- extremely long fasta sequence labels (e.g. making sure you can collate genomes from one species)

LO - sampling sequence classes Ensembl
- sampling homologous sequences
- sampling alignments

LO - Alignments
- using cogent3
- quantifying alignment quality
- visualisation

LO - Sampling alignments
- selecting by length
- codon positions
- consistent species presence

LO - Unsolved / Important problems
- alignment quality scores!
- pair and multiple
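
As a rough illustration of the three "Sampling alignments" criteria listed above (selecting by length, codon positions, consistent species presence), here is a minimal pure-Python sketch. It deliberately does not use the cogent3 API: an alignment is assumed to be a plain dict mapping species name to an aligned sequence string, and every function name below is hypothetical.

```python
def aln_length(aln):
    """Number of columns, assuming all sequences in the alignment are equal length."""
    return len(next(iter(aln.values()), ""))


def has_species(aln, required):
    """True if every required species name is present in the alignment."""
    return set(required) <= set(aln)


def codon_position(aln, position):
    """Keep only the columns at codon position 1, 2 or 3 (assumes the alignment is in frame)."""
    if position not in (1, 2, 3):
        raise ValueError("position must be 1, 2 or 3")
    return {name: seq[position - 1::3] for name, seq in aln.items()}


def sample_alignments(alns, min_length, required):
    """Select alignments that are long enough and contain all required species."""
    return [
        aln
        for aln in alns
        if aln_length(aln) >= min_length and has_species(aln, required)
    ]


# Toy data for illustration only (hypothetical sequences)
alns = [
    {"human": "ATGAAATTT", "mouse": "ATGAAGTTT"},
    {"human": "ATGAAA", "mouse": "ATGAAG"},
    {"human": "ATGAAATTT", "chimp": "ATGAAATTT"},
]
kept = sample_alignments(alns, min_length=9, required=["human", "mouse"])
third = codon_position(kept[0], 3)
```

With the toy data, only the first alignment passes both filters; the second is too short and the third lacks mouse.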