MarineOmics / marineomics.github.io

Website for MarineOmics RCN-ECS. Hosts pages for panelist series and recommended practices for non-model genomics.
Creative Commons Attribution 4.0 International

Add "Best Practices" page to website #16

Open ksil91 opened 1 year ago

ksil91 commented 1 year ago

Add a separate page to the website with the Best Practices box from the manuscript (copied below). Might be good to have it in the top nav bar?

BOX 1: Best Principles in Genomics Research

Rigor

Understand the characteristics of your chosen sequencing approach. Take these characteristics into account when designing a study and during data analysis.

Define the goals of the study before choosing a sequencing approach. The goals and the chosen approach together inform the total number of samples and the coverage needed: e.g., PoolSeq requires larger sample sizes and deeper coverage because individuals are not genotyped (Guirao-Rico and González 2021).
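As a rough illustration of how coverage enters study design, the sketch below uses the standard approximation that mean depth equals total sequenced bases divided by genome size. The function names and the example numbers are hypothetical, and the calculation ignores duplicates, unmapped reads, and uneven coverage, so treat the output as an upper bound for planning rather than a prediction.

```python
import math


def expected_coverage(n_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Approximate mean depth: total sequenced bases / genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return n_reads * read_length_bp / genome_size_bp


def reads_needed(target_coverage: float, read_length_bp: int, genome_size_bp: int) -> int:
    """Number of reads required to reach a target mean depth (rounded up)."""
    if read_length_bp <= 0:
        raise ValueError("read_length_bp must be positive")
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)


# Hypothetical example: 400 million 150 bp reads on a 1 Gb genome.
pool_depth = expected_coverage(400_000_000, 150, 1_000_000_000)
print(pool_depth)       # 60.0 for the library as a whole
print(pool_depth / 30)  # 2.0 per individual if split evenly over 30 barcoded libraries
```

The same sequencing effort that gives a pooled library substantial depth leaves each individually barcoded sample with very little, which is one reason the sequencing approach and the sample design need to be chosen together.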

Plot your data early and often. Get to know it in both its raw and processed forms.

Deepen the interpretation of results and flag sources of error throughout a workflow by plotting data such as (i) read-quality metrics pre- and post-filtering, (ii) sequence coverage across a reference and across samples, (iii) principal component analysis of replicates pre- and post-filtering, and (iv) results and predictions of statistical tests.
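A per-sample summary of sequencing depth is one simple table to build before plotting coverage across samples. The sketch below is a minimal, standard-library-only example; the function name, the `min_depth` threshold of 10, and the sample data are all assumptions made for illustration, and in practice the depths would come from the output of a depth-reporting tool.

```python
from statistics import mean, median


def summarize_depth(depths_by_sample, min_depth=10):
    """Summarize per-site depth for each sample.

    depths_by_sample maps a sample name to a list of per-site depths.
    Returns, per sample, the mean, the median, and the fraction of
    sites whose depth falls below min_depth.
    """
    summary = {}
    for sample, depths in depths_by_sample.items():
        if not depths:
            raise ValueError(f"no depth values for sample {sample!r}")
        below = sum(1 for d in depths if d < min_depth)
        summary[sample] = {
            "mean": mean(depths),
            "median": median(depths),
            "frac_below_min": below / len(depths),
        }
    return summary


# Hypothetical depths for two samples at four sites.
example = {"sample_A": [10, 20, 30, 40], "sample_B": [0, 2, 4, 50]}
for name, stats in summarize_depth(example).items():
    print(name, stats)
```

Here `sample_B` has a mean depth that looks acceptable only because of one deep site; the median and the fraction of low-depth sites expose the problem, which is the kind of pattern a plot of the same numbers makes obvious at a glance. Running the summary on both raw and filtered data gives the pre- and post-filtering comparison described above.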

All models and pipelines introduce some type and magnitude of error. Compare models’ nuances to find the best approach, given your data.

This issue is particularly acute in non-model species. Some quantitative approaches to evaluating methods include (i) comparing the performance of different methods or parameter choices using simulated data (Lotterhos, Fitzpatrick, and Blackmon 2022), (ii) measuring their predictive strengths using model selection statistics (Hooten and Hobbs 2015; Johnson and Omland 2004), and (iii) inspecting observed-versus-predicted plots from model outputs. A basic understanding of how sensitive inference is in different analyses will help determine how robust the results are to nuanced decisions, especially for non-model organisms or unique experimental designs.
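To make the idea of a model selection statistic concrete, the sketch below computes the Akaike information criterion (AIC = 2k - 2 ln L) and Akaike weights for a set of candidate models. AIC is only one of several such statistics and is used here purely as an example; the model names, log-likelihoods, and parameter counts are invented.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood


def akaike_weights(aic_values):
    """Convert AIC values into relative weights that sum to 1.

    aic_values maps a model name to its AIC. Each weight is
    exp(-0.5 * delta) normalized over all models, where delta is the
    difference between that model's AIC and the lowest AIC.
    """
    best = min(aic_values.values())
    rel = {name: math.exp(-0.5 * (value - best)) for name, value in aic_values.items()}
    total = sum(rel.values())
    return {name: r / total for name, r in rel.items()}


# Hypothetical fits: (maximized log-likelihood, number of parameters).
fits = {"null": (-120.0, 1), "one_factor": (-105.0, 3), "full": (-104.5, 6)}
scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
for name, weight in sorted(akaike_weights(scores).items(), key=lambda kv: -kv[1]):
    print(f"{name}: AIC={scores[name]:.1f}, weight={weight:.3f}")
```

In this invented example the full model has the highest likelihood but is penalized for its extra parameters, so the intermediate model ranks first. The comparison is only meaningful among models fitted to the same data.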

Reproducibility

Wherever your sequencing data go, their associated metadata go with them.

Any and all metadata that can be reported should accompany sequence data in databases such as NCBI's Sequence Read Archive (SRA). Data on Dryad or GitHub should crosslink to NCBI/SRA.

Keep detailed records of every analysis decision you make, including preliminary analyses and any errors that occurred, so you remember what you did and can reproduce your own work.
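One lightweight way to keep such records alongside a notebook is an append-only log with one timestamped entry per decision. The sketch below is only one possible layout: the function name `log_decision`, the field names, and the file name in the example are all hypothetical.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def log_decision(log_path, step, decision, rationale, **params):
    """Append one analysis decision to a JSON Lines file and return the entry.

    Each line of the file is a standalone JSON object, so the log can be
    appended to safely and read back line by line.
    """
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "step": step,
        "decision": decision,
        "rationale": rationale,
        "params": params,
    }
    with Path(log_path).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry


if __name__ == "__main__":
    # Hypothetical entries, including a preliminary analysis that was abandoned.
    log_decision(
        "decisions.jsonl",
        step="variant filtering",
        decision="drop sites with depth below 10",
        rationale="low-depth sites inflated heterozygosity in a first pass",
        min_depth=10,
    )
    log_decision(
        "decisions.jsonl",
        step="PCA",
        decision="abandoned unfiltered PCA",
        rationale="first axis separated samples by sequencing lane",
    )
```

Because the file is plain text, it can be committed to the same repository as the notebooks and cited directly when writing the methods.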

Use text-annotated code notebooks for bioinformatic analyses (e.g., Rmarkdown, Jupyter).

Provide a reproducible text-annotated code notebook for all final analyses so that these methods could be reproduced by someone else.

Provide these notebooks (in Rmarkdown or Jupyter) in a publicly accessible format on services such as GitHub, GitLab, Dryad, Figshare, or Zenodo.