Arcadia-Science / 2022-prjna853785-sourmash

snakemake pipeline to analyze assemblies from a subset of run accessions in prjna853785 cheese production samples

Start project with README, environment, and initial snakemake workflow to execute sourmash commands #1

Closed taylorreiter closed 2 years ago

taylorreiter commented 2 years ago

This is the first PR for this repository. It does two things:

  1. Starts a README that describes the goals of the repository, provides background information on the code found in this repo, gives instructions for running the code, and leaves placeholders for future information (e.g., visualization and notebooks).
  2. Includes a Snakefile and other associated files that make up a snakemake workflow for sourmash commands (see the README for motivations):
     a. Snakefile: snakemake workflow that coordinates the execution of sourmash commands on metagenome assemblies. I have run this workflow and can confirm it runs correctly :) Eventually, I will add notebooks that visualize the output of the workflow, but I wanted to have this portion reviewed before dumping a bunch more code.
     b. environment.yml: specifies the run environment for the workflow. See README.md for more information.
     c. envs/*yml: environments created and managed by the Snakefile (see the conda: directive in each rule to know which environment is used by each step of the workflow).
     d. scripts/: folder for auxiliary scripts executed by the snakemake workflow. For now, it only includes sig_to_csv.py, a Python script that converts a sourmash sketch into a CSV file.
     e. inputs/metadata.csv: metadata file encoding sample names. Used by the Snakefile to determine file prefixes.
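
For reviewers less familiar with the pattern in item (e), here is a minimal sketch of how a metadata CSV can drive file prefixes in a Python-based workflow. It is an illustration only, not the code in this PR: the column name `sample_name`, the example rows, and the output path pattern are all assumptions, since the actual contents of inputs/metadata.csv and the Snakefile's paths are not shown here.

```python
import csv
import io

# Hypothetical stand-in for inputs/metadata.csv; the real column names
# and sample names are not shown in this PR.
EXAMPLE_METADATA = """sample_name,description
sampleA,placeholder description
sampleB,placeholder description
"""


def read_sample_names(handle, column="sample_name"):
    """Return unique, non-empty sample names from a CSV handle, in file order."""
    names = []
    for row in csv.DictReader(handle):
        name = row[column].strip()
        if name and name not in names:
            names.append(name)
    return names


def expected_outputs(samples, pattern="outputs/sketch/{sample}.sig"):
    """Build one output path per sample; the path pattern is an assumed example."""
    return [pattern.format(sample=sample) for sample in samples]


# In a real workflow the handle would come from open("inputs/metadata.csv").
SAMPLES = read_sample_names(io.StringIO(EXAMPLE_METADATA))
TARGETS = expected_outputs(SAMPLES)
print(TARGETS)
```

In a Snakefile, a list like `SAMPLES` is typically passed to `expand()` in the target rule, so that adding a row to the metadata file is enough to add a sample to the run.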

taylorreiter commented 2 years ago

This may also be of interest to @mertcelebi: it's a demo of how I expect code pushes and PRs to look for data analysis projects.

taylorreiter commented 2 years ago

Awesome, thanks @elizabethmcd!