bioforensics / yeat

YEAT: Your Everyday Assembly Tool

just-yeat-it: new YEAT CLI command #74

Closed: danejo3 closed this pull request 3 months ago

danejo3 commented 3 months ago

This PR introduces a new CLI command, just-yeat-it, that runs a simple YEAT workflow on paired-end reads. The command does not require a config file and runs only one assembly algorithm. The default algorithm is SPAdes, but it can be switched to MEGAHIT (--megahit) or Unicycler (--unicycler). Users can customize the assembler's command-line flags with the --extra-args option.
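
To illustrate the algorithm-selection rules described above, here is a minimal Python argparse sketch. It is a hypothetical reconstruction based only on the help output in this PR, not YEAT's actual source. The helper names `build_parser`, `select_algorithm`, and `assembler_flags` are illustrative assumptions. The sketch treats --megahit and --unicycler as mutually exclusive, falls back to SPAdes when neither is given, and splits the --extra-args string into individual flags.

```python
import argparse
import shlex


def build_parser():
    """Hypothetical parser mirroring just-yeat-it's algorithm options (not YEAT's source)."""
    parser = argparse.ArgumentParser(prog="just-yeat-it")
    parser.add_argument("reads", nargs=2, help="paired-end reads in FASTQ format")
    algo = parser.add_argument_group("algorithm configuration")
    algo.add_argument("--assembly-label", metavar="STR", default="assembly1")
    # --megahit and --unicycler cannot be combined; giving neither means SPAdes.
    choice = algo.add_mutually_exclusive_group()
    choice.add_argument("--megahit", action="store_true")
    choice.add_argument("--unicycler", action="store_true")
    algo.add_argument("--extra-args", metavar="STR", default="")
    return parser


def select_algorithm(args):
    """Map the mutually exclusive flags to an assembler name (SPAdes by default)."""
    if args.megahit:
        return "megahit"
    if args.unicycler:
        return "unicycler"
    return "spades"


def assembler_flags(args):
    """Split the --extra-args string into separate flags, honoring shell-style quoting."""
    return shlex.split(args.extra_args)


if __name__ == "__main__":
    ns = build_parser().parse_args(["--extra-args", "--isolate --careful", "r1.fq", "r2.fq"])
    print(select_algorithm(ns), assembler_flags(ns))  # spades ['--isolate', '--careful']
```

In this sketch, argparse accepts a value that contains a space, such as "--isolate --careful", as an ordinary argument. However, it treats a lone dash-prefixed value such as --meta as an option. A single flag must therefore be passed in the `--extra-args=--meta` form.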

Below is the help output for just-yeat-it.

(yeat) FDT010LM-613660:yeat dane.jo$ just-yeat-it -h
usage: just-yeat-it [-h] [-n] [-o DIR] [-t T] [-l L] [-c C] [-d D] [-g G] [-s S] [--sample-label STR] [--assembly-label STR] [--megahit | --unicycler] [--extra-args STR] reads reads

required arguments:
  reads                 paired-end reads in FASTQ format

options:
  -h, --help            show this help message and exit

workflow configuration:
  -n, --dry-run         construct workflow DAG and print a summary but do not execute
  -o DIR, --outdir DIR  output directory; default is current working directory
  -t T, --threads T     number of available T threads for sequential and parallel processing jobs; by default, T=1

fastp configuration:
  -l L, --length-required L
                        discard reads shorter than the required L length after pre-processing; by default, L=50

downsampling configuration:
  -c C, --coverage C    target an average depth of coverage Cx when auto-downsampling; by default, C=150
  -d D, --downsample D  randomly sample D reads from the input rather than assembling the full set; set D=0 to perform auto-downsampling to a desired level of coverage (see --coverage); set D=-1 to disable downsampling; by default,
                        D=0
  -g G, --genome-size G
                        provide known genome size in base pairs (bp); by default, G=0
  -s S, --seed S        seed for the random number generator used for downsampling; by default, the seed is chosen randomly

sample configuration:
  --sample-label STR    set the sample label; by default, "sample1"

algorithm configuration:
  --assembly-label STR  set the assembly label; by default, "assembly1"
  --megahit             use MEGAHIT assembly algorithm; by default, SPAdes
  --unicycler           use Unicycler assembly algorithm; by default, SPAdes
  --extra-args STR      add assembly algorithm flags; for example, "--meta" or "--isolate --careful" for SPAdes; by default, empty string
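
The -d/--downsample rules above (D=-1 disables downsampling, D=0 auto-downsamples to C-fold coverage, any other value samples D reads) can be sketched in Python as follows. This is a hypothetical illustration, not YEAT's implementation. The coverage estimate (coverage × genome size ÷ mean read length) is an assumed formula. The help text does not say how auto-downsampling behaves when G=0 (the default), so this sketch simply requires a positive genome size. `reads_to_sample` and `pick_pairs` are illustrative names. `pick_pairs` shows how a fixed --seed value would make the random subset reproducible.

```python
import math
import random


def reads_to_sample(downsample, coverage, genome_size, total_reads, mean_read_length):
    """Return how many reads to keep, or None to keep the full read set.

    Hypothetical sketch of the -d/--downsample rules from the help text; the
    auto-downsampling estimate (coverage * genome size / mean read length)
    is an assumption, not YEAT's implementation.
    """
    if downsample == -1:
        return None  # D=-1: downsampling disabled
    if downsample > 0:
        target = downsample  # D>0: sample D reads
    elif downsample == 0:
        # D=0: auto-downsample to roughly C-fold average depth (assumed formula).
        if genome_size <= 0 or mean_read_length <= 0:
            raise ValueError("auto-downsampling needs a positive genome size and read length")
        target = math.ceil(coverage * genome_size / mean_read_length)
    else:
        raise ValueError("downsample must be -1, 0, or a positive read count")
    # Asking for at least as many reads as exist means nothing is discarded.
    return None if target >= total_reads else target


def pick_pairs(n_pairs, k, seed=None):
    """Choose k read-pair indices without replacement; the same seed gives the same subset.

    Sampling pair indices (rather than single reads) keeps mates together.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_pairs), k))


if __name__ == "__main__":
    # 150x coverage of a 5 Mbp genome with 150 bp reads needs 5,000,000 reads.
    print(reads_to_sample(0, 150, 5_000_000, 12_000_000, 150))  # 5000000
```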

standage commented 3 months ago

LGTM, thanks!