bioforensics / yeat

YEAT: Your Everyday Assembly Tool

`yeat-auto` - automatically fill in samples for the config file #76

Closed danejo3 closed 2 months ago

danejo3 commented 3 months ago

The purpose of this PR is to add a command that fills in the configuration file automatically. When users have many samples (for example, 10 or more), filling out every path in the sample section of the config by hand is tedious. To reduce the manual work and prevent typos, the `yeat-auto` command was created. The command requires two inputs. The first is one of: 1) a sample name, 2) a list of sample names, or 3) a text file containing the sample names, one per line. The second is either a list of fastq files or a directory containing the samples' fastq files (see the `--seq-path` and `--files` flags).

Auto-populating is done by taking a sample's name and looking for matching fastq files based on substrings. 

For example, if the sample name is short_reads, the matching files are short_reads_1.fastq.gz and short_reads_2.fastq.gz.
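The substring matching described above could be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function name `find_paired_reads`, the requirement of exactly two matches, and the example paths are all assumptions made for the sketch.

```python
from pathlib import Path


def find_paired_reads(sample, fastq_paths):
    """Return the sorted FASTQ paths whose file name contains the sample name.

    Hypothetical helper: assumes a paired-end sample has exactly two files.
    """
    matches = sorted(p for p in fastq_paths if sample in Path(p).name)
    if len(matches) != 2:
        raise ValueError(
            f"expected 2 FASTQ files for sample '{sample}', found {len(matches)}"
        )
    return matches


# Illustrative file list; the directory and file names are made up.
files = [
    "seqs/short_reads_1.fastq.gz",
    "seqs/short_reads_2.fastq.gz",
    "seqs/other_sample_1.fastq.gz",
]
paired = find_paired_reads("short_reads", files)
```

Plain substring matching is easy to reason about, but it means a sample name that is a prefix of another (say, `sample1` and `sample10`) would pick up the wrong files, so a stricter check may be worth adding in practice.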

Auto-populating is only available for paired-end reads. As a result, all samples are placed under the default SPAdes assembly in the "assemblies" section of the config:

"assemblies": {
    "spades-default": {
        "algorithm": "spades",
        "extra_args": "",
        "samples": [
            "short_reads"
        ],
        "mode": "paired"
    }
}
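As a rough sketch of how that section could be generated from a list of sample names, assuming the structure shown in the fragment above. The function name `build_assemblies` and the sample names are hypothetical and are not taken from the YEAT code base.

```python
import json


def build_assemblies(sample_names):
    """Build the "assemblies" section with every sample under one SPAdes entry.

    Hypothetical helper: mirrors the config fragment shown in this PR.
    """
    return {
        "assemblies": {
            "spades-default": {
                "algorithm": "spades",
                "extra_args": "",
                "samples": list(sample_names),
                "mode": "paired",
            }
        }
    }


# Illustrative sample names only.
config = build_assemblies(["sample1", "sample2"])
print(json.dumps(config, indent=4))
```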

Users can then run the normal yeat command with the newly created configuration. If any changes are needed, users can edit the config file before passing it to yeat.

danejo3 commented 2 months ago

Comments addressed! Thanks!