bioforensics / yeat

YEAT: Your Everyday Assembly Tool

Customize downsample, depth of coverage, and genome size per sample #77

Closed: danejo3 closed this 2 months ago

danejo3 commented 2 months ago

The purpose of this MR is to resolve #75.

Previously, YEAT took the downsample configuration values from the CLI and applied them to every sample listed in the config file. This is problematic when two or more samples differ from one another and need different settings. This MR adds per-sample entries for downsample, genome_size, and coverage_depth, for both paired-end and single-end reads. Each entry takes the same values as the corresponding CLI flag, but is set in the config file for each sample.

Because just-yeat-it does not use a config file, it keeps the original downsample configuration flags.

For yeat, the only one of these flags that remains is --seed.

Below is an example config file that includes these updates:

{
    "samples": {
        "sample1": {
            "paired": [
                [
                    "yeat/tests/data/short_reads_1.fastq.gz",
                    "yeat/tests/data/short_reads_2.fastq.gz"
                ]
            ],
            "downsample": 0,
            "genome_size": 0,
            "coverage_depth": 150
        },
        "sample2": {
            "paired": [
                [
                    "yeat/tests/data/Animal_289_R1.fq.gz",
                    "yeat/tests/data/Animal_289_R2.fq.gz"
                ]
            ],
            "downsample": 0,
            "genome_size": 0,
            "coverage_depth": 150
        },
        "sample3": {
            "pacbio-hifi": [
                "yeat/tests/data/ecoli.fastq.gz"
            ],
            "downsample": 0,
            "genome_size": 0,
            "coverage_depth": 150
        },
        "sample4": {
            "nano-hq": [
                "yeat/tests/data/ecolk12mg1655_R10_3_guppy_345_HAC.fastq.gz"
            ],
            "downsample": 0,
            "genome_size": 0,
            "coverage_depth": 150
        }
    },
    "assemblies": {
        "spades-default": {
            "algorithm": "spades",
            "extra_args": "",
            "samples": [
                "sample1",
                "sample2"
            ],
            "mode": "paired"
        },
        "hicanu": {
            "algorithm": "canu",
            "extra_args": "genomeSize=4.8m",
            "samples": [
                "sample3"
            ],
            "mode": "pacbio"
        },
        "flye_ONT": {
            "algorithm": "flye",
            "extra_args": "",
            "samples": [
                "sample4"
            ],
            "mode": "oxford"
        }
    }
}
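
To illustrate how per-sample values like these could be read from such a config, here is a minimal Python sketch. It is not YEAT's actual implementation: the function name `sample_downsample_settings`, the `DEFAULTS` fallback (copied from the values in the example above), the integer-only validation, and the shortened file names are all assumptions made for illustration.

```python
import json

# Hypothetical fallbacks, copied from the values used in the example config.
DEFAULTS = {"downsample": 0, "genome_size": 0, "coverage_depth": 150}


def sample_downsample_settings(config):
    """Return {sample_name: {"downsample", "genome_size", "coverage_depth"}}."""
    settings = {}
    for name, entry in config["samples"].items():
        resolved = {}
        for key, default in DEFAULTS.items():
            value = entry.get(key, default)
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, int):
                raise ValueError(f"{name}: '{key}' must be an integer, got {value!r}")
            resolved[key] = value
        settings[name] = resolved
    return settings


# Illustrative config; file names and the non-default values are made up.
config = json.loads("""
{
    "samples": {
        "sample1": {
            "paired": [["r1.fastq.gz", "r2.fastq.gz"]],
            "downsample": 0,
            "genome_size": 0,
            "coverage_depth": 150
        },
        "sample3": {
            "pacbio-hifi": ["reads.fastq.gz"],
            "genome_size": 4800000,
            "coverage_depth": 100
        }
    }
}
""")

for name, values in sample_downsample_settings(config).items():
    print(name, values)
```

Each sample resolves its own settings independently, so two samples in the same run can target different coverage depths or genome sizes.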