bioforensics / yeat

YEAT: Your Everyday Assembly Tool

Updating config file #64

Closed danejo3 closed 6 months ago

danejo3 commented 7 months ago

The purpose of this PR is to update the config file, as discussed in #56, to enable hybrid assembly and grid support (#61).

With this PR, users must provide a config file that follows a strict format.

Large improvements were made to enable hybrid and grid support.

Example of the new config file:

{
    "samples": {
        "sample1": {
            "paired": [
                [
                    "yeat/tests/data/short_reads_1.fastq.gz",
                    "yeat/tests/data/short_reads_2.fastq.gz"
                ]
            ]
        },
        "sample2": {
            "paired": [
                [
                    "yeat/tests/data/Animal_289_R1.fq.gz",
                    "yeat/tests/data/Animal_289_R2.fq.gz"
                ]
            ]
        },
        "sample3": {
            "pacbio-hifi": [
                "yeat/tests/data/ecoli.fastq.gz"
            ]
        },
        "sample4": {
            "nano-hq": [
                "yeat/tests/data/ecolk12mg1655_R10_3_guppy_345_HAC.fastq.gz"
            ]
        }
    },
    "assemblies": {
        "spades-default": {
            "algorithm": "spades",
            "extra_args": "",
            "samples": [
                "sample1",
                "sample2"
            ],
            "mode": "paired"
        },
        "hicanu": {
            "algorithm": "canu",
            "extra_args": "genomeSize=4.8m",
            "samples": [
                "sample3"
            ],
            "mode": "pacbio"
        },
        "flye_ONT": {
            "algorithm": "flye",
            "extra_args": "",
            "samples": [
                "sample4"
            ],
            "mode": "oxford"
        }
    }
}
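
A minimal sketch of how a config in this format could be sanity-checked before a run, assuming only the structure shown in the example above. The helper name `check_config`, the set of required keys, and the error messages are hypothetical and are not taken from YEAT's code.

```python
import json

# Keys that every assembly entry carries in the example config above.
REQUIRED_ASSEMBLY_KEYS = {"algorithm", "extra_args", "samples", "mode"}


def check_config(config):
    """Return a list of problems found in a YEAT-style config dict (hypothetical helper)."""
    problems = []
    samples = config.get("samples", {})
    assemblies = config.get("assemblies", {})
    for label, assembly in assemblies.items():
        # Every assembly entry should define the same four keys.
        missing = REQUIRED_ASSEMBLY_KEYS - set(assembly)
        if missing:
            problems.append(f"{label}: missing keys {sorted(missing)}")
        # Every sample an assembly refers to must be declared under "samples".
        for sample in assembly.get("samples", []):
            if sample not in samples:
                problems.append(f"{label}: unknown sample '{sample}'")
    return problems


example = json.loads("""
{
    "samples": {
        "sample1": {
            "paired": [
                ["yeat/tests/data/short_reads_1.fastq.gz",
                 "yeat/tests/data/short_reads_2.fastq.gz"]
            ]
        }
    },
    "assemblies": {
        "spades-default": {
            "algorithm": "spades",
            "extra_args": "",
            "samples": ["sample1", "sample2"],
            "mode": "paired"
        }
    }
}
""")

# "sample2" is referenced by the assembly but never declared, so it is reported.
print(check_config(example))
```

Collecting every problem into a list, rather than raising on the first one, lets a user fix the whole config in one pass.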

danejo3 commented 7 months ago

@standage This PR is ready for review!

I'm honestly not sure where you should start. I've made a lot of changes; however, most of them involved reorganizing code and removing what was unnecessary.

Main things:

Once this PR is merged, integrating grid support and hybrid assembly will be pretty straightforward.

One of the biggest changes is that instead of calling multiple Snakemake jobs, we have consolidated them into one.

For example, previously we would have a paired run, then a single run, then a PacBio run, and so on, followed by Bandage. Each workflow had to finish before the next one could start.

Now, a single run goes from the config straight to the various final outputs created by YEAT.
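
To illustrate the single-pass idea, here is a rough sketch of how every (assembly, sample) pair could be enumerated from the config up front, so that one workflow invocation can request all outputs at once. The function `list_targets`, the `analysis` directory, and the `contigs.fasta` path layout are assumptions made for illustration; they are not YEAT's actual implementation.

```python
def list_targets(config, outdir="analysis"):
    """Enumerate one hypothetical output path per (assembly, sample) pair."""
    targets = []
    for label, assembly in config["assemblies"].items():
        for sample in assembly["samples"]:
            # Hypothetical layout: <outdir>/<sample>/<assembly label>/contigs.fasta
            targets.append(f"{outdir}/{sample}/{label}/contigs.fasta")
    return targets


# Trimmed-down version of the example config: only the "assemblies" block
# is needed to work out what should be built.
config = {
    "assemblies": {
        "spades-default": {"algorithm": "spades", "samples": ["sample1", "sample2"]},
        "hicanu": {"algorithm": "canu", "samples": ["sample3"]},
        "flye_ONT": {"algorithm": "flye", "samples": ["sample4"]},
    }
}

for target in list_targets(config):
    print(target)
```

With the full target list known ahead of time, a workflow manager such as Snakemake can schedule independent assemblies together instead of waiting for one read-type run to finish before starting the next.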