dahak-metagenomics / dahak

benchmarking and containerization of tools for analysis of complex non-clinical metagenomes.
https://dahak-metagenomics.github.io/dahak
BSD 3-Clause "New" or "Revised" License

Verify that adapter sequences are correct for podar dataset #10

Open brooksph opened 7 years ago

brooksph commented 7 years ago

TruSeq?

brooksph commented 6 years ago

We should modify the step in the .settings file that downloads adapter sequences so that it allows user-specified adapter sequences. This can be done by creating a config YAML or JSON file (e.g. https://github.com/common-workflow-language/workflows/blob/master/tools/trimmomatic-illumina_clipping.yaml), in the same manner we will use to allow the user to specify sequences.

charlesreid1 commented 6 years ago

Yes, that's very much part of the plan. The file you linked to defines a default parameters dictionary for the read filtering workflow. Each parameter in that dictionary takes the default value shown in that file, but the user can override any of them. For example, they can add this to their workflow parameters .json to use a custom adapter at an arbitrary URL:

(this goes in the Snakefile)

config = {
    # ... (other workflow sections elided)
    'read_filtering' : {
        # ... (other read filtering parameters elided)
        'adapter_file' : {
            'MyAdapter.fa' : 'http://example.com/some/fancy/new/adapter/file.fa'
        }
    }
}
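
To illustrate the "defaults that the user can override" idea described above, here is a minimal sketch of merging a user's parameters .json over a default parameters dictionary. It is not dahak's actual implementation: `DEFAULTS`, `USER_JSON`, `merge_config`, the `DefaultAdapter.fa` entry, and `other_parameter` are all hypothetical names made up for this example.

```python
import copy
import json

# Hypothetical defaults; the adapter name, URL, and 'other_parameter'
# are placeholders, not dahak's real default values.
DEFAULTS = {
    "read_filtering": {
        "adapter_file": {
            "DefaultAdapter.fa": "http://example.com/default/adapter.fa",
        },
        "other_parameter": "default value",
    }
}

# Hypothetical contents of a user's workflow parameters .json file.
USER_JSON = """
{
    "read_filtering": {
        "adapter_file": {
            "MyAdapter.fa": "http://example.com/some/fancy/new/adapter/file.fa"
        }
    }
}
"""


def merge_config(defaults, overrides):
    """Return a new dict with `overrides` recursively merged over `defaults`."""
    result = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Both sides are dicts: merge them key by key.
            result[key] = merge_config(result[key], value)
        else:
            # Otherwise the user's value replaces the default outright.
            result[key] = copy.deepcopy(value)
    return result


config = merge_config(DEFAULTS, json.loads(USER_JSON))
print(json.dumps(config, indent=4))
```

With a recursive merge like this, the user's `MyAdapter.fa` entry is added alongside the default adapter rather than replacing it, and parameters the user did not mention keep their defaults. Whether a user-supplied `adapter_file` should replace the default entry or extend it is a design choice this sketch does not settle.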