KoesGroup / Snakemake_ChIPseq_PE

Pipeline for the analysis of PE ChIP-seq data
Creative Commons Attribution Share Alike 4.0 International

Single end #15

Closed mgalland closed 5 years ago

mgalland commented 6 years ago

The Snakemake pipeline currently works only with paired-end (PE) sequencing data. It would be good to make it work for single-end (SE) data as well.

JihedC commented 5 years ago

I found some relevant information in these two threads: https://groups.google.com/forum/#!topic/Snakemake/VBs3KLN89sU and https://groups.google.com/forum/#!msg/snakemake/qX7RfXDTDe4/cKZBfc_PAAAJ

JihedC commented 5 years ago

This is a suggestion I got on the Snakemake Bitbucket issue I opened:

> I'd rather add a config file parameter that you may call "isPairedEnd" that is either set to true or false, then you can simply write 2 versions of the same rule via an "if isPairedEnd: rule ... else rule ..." statement. I use this in my pipelines and it works fine and, importantly, does not require changing the Snakefile for different datasets.
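
To make the suggestion concrete, here is a minimal plain-Python sketch of branching on such a flag. The `trimmed_outputs` helper, the `working_dir` key and the file-naming scheme are assumptions made for illustration and are not taken from the pipeline. In a real Snakefile, the same `if config["isPairedEnd"]:` test would wrap two alternative `rule` definitions instead of two return statements.

```python
# Hypothetical config, as it might be loaded from a config.yaml file.
config = {"isPairedEnd": True, "working_dir": "temp/"}


def trimmed_outputs(sample, config):
    """Return the trimmed FASTQ paths expected for one sample.

    The branch on config["isPairedEnd"] mirrors the suggested
    "if isPairedEnd: rule ... else rule ..." structure.
    """
    prefix = config["working_dir"] + "trimmed/" + sample
    if config["isPairedEnd"]:
        # paired-end: one file per mate
        return [prefix + ".1.fastq.gz", prefix + ".2.fastq.gz"]
    # single-end: a single file
    return [prefix + ".fastq.gz"]


print(trimmed_outputs("sampleA", config))
```

Because the flag lives in the config, switching a dataset from PE to SE only means editing that one value, which matches the point made in the quote about not touching the Snakefile.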

JihedC commented 5 years ago

For now, I have separated the two pipelines. The SE ChIP-seq Snakemake pipeline can be found here. Several rules had to be changed, such as:

JihedC commented 5 years ago

    # Relies on names defined elsewhere in the Snakefile: pd (pandas), units, WORKING_DIR.
    def is_single_end(sample):
        """Detect a missing value in column 2 ("fq2") of units.tsv.

        Used by get_trimmed_reads to define the samples used by the align rule.
        """
        return pd.isnull(units.loc[(sample), "fq2"])


    def get_trimmed_reads(wildcards):
        """Get the trimmed reads of a given sample."""
        if not is_single_end(**wildcards):
            # paired-end sample
            return expand(WORKING_DIR + "trimmed/{sample}.{group}.fastq.gz",
                          group=[1, 2], **wildcards)
        # single-end sample
        return WORKING_DIR + "trimmed/{sample}.fastq.gz".format(**wildcards)
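
The same per-sample check can be tried outside Snakemake with a small standalone sketch. It makes several assumptions: the standard-library `csv` module stands in for pandas, a list comprehension stands in for Snakemake's `expand`, and the `units.tsv` content, sample names and `WORKING_DIR` value are invented for illustration.

```python
import csv
import io

# Hypothetical units.tsv content: sampleB has an empty fq2 column (single-end).
UNITS_TSV = (
    "sample\tfq1\tfq2\n"
    "sampleA\tA_R1.fastq.gz\tA_R2.fastq.gz\n"
    "sampleB\tB.fastq.gz\t\n"
)
WORKING_DIR = "temp/"  # assumed value, not taken from the pipeline

# Index the rows by sample name (stands in for the pandas DataFrame `units`).
reader = csv.DictReader(io.StringIO(UNITS_TSV), delimiter="\t")
units = {row["sample"]: row for row in reader}


def is_single_end(sample):
    """Return True when the sample has no second FASTQ file (empty fq2)."""
    return not units[sample]["fq2"]


def get_trimmed_reads(sample):
    """Return the trimmed read path(s) for one sample."""
    if not is_single_end(sample):
        # paired-end sample: one trimmed file per mate
        return [f"{WORKING_DIR}trimmed/{sample}.{group}.fastq.gz" for group in (1, 2)]
    # single-end sample: a single trimmed file
    return f"{WORKING_DIR}trimmed/{sample}.fastq.gz"


for name in units:
    print(name, get_trimmed_reads(name))
```

As in the snippet above, a paired-end sample yields a list of two paths and a single-end sample yields one path string, so the decision is made per sample from the sample sheet rather than from a global flag.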