Hoohm / dropSeqPipe

A SingleCell RNASeq pre-processing snakemake workflow
Creative Commons Attribution Share Alike 4.0 International

fastqc not looking for adapters in the adapter file #75

Closed dylkot closed 5 years ago

dylkot commented 5 years ago

Currently FastQC doesn't seem to look for the adapters that are used for cutadapt. As a result, it reports no adapter contamination for samples that I know have adapter contamination. Unfortunately, cutadapt expects the adapters file to be in FASTA format, whereas FastQC wants it in a tab-delimited format. Perhaps we could add another configuration parameter for a FastQC adapter file? Or else we could convert the FASTA file provided for cutadapt into a file that can be used as input for FastQC.

Hoohm commented 5 years ago

Hi @dylkot, we actually discussed this with @seb-mueller earlier this week, and we will add a conversion from the FASTA file to a TSV file to do exactly this.

dylkot commented 5 years ago

Great, thanks!

seb-mueller commented 5 years ago

@dylkot I've just created a feature branch that addresses this. Could you try it out? https://github.com/Hoohm/dropSeqPipe/tree/feature/fastqc_auto_adapter

@Hoohm, could you review the changes? Ultimately, it creates fastqc_adapter.tsv in the root dir based on the adapter FASTA file. This is then used by FastQC instead of the default adapters. Once it's OK, I'll merge it into develop.
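For readers who want to see roughly what such a conversion involves, here is a minimal sketch in plain Python. It is not the code from the feature branch (which appears to rely on Biopython); the function name and the sample records are hypothetical, and the sequences are made up for illustration. It assumes FastQC's adapter list format of one `name<TAB>sequence` entry per line.

```python
from pathlib import Path


def fasta_to_fastqc_tsv(fasta_text):
    """Convert FASTA text into FastQC adapter-list text (name<TAB>sequence)."""
    records = []  # each entry is [name, list_of_sequence_lines]
    for raw in fasta_text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A header starts a new record; tabs in the name would break the TSV.
            name = line[1:].strip().replace("\t", " ")
            records.append([name, []])
        elif records:
            # Sequence lines may be wrapped, so collect and join them later.
            records[-1][1].append(line)
    return "".join(f"{name}\t{''.join(parts)}\n" for name, parts in records)


def convert_file(fasta_path, tsv_path):
    """Read a FASTA file and write the corresponding FastQC adapter file."""
    text = Path(fasta_path).read_text()
    Path(tsv_path).write_text(fasta_to_fastqc_tsv(text))


# Hypothetical adapter records, not real Drop-seq adapters.
example = ">example_adapter_1\nACGTACGT\nACGT\n>example_adapter_2\nTTTTGGGG\n"
print(fasta_to_fastqc_tsv(example), end="")
```

The resulting file can then be passed to FastQC with its `--adapters` option, so that it searches for these sequences instead of its default adapter list.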

I've also added/updated a few sensible Drop-seq adapters in templates/custom_adapters.fa. I think this could serve as a good collection to get started with Drop-seq. Thoughts?

An example adapter view from the test-data output sample1_R2_fastqc.html: [image]

seb-mueller commented 5 years ago

Just merged it into develop, and Travis gave an error:

ModuleNotFoundError in line 58 of /home/travis/build/Hoohm/dropSeqPipe/rules/fastqc.smk:
No module named 'Bio'

It seems the biopython module is not available in the environment where the rule runs. I have already tried adding a conda environment that contains biopython to the rule in fastqc.smk, as below:

...
conda: '../envs/merge_bam.yaml'
run:
...

But this gave the following error:

RuleException in line 57 of /home/user/code/dropSeqPipe/rules/fastqc.smk:
Conda environments are only allowed with shell, script, or wrapper directives (not with run).

Do you know how to make biopython available to a run directive?

Hoohm commented 5 years ago

Sadly, you can't. You have to move the code into a script and call it with the script directive, which can be combined with a conda env.
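A rough sketch of what that could look like, assuming a hypothetical script file and rule name (only the env path comes from the snippet above). The rule is shown as a comment because it is Snakemake syntax rather than plain Python. Snakemake makes a global object named `snakemake` available to scripts run through `script:`, which exposes the rule's inputs and outputs.

```python
# Hypothetical file: scripts/fasta_to_fastqc_tsv.py
#
# Rule that would call it (rule name and file paths are assumptions):
#
#   rule fastqc_adapter:
#       input: "custom_adapters.fa"
#       output: "fastqc_adapter.tsv"
#       conda: "../envs/merge_bam.yaml"
#       script: "../scripts/fasta_to_fastqc_tsv.py"


def main(smk):
    """Write name<TAB>sequence lines for every record in the input FASTA."""
    # Imported here so Biopython is only needed when the rule actually runs,
    # i.e. inside the conda environment named in the rule.
    from Bio import SeqIO

    with open(smk.output[0], "w") as out:
        for record in SeqIO.parse(smk.input[0], "fasta"):
            out.write(f"{record.id}\t{record.seq}\n")


# The `snakemake` global only exists when Snakemake executes this file
# through the `script:` directive.
if "snakemake" in globals():
    main(snakemake)
```

Because the conversion now lives in a script rather than a `run:` block, Snakemake can create and activate the conda environment before executing it.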

seb-mueller commented 5 years ago

This is now integrated in the develop branch.