KoesGroup / Snakemake_ChIPseq_PE

Pipeline for the analysis of PE ChIP-seq data
Creative Commons Attribution Share Alike 4.0 International

Added singularity image to the pipeline #19

Closed. mgalland closed this 5 years ago.

mgalland commented 6 years ago

Hey guys,

To improve the reproducibility of the Snakemake_ChIP pipeline, I've added a Singularity feature. It now pulls an image from Singularity Hub, a repository of scientific Linux containers.

From the Singularity Hub documentation:

What is a Linux Container? A container image is an encapsulated, portable environment that is created to distribute a scientific analysis or a general function. Containers help with reproducibility of such content as they nicely package software and data dependencies, along with libraries that are needed. Thus, the core of Singularity Hub are these Singularity container images, and by way of being on Singularity Hub they can be easily built, updated, referenced with a url for a publication, and shared. This small guide will help you to get started building your containers using Singularity Hub and your Github repositories.

You can run it with snakemake --use-singularity --use-conda. It will use the container image and install all the software and libraries with conda.
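
As a rough sketch of that invocation, the hypothetical helper below only assembles the command line. --use-singularity tells Snakemake to run each job inside the image named in the Snakefile's singularity: directive, and --use-conda builds the per-rule conda environments (inside that container when both flags are given). The helper names and the --cores value are assumptions for illustration, not part of the pipeline.

```python
import subprocess
from typing import List


def build_snakemake_command(cores: int = 1, dry_run: bool = False) -> List[str]:
    """Assemble the command line for a containerised run (hypothetical helper)."""
    # --use-singularity: run each job inside the image from the singularity: directive.
    # --use-conda: create the per-rule conda envs (inside that image when combined).
    cmd = ["snakemake", "--use-singularity", "--use-conda", "--cores", str(cores)]
    if dry_run:
        # -n / --dry-run only lists the jobs that would run.
        cmd.append("-n")
    return cmd


def run_pipeline(cores: int = 1) -> None:
    """Launch the workflow; check=True raises CalledProcessError on failure."""
    subprocess.run(build_snakemake_command(cores), check=True)
```

Keeping command construction separate from execution makes it easy to print or dry-run the exact command before launching it on a node.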

If you test it, use the genseq-cn01 compute node.

JihedC commented 6 years ago

Thanks Marc! Does it only work on genseq-cn01?

mgalland commented 6 years ago

Yes, because of some weird file-administration issue. It would be nice to also try it on the LISA cluster.

JihedC commented 6 years ago

I tried to run it twice this afternoon, and for some weird reason the pipeline blocks when it has to produce the FastQC files. It looks like it's running, but nothing happens after ~30 min, while it should be very fast with the tiny subset samples.

JihedC commented 6 years ago

I've found this error in the log of the sample that blocks in the fastqc rule:

Approx 10% complete for ChIP6_forward.fastq.gz
Approx 20% complete for ChIP6_forward.fastq.gz
Approx 30% complete for ChIP6_forward.fastq.gz
Approx 40% complete for ChIP6_forward.fastq.gz
Approx 50% complete for ChIP6_forward.fastq.gz
Approx 60% complete for ChIP6_forward.fastq.gz
Approx 70% complete for ChIP6_forward.fastq.gz
Approx 80% complete for ChIP6_forward.fastq.gz
Approx 90% complete for ChIP6_forward.fastq.gz
Exception in thread "Thread-1" java.lang.UnsatisfiedLinkError: /home/jchouaref/test/Snakemake_ChIPseq/.snakemake/conda/6830defe/jre/lib/amd64/libfontmanager.so: libfreetype.so.6: cannot open shared object file: No such file or directory
        at java.lang.ClassLoader$NativeLibrary.load(Native Method)
        at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1941)
        at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1845)
        at java.lang.Runtime.loadLibrary0(Runtime.java:870)
        at java.lang.System.loadLibrary(System.java:1122)
        at sun.font.FontManagerNativeLibrary$1.run(FontManagerNativeLibrary.java:61)
        at java.security.AccessController.doPrivileged(Native Method)
        at sun.font.FontManagerNativeLibrary.<clinit>(FontManagerNativeLibrary.java:32)
        at sun.font.SunFontManager$1.run(SunFontManager.java:361)
        at java.security.AccessController.doPrivileged(Native Method)
        at sun.font.SunFontManager.<clinit>(SunFontManager.java:357)
        at sun.font.FontDesignMetrics.getMetrics(FontDesignMetrics.java:264)
        at sun.java2d.SunGraphics2D.getFontMetrics(SunGraphics2D.java:856)
        at uk.ac.babraham.FastQC.Graphs.QualityBoxPlot.paint(QualityBoxPlot.java:88)
        at javax.swing.JComponent.print(JComponent.java:1203)
        at uk.ac.babraham.FastQC.Modules.AbstractQCModule.writeDefaultImage(AbstractQCModule.java:68)
        at uk.ac.babraham.FastQC.Modules.PerBaseQualityScores.makeReport(PerBaseQualityScores.java:199)
        at uk.ac.babraham.FastQC.Report.HTMLReportArchive.<init>(HTMLReportArchive.java:131)
        at uk.ac.babraham.FastQC.Analysis.OfflineRunner.analysisComplete(OfflineRunner.java:155)
        at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:110)
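
The exception above is raised in a worker thread ("Thread-1"). One plausible reading, not confirmed in this thread, is that only that thread died while the JVM kept running, which would explain a job that never finishes. A minimal sketch, assuming the step is launched from Python, of failing fast instead of hanging is to give the subprocess a time limit; run_with_timeout and any limit you pass are hypothetical.

```python
import subprocess
from typing import Sequence


def run_with_timeout(cmd: Sequence[str], timeout_s: float) -> bool:
    """Run cmd; return True only if it exits with status 0 within timeout_s.

    When the timeout expires, subprocess.run kills the child, waits for it,
    and raises TimeoutExpired, which is reported here as a failure.
    """
    try:
        result = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0
```

For example, run_with_timeout(["fastqc", "ChIP6_forward.fastq.gz"], timeout_s=600) would return False instead of blocking. Note that the kill only reaches the direct child process; if a wrapper script starts java as a separate grandchild, that process may survive.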

mgalland commented 6 years ago

Oh shit. Can you ask Wim? Maybe he has more insights. My guess is that a library is missing. Maybe the singularity image does not contain it. In that case, we would need to complement the miniconda3 Pasteur image with these missing libraries.
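
To test the missing-library guess directly, one option is to ask the dynamic loader for the exact file named in the error, from inside the same environment (for example via singularity exec on the image). This sketch assumes Python is available there; can_load is a hypothetical helper, and Python's loader search only approximates what the JVM's libfontmanager.so sees.

```python
import ctypes


def can_load(soname: str) -> bool:
    """Return True if the dynamic loader can open soname, e.g. libfreetype.so.6.

    This uses Python's own loader search (LD_LIBRARY_PATH, the ld.so cache,
    default dirs), which approximates but is not identical to the JVM's view.
    """
    try:
        ctypes.CDLL(soname)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    # Run this on the host and inside the container, then compare the output.
    lib = "libfreetype.so.6"
    print(lib, "loadable" if can_load(lib) else "NOT loadable")
```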

JihedC commented 6 years ago

Here is the (fast) reply I got from Wim:

Hi Jihed,

libfreetype.so.6 is in the standard lib location /usr/lib64 on all nodes, so that's not the problem. I guess somewhere in the stack of Java, conda, Docker and Singularity it loses track of the default location. Or it somehow doesn't like the offered library. Debugging in a fat stack with some components I hardly know would probably take me days, so I'm not considering it.

Also: Trying to improve reproducibility (and portability?) by introducing a fat extra layer of software with its own system requirements, versions, needed knowledge investment, maintenance and bugs. Are you sure?

I wish I had a better answer, but my bet is it will only make things worse.

Wim
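
A file existing in /usr/lib64 on the host does not by itself mean it is visible inside the container, which has its own root filesystem unless host directories are bind-mounted. A hedged way to compare the two views is to run the same search on the host and inside the container and diff the output. The directory list below is an assumption, not the loader's exact search order, and the helper names are hypothetical.

```python
import os
from pathlib import Path
from typing import Iterable, List

# Assumed default directories; the real loader order also depends on
# /etc/ld.so.conf and the RUNPATH embedded in each library.
DEFAULT_DIRS = ["/lib64", "/usr/lib64", "/lib", "/usr/lib",
                "/usr/lib/x86_64-linux-gnu"]


def candidate_dirs(extra: Iterable[str] = ()) -> List[str]:
    """LD_LIBRARY_PATH entries first, then any extra dirs, then the defaults."""
    env = os.environ.get("LD_LIBRARY_PATH", "")
    dirs = [d for d in env.split(os.pathsep) if d]
    dirs.extend(extra)
    dirs.extend(DEFAULT_DIRS)
    return dirs


def locate(filename: str, dirs: Iterable[str]) -> List[str]:
    """Return every existing dir/filename path, preserving search order."""
    return [str(Path(d) / filename) for d in dirs if (Path(d) / filename).exists()]


if __name__ == "__main__":
    for hit in locate("libfreetype.so.6", candidate_dirs()) or ["(not found)"]:
        print(hit)
```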

mgalland commented 6 years ago

One solution could be to use another Singularity image and check whether it works. I will look for another miniconda3 solution. To answer Wim's concern: a container such as a Singularity image is supposed to be self-sufficient, so by definition it does not need an extra layer of software. It is the software.

JihedC commented 5 years ago

Hi Marc,

I finally had time to try running the pipeline with Singularity on genseq-cn01, and I got this error message:

[jchouaref@genseq-cn01 Snakemake_ChIPseq]$ snakemake --use-singularity --use-conda
Building DAG of jobs...
Creating conda environment envs/trimmomatic.yaml...
Downloading remote packages.
CreateCondaEnvironmentException:
Could not create conda environment from /home/jchouaref/test/Snakemake_ChIPseq/envs/trimmomatic.yaml:
ERROR  : Image path truatpasteurdotfr/singularity-docker-miniconda doesn't exist: No such file or directory
ABORT  : Retval = 255

The dry run and the run with --use-conda alone both work fine.
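
The error says Singularity looked for a local path called truatpasteurdotfr/singularity-docker-miniconda. One possible cause, assumed here and not confirmed in the thread, is that the singularity: directive lacks a URI scheme such as shub://, so the string is treated as a file. A small pre-flight check could flag that; the scheme list is an assumption covering common Singularity sources, and check_image_ref is a hypothetical helper.

```python
from pathlib import Path

# Assumed set of remote sources; Singularity supports others as well.
REMOTE_SCHEMES = ("shub://", "docker://", "library://")


def check_image_ref(ref: str) -> str:
    """Classify a container reference as 'remote', 'local', or 'missing'.

    Anything without a known scheme is treated the way the error message
    suggests Singularity treated it: as a path on the local filesystem.
    """
    if ref.startswith(REMOTE_SCHEMES):
        return "remote"
    return "local" if Path(ref).exists() else "missing"
```

For example, check_image_ref("truatpasteurdotfr/singularity-docker-miniconda") returns "missing" unless such a path exists under the working directory.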

JihedC commented 5 years ago

I tried copying the singularity directive from Johannes's pipeline, which is the following:

singularity: "docker://continuumio/miniconda3"

And I got this error message:

[jchouaref@genseq-cn03 Snakemake_ChIPseq]$ snakemake --use-singularity --use-conda
Building DAG of jobs...
Pulling singularity image docker://continuumio/miniconda3.
WorkflowError:
Failed to pull singularity image from docker://continuumio/miniconda3:
WARNING: pull for Docker Hub is not guaranteed to produce the
WARNING: same image on repeated pull. Use Singularity Registry
WARNING: (shub://) to pull exactly equivalent images.
ERROR: You must install squashfs-tools to build images
ABORT: Aborting with RETVAL=255
ERROR: pulling container failed!
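
This last error is a host-side prerequisite: Singularity needs mksquashfs, from the squashfs-tools package, to build the image it converts from Docker Hub. A hedged pre-flight sketch checks for the required executables before launching the workflow; the tool list and the missing_tools helper are assumptions for this setup.

```python
import shutil
from typing import Iterable, List

# Assumed prerequisites for `snakemake --use-singularity` on these nodes.
REQUIRED_TOOLS = ("singularity", "mksquashfs")


def missing_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the tools that shutil.which cannot find on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

If missing_tools() is non-empty on a node, the workflow can abort early with a clear message instead of failing partway through the image pull.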

mgalland commented 5 years ago

Hey Jihed. Don't change it to the Docker image (singularity: "docker://continuumio/miniconda3"), since Docker and Singularity are two different ways of managing software containers. Docker needs full user privileges (root), while Singularity does not.

mgalland commented 5 years ago

So, OK to merge, even though there are no more FastQC reports? For now, I would rather keep the FastQC reports than Singularity. Problems with FastQC are apparently common due to Java issues: https://www.biostars.org/p/204261/