Hoohm / dropSeqPipe

A SingleCell RNASeq pre-processing snakemake workflow
Creative Commons Attribution Share Alike 4.0 International

errors in CreateSequenceDictionary and ConvertToRefFlat only when run through snakemake #68

Closed: dylkot closed this issue 5 years ago

dylkot commented 5 years ago

Hi There,

I am encountering a few errors in the generate_meta phase of the pipeline that, oddly, do not occur when I run the individual commands in isolation. I am running the pipeline in a Docker container on a Google Cloud Platform Ubuntu 16.04 virtual machine. The VM has 8 cores and 52G of RAM; I am requesting 50G of that RAM in config.yaml and running the pipeline as follows:

snakemake --use-conda --cores 4 --directory /home/dropSeqPipe

I get error messages in the CreateSequenceDictionary and ConvertToRefFlat steps, as shown in the attached log file output.log. Strangely, though, I get no error messages when I run the same commands by hand:

conda activate /home/dropSeqPipe/.snakemake/conda/8331163d
java -jar -Djava.io.tmpdir=/home/tmp /home/dropSeqPipe/.snakemake/conda/8331163d/share/picard-2.14.1-0/picard.jar CreateSequenceDictionary OUTPUT=/home/ref/MmulKitwit_8_92/genome.dict REFERENCE=/home/ref/MmulKitwit_8_92/genome.fa TRUNCATE_NAMES_AT_WHITESPACE=true NUM_SEQUENCES=2147483647 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false

AND

conda activate /home/dropSeqPipe/.snakemake/conda/dd296d1f/share
export _JAVA_OPTIONS=-Djava.io.tmpdir=/home/tmp && ConvertToRefFlat -m 25g ANNOTATIONS_FILE=/home/ref/MmulKitwit_8_92/curated_annotation.gtf SEQUENCE_DICTIONARY=/home/ref/MmulKitwit_8_92/genome.dict OUTPUT=/home/ref/MmulKitwit_8_92/curated_annotation.refFlat    VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false

Any idea what could be going on? For now I am just bypassing these errors and continuing on with the pipeline.

dylkot commented 5 years ago

fastqc also fails similarly

(/home/dropSeqPipe/.snakemake/conda/dd296d1f) root@0a904db57185:/home/dropSeqPipe# snakemake --use-conda --cores 4 --directory /home/dropSeqPipe
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 4
Rules claiming more threads will be scaled down.
Job counts:
        count   jobs
        1       DetectBeadSubstitutionErrors
        1       MergeBamAlignment
        1       STAR_align
        1       SingleCellRnaSeqMetricsCollector
        1       TagReadWithGeneExon
        1       all
        1       bam_hist
        1       bead_errors_metrics
        1       clean_cutadapt
        2       convert_long_to_mtx
        1       cutadapt_R1
        1       extract_reads_expression
        1       extract_umi_expression
        1       fastqc_barcodes
        1       fastqc_reads
        2       merge_long
        1       multiqc_cutadapt_barcodes
        1       multiqc_fastqc_barcodes
        1       multiqc_fastqc_reads
        1       multiqc_star
        1       plot_adapter_content
        1       plot_knee_plot
        1       plot_rna_metrics
        1       plot_yield
        1       repair
        1       repair_barcodes
        1       violine_plots
        29

[Thu Dec 27 03:13:25 2018]
rule fastqc_reads:
    input: /home/data/RA0449.0_R2.fastq.gz
    output: /home/results/logs/fastqc/RA0449.0_R2_fastqc.html, /home/results/logs/fastqc/RA0449.0_R2_fastqc.zip
    jobid: 18
    wildcards: results_dir=/home/results, sample=RA0449.0

[Thu Dec 27 03:13:25 2018]
rule fastqc_barcodes:
    input: /home/data/RA0449.0_R1.fastq.gz
    output: /home/results/logs/fastqc/RA0449.0_R1_fastqc.html, /home/results/logs/fastqc/RA0449.0_R1_fastqc.zip
    jobid: 19
    wildcards: results_dir=/home/results, sample=RA0449.0

Activating conda environment: /home/dropSeqPipe/.snakemake/conda/73b3d757
Activating conda environment: /home/dropSeqPipe/.snakemake/conda/73b3d757
Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/home/tmp
Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/home/tmp
Exception in thread "Thread-1" java.lang.InternalError: java.lang.reflect.InvocationTargetException
        at java.desktop/sun.font.FontManagerFactory$1.run(FontManagerFactory.java:86)
        at java.base/java.security.AccessController.doPrivileged(Native Method)
        at java.desktop/sun.font.FontManagerFactory.getInstance(FontManagerFactory.java:74)
        at java.desktop/sun.font.SunFontManager.getInstance(SunFontManager.java:247)
        at java.desktop/sun.font.FontDesignMetrics.getMetrics(FontDesignMetrics.java:265)
        at java.desktop/sun.java2d.SunGraphics2D.getFontMetrics(SunGraphics2D.java:856)
        at uk.ac.babraham.FastQC.Graphs.QualityBoxPlot.paint(QualityBoxPlot.java:88)
        at java.desktop/javax.swing.JComponent.print(JComponent.java:1220)
        at uk.ac.babraham.FastQC.Modules.AbstractQCModule.writeDefaultImage(AbstractQCModule.java:68)
        at uk.ac.babraham.FastQC.Modules.PerBaseQualityScores.makeReport(PerBaseQualityScores.java:199)
        at uk.ac.babraham.FastQC.Report.HTMLReportArchive.<init>(HTMLReportArchive.java:131)
        at uk.ac.babraham.FastQC.Analysis.OfflineRunner.analysisComplete(OfflineRunner.java:178)
        at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:110)
        at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.reflect.InvocationTargetException
        at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
        at java.desktop/sun.font.FontManagerFactory$1.run(FontManagerFactory.java:84)
        ... 13 more
Caused by: java.lang.NullPointerException
        at java.desktop/sun.awt.FontConfiguration.getVersion(FontConfiguration.java:1262)
        at java.desktop/sun.awt.FontConfiguration.readFontConfigFile(FontConfiguration.java:225)
        at java.desktop/sun.awt.FontConfiguration.init(FontConfiguration.java:107)
        at java.desktop/sun.awt.X11FontManager.createFontConfiguration(X11FontManager.java:719)
        at java.desktop/sun.font.SunFontManager$2.run(SunFontManager.java:367)
        at java.base/java.security.AccessController.doPrivileged(Native Method)
        at java.desktop/sun.font.SunFontManager.<init>(SunFontManager.java:312)
        at java.desktop/sun.awt.FcFontManager.<init>(FcFontManager.java:35)
        at java.desktop/sun.awt.X11FontManager.<init>(X11FontManager.java:56)
        ... 18 more

Hoohm commented 5 years ago

Hello @dylkot, I found this related issue on OpenJDK. It seems to be caused by missing fonts. I'm not sure how to solve this.

Which version of OpenJDK runs on your VM? On Travis CI they have versions 11 and 10.

I would also recommend a small change to your configuration: set the memory value to something close to the size of your largest R2 file. Asking for 50g on two concurrent jobs might cause some issues.
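To make the memory concern concrete: two concurrent jobs that each reserve 50G need 100G in total, roughly twice the 52G on this VM. The helper below is a hypothetical sketch (not part of dropSeqPipe) that assumes every running job holds its full configured reservation, and computes how many such jobs fit at once.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of dropSeqPipe): how many jobs that each
# reserve PER_JOB_GB of RAM fit at once on a machine with TOTAL_GB of RAM.
# Assumes every running job holds its full reservation; integer division
# rounds down, so any remainder is left unused.
max_parallel_jobs() {
  local total_gb=$1 per_job_gb=$2
  if [ "$per_job_gb" -le 0 ]; then
    echo "per-job memory must be a positive integer" >&2
    return 1
  fi
  echo $(( total_gb / per_job_gb ))
}
```

With the numbers from this thread, `max_parallel_jobs 52 50` prints 1, so a second 50G job would overcommit the VM, while `max_parallel_jobs 52 25` prints 2.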

Hoohm commented 5 years ago

Found something related to a java docker image: https://github.com/appropriate/docker-jetty/issues/15

dylkot commented 5 years ago

Thanks for these tips. I didn't have OpenJDK installed in the Docker container because I assumed the conda version would suffice. I installed Ubuntu's default OpenJDK, which is 10.02 (apt-get install default-jdk); it included fontconfig, and now the issue seems to be resolved.

dylkot commented 5 years ago

Maybe adding fontconfig to the conda environment would fix the issue going forward.

Hoohm commented 5 years ago

Did you install snakemake with conda?

dylkot commented 5 years ago

Yep

Hoohm commented 5 years ago

dropseq_tools already requires openjdk<=8.0, but the openjdk package itself doesn't seem to declare any dependencies.

@cgirardot Could you add fontconfig to the dropseq_tools dependencies? Or should I add it to the env for dropseq_tools?

@dylkot Could you test adding it to the envs that need it and uninstalling the packages you installed with apt-get? I assume it's needed by all tools that use openjdk. Once you have confirmed that adding it to the envs works, I'll make a PR on https://github.com/conda-forge/openjdk-feedstock/ adding fontconfig to the meta.yaml.

dylkot commented 5 years ago

Unfortunately just adding fontconfig to the relevant conda environments:

/home/dropSeqPipe/.snakemake/conda/73b3d757 /home/dropSeqPipe/.snakemake/conda/dd296d1f /home/dropSeqPipe/.snakemake/conda/8331163d

does not seem to fix the errors. I still get the same error messages. I will explore and see if I can find other necessary dependencies that can be installed through conda.

dylkot commented 5 years ago

It seems like installing both fontconfig and font-ttf-dejavu-sans-mono to the 3 environments fixes the issue. I used these installation commands:

conda install fontconfig
conda install -c anaconda font-ttf-dejavu-sans-mono
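Applied to all three environments at once, the fix might look like the sketch below. It uses `conda install -p` to target each environment by path instead of activating it, and assumes the env prefixes are passed as arguments (the hash-named directories under .snakemake/conda will differ on other machines). `install_font_deps`, `run`, and the `DRY_RUN` switch are hypothetical helpers for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: install the two font packages into each conda env
# given by path. With DRY_RUN=1 the commands are printed instead of run.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"   # print the command only
  else
    "$@"        # execute the command as given
  fi
}

install_font_deps() {
  local prefix
  for prefix in "$@"; do
    # -y skips the confirmation prompt; -p selects the env by its path
    run conda install -y -p "$prefix" fontconfig || return 1
    run conda install -y -p "$prefix" -c anaconda font-ttf-dejavu-sans-mono || return 1
  done
}
```

For example, `DRY_RUN=1 install_font_deps /home/dropSeqPipe/.snakemake/conda/73b3d757` prints the two commands for that env; drop `DRY_RUN=1` to actually install.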

Hoohm commented 5 years ago

Ok thanks. One env is dropseq_tools, the second is fastqc, do you know which one is the third?

dylkot commented 5 years ago

the third one is plots_ext

dylkot commented 5 years ago

and one is picard. I don't seem to have one exclusively for fastqc.

Hoohm commented 5 years ago

fastqc is implemented as a snakemake wrapper. But this is odd: fastqc should not run if the packages are not present in the env.

I've added font-ttf-dejavu-sans-mono=2.37 and fontconfig=2.13.1 to plots_ext, dropseq_tools and picard.