Failure when generating HTML report

tdido commented 4 years ago

My run seems to fail at some point. Last thing the log says is that it's generating the HTML report.

Command line:

 nanovar -t 150 -f hg38 out/x1/subsample/COLO829/ERR2752452.fastq.gz res/genome/GRCh38.primary_assembly.genome.fa out/x1/nanovar/COLO829

I get this through stderr:

[E::idx_find_and_load] Could not retrieve index file for 'out/x1/nanovar/COLO829/ERR2752452-GRCh38.primary_assembly.genome-mm.bam'
OMP: Error #15: Initializing libomp.so, but found libiomp5.so already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://openmp.llvm.org/

Output directory looks like this:

├── ERR2752452-GRCh38.primary_assembly.genome-mm.bam
├── ERR2752452.nanovar.total.vcf
├── fig
│   └── depth_of_coverage.png
├── genome.sizes
├── GRCh38.primary_assembly.genome.counts
├── GRCh38.primary_assembly.genome.counts.obinary
├── NanoVar-250620-1250.log
└── sv_support_reads.tsv

Log file here

cytham commented 4 years ago

Hi, can I check if the output directory contains the file "ERR2752452.nanovar.pass.vcf"?

What is your OS?

tdido commented 4 years ago

Nope, that file does not exist.

Since the FASTQ file I used had only 0.7x coverage, I did another run, this time with the full FASTQ, no downsampling (60x coverage). In this case the *pass.vcf file exists, but I get the same error.

Here is the log for the run with the complete FASTQ

And these are the contents of the directory:

.
├── ERR2752452-GRCh38.primary_assembly.genome-mm.bam
├── ERR2752452.nanovar.pass.vcf
├── ERR2752452.nanovar.total.vcf
├── fig
│   └── depth_of_coverage.png
├── genome.sizes
├── GRCh38.primary_assembly.genome.counts
├── GRCh38.primary_assembly.genome.counts.obinary
├── NanoVar-230620-1650.log
└── sv_support_reads.tsv

tdido commented 4 years ago

I'm using a conda environment on Debian.

Here's the environment definition.

cytham commented 4 years ago

Thanks, that helps.

It seems that in the 0.7X coverage run all SVs failed the threshold score of 1, so no SVs passed and no pass file was written. But that is not the problem here.

For the OMP error, you may want to try conda install nomkl in your conda environment, as suggested here: https://github.com/dmlc/xgboost/issues/1715

Can you also launch the Python console and try importing these packages:

import os
import math
import datetime
import numpy as np
import nanovar
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
from matplotlib.ticker import ScalarFormatter
from distutils.dir_util import copy_tree
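
The same check can be scripted so that every failing import is reported in one pass. This is only a minimal sketch: the module list is copied from the imports above, and the helper name check_imports is made up for illustration. If the OpenMP conflict aborts the interpreter outright instead of raising an exception, the script will simply stop at the offending module.

```python
import importlib

# Module names taken from the import list above.
MODULES = [
    "os", "math", "datetime", "numpy", "nanovar",
    "matplotlib", "matplotlib.pyplot", "matplotlib.ticker",
    "distutils.dir_util",
]


def check_imports(names):
    """Try to import each module; return {name: None or an error string}."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:  # ImportError or anything raised at import time
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


if __name__ == "__main__":
    for name, error in check_imports(MODULES).items():
        print(f"{name}: {'ok' if error is None else error}")
```

Run it inside the same conda environment that NanoVar uses, otherwise the result says nothing about the failing run.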

tdido commented 4 years ago

Thank you for your help.

Adding nomkl didn't work. I'll try the other solutions proposed there.

How about the missing index for the bam file? Any ideas?

cytham commented 4 years ago

Thanks for trying it out.

I am still investigating the index error. The report generation should not require a BAM index. Does the index error pop up right at the end, when the run halts?

Were you able to import the packages successfully?

tdido commented 4 years ago

Well, I could fix the OMP error by running this before NanoVar:

export KMP_DUPLICATE_LIB_OK='True'

With that all files are generated correctly, including the HTML report, and NanoVar exits gracefully, so my pipeline doesn't exit with an error anymore.
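
For pipelines that launch NanoVar from Python, the variable can be limited to the child process instead of being exported for the whole shell session. The following is a rough sketch under that assumption; the helper names are hypothetical and the command shown is a stand-in that only echoes the variable. Note that the OMP hint quoted above describes this setting as an unsafe, unsupported workaround.

```python
import os
import subprocess
import sys


def env_with_omp_workaround(base=None):
    """Return a copy of the environment with KMP_DUPLICATE_LIB_OK set."""
    env = dict(os.environ if base is None else base)
    env["KMP_DUPLICATE_LIB_OK"] = "True"
    return env


def run_with_workaround(cmd):
    """Run cmd with the workaround applied to that child process only."""
    return subprocess.run(cmd, env=env_with_omp_workaround(), check=True)


if __name__ == "__main__":
    # Stand-in command that only echoes the variable; replace it with the
    # actual nanovar command line from the top of this thread.
    run_with_workaround(
        [sys.executable, "-c",
         "import os; print(os.environ['KMP_DUPLICATE_LIB_OK'])"]
    )
```

Scoping the variable this way keeps other tools in the pipeline from silently running with duplicate OpenMP runtimes.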

The index error remains, though it is now only visible in the log file. I can confirm that it happens much earlier than the end of the processing.

Thank you for your help.

cytham commented 4 years ago

I think the index error stems from the latest version of pysam (v0.16.0.1), which you are using. It probably appears when pysam reads the BAM file, e.g. alignment = pysam.AlignmentFile(bam_path, "rb").
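
Both directory listings above show the BAM file without any index next to it, which is consistent with the message. A quick way to confirm this without pysam is to look for an index file beside the BAM. This is a sketch only: the candidate names follow the usual samtools conventions, which is an assumption here, and find_bam_index is a hypothetical helper.

```python
from pathlib import Path


def find_bam_index(bam_path):
    """Return the first index file found beside a BAM, or None."""
    bam = Path(bam_path)
    candidates = [
        Path(str(bam) + ".bai"),  # sample.bam.bai
        bam.with_suffix(".bai"),  # sample.bai
        Path(str(bam) + ".csi"),  # sample.bam.csi
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    bam = "out/x1/nanovar/COLO829/ERR2752452-GRCh38.primary_assembly.genome-mm.bam"
    print(find_bam_index(bam))
```

A result of None only shows that no index exists; it does not tell whether the reading step actually needs one.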

Can you try downgrading pysam to v0.15.3 with conda install -c bioconda pysam==0.15.3 and see if you still get the error? You can test it quickly by running NanoVar with the BAM file of the 0.7X coverage dataset as input after the pysam downgrade.
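
To verify which pysam version the environment actually resolves before and after the downgrade, the installed package metadata can be queried from Python. A minimal sketch, assuming Python 3.8 or newer; installed_version is a made-up helper name.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("pysam:", installed_version("pysam"))
```

This reads the package metadata rather than importing pysam, so it works even if the import itself is what fails.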

tdido commented 4 years ago

Yes, that did the trick. All errors are gone now.

Here's the adjusted minimal conda env I used, in case anyone ends up here and finds it useful.

Thank you again for your help!

cytham / nanovar

Failure when generating HTML report #11