alexcoppe opened this issue 2 months ago
I am getting this exact same error on a CentOS 7 cluster.
Command error:
OMP: Error #15: Initializing libomp.so, but found unknown library already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://openmp.llvm.org/
.command.sh: line 12: 41 Aborted (core dumped) python3 $CLAIRS_PATH/clairs.py predict --tensor_fn pileup_tensor_can/chr2.33_0_1 --call_fn vcf_output/p_chr2.33_0_1.vcf --chkpnt_fn ${CLAIR_MODELS_PATH}/ont_r10_dorado_sup_5khz/pileup.pkl --platform ont --use_gpu False --ctg_name chr2 --pileup
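The hint in the error output points at its own unsafe escape hatch: setting `KMP_DUPLICATE_LIB_OK=TRUE` in the task environment lets the program continue despite the duplicate OpenMP runtimes. A minimal sketch (this is the undocumented workaround the message itself warns about, so it may crash or silently produce incorrect results):

```shell
# Unsafe, unsupported workaround named in the OMP hint above:
# tolerate multiple OpenMP runtimes in one process.
export KMP_DUPLICATE_LIB_OK=TRUE
# The variable is now visible to anything launched from this shell,
# e.g. the clairs.py predict command from the failing task.
echo "$KMP_DUPLICATE_LIB_OK"
```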
It seems to be somewhat sporadic -- about half the instances of snv:clairs_predict_pileup_indel and snv:clairs_predict_pileup_snv are running to completion.
I also cannot reproduce it. I can go into the work directory and run `bash .command.run`, and it completes without issue. This includes re-running on the exact same node on our cluster where it failed the first time.
@oneillkza @alexcoppe yes, this is an error that we observe sporadically in some clusters. Most times, restarting the workflow with `-resume` will cause the process to succeed without displaying the same issue. We are looking into this, thanks for reporting it
@RenzoTale88, thank you for the help. I restarted the workflow with `-resume` a couple of times and ended up with the same error :frowning_face:
@alexcoppe thanks for confirming that the issue persists. Did the error occur at the same process (i.e. the same chunk) or at a different chunk?
@RenzoTale88 exactly the same error as above
I have found when restarting with -resume that it seems to get through the specific job that it failed on before, but tends to fail again later on. I got through most of what I think were the jobs for this process by resubmitting about a dozen times.
I've also been trying to add a retry option to the process (currently it only retries on certain error codes but not this one), since I think that would likely solve this. But nextflow seems to be ignoring the contents of the config file I pass it. (It won't even override memory requirements for processes.) I'm still not sure what's going on there.
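A retry on this specific failure can in principle be expressed in a custom config. The task dies with SIGABRT, which Nextflow reports as exit status 134 (128 + 6), so a sketch of an override config might look like the following (the process names are taken from the failing tasks reported above; this is a hypothetical fragment, untested against this workflow's own error-strategy settings):

```groovy
// retry.config -- hypothetical override, passed with: nextflow run ... -c retry.config
process {
    // Match the ClairS pileup prediction processes that failed above.
    withName: 'clairs_predict_pileup_snv|clairs_predict_pileup_indel' {
        // 134 = 128 + SIGABRT, i.e. the "Aborted (core dumped)" exit seen above.
        errorStrategy = { task.exitStatus == 134 ? 'retry' : 'terminate' }
        maxRetries    = 3
    }
}
```

Note that a config supplied with `-c` is merged with the workflow's bundled configuration (unlike `-C`, which replaces it entirely), so it is worth checking which of the two is being used when overrides appear to be ignored.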
I was wondering if there is already a fix for this issue? This error keeps recurring
@rhagelaar @oneillkza we are investigating this, will keep you updated!
Hi, just to report that I am having the exact same issue. Please keep us updated of any fix :)
@selmapichot @rhagelaar @oneillkza @alexcoppe sorry for the slow progress on this. We are still investigating what is causing this issue. In the meantime, could you please try the latest release of the workflow (v.1.2.1) and check if the workflow gets to completion?
Tried it, but it gives me an error about the BAM containing reads basecalled with more than one basecaller model. I opened a new issue. Thank you very much for your help
Data from mixed models are not supported by ClairS, so please ensure that your data were called with a single basecalling model and try again.
Operating System
Other Linux (please specify below)
Other Linux
Ubuntu
Workflow Version
v1.1.0
Workflow Execution
Command line
EPI2ME Version
No response
CLI command run
nextflow-23.10.0-all run epi2me-labs/wf-somatic-variation -profile singularity -resume -process.executor 'pbspro' -process.cpus 64 -process.memory 256.GB -latest -work-dir '/archive/s2/genomics/onco_nanopore/test' -with-timeline --snv --sv --mod --sample_name 'OHU0002HI' --bam_normal '/archive/s2/genomics/onco_nanopore/HUM_OHU_OHU0002HTNDN/OHU0002HTNDN.bam' --bam_tumor '/archive/s2/genomics/onco_nanopore/HUM_OHU_OHU0002ITTDN/OHU0002ITTDN.bam' --ref '/archive/s1/sconsRequirements/databases/reference/resources_broad_hg38_v0_Homo_sapiens_assembly38.fasta' --out_dir '/archive/s2/genomics/onco_nanopore/OHU0002HI_wf-somatic-variation_2024_04_16' --basecaller_cfg 'dna_r10.4.1_e8.2_400bps_sup@v4.2.0' --phase_normal --classify_insert --force_strand --normal_min_coverage 0 --tumor_min_coverage 0
Workflow Execution - CLI Execution Profile
singularity
What happened?
It stopped and showed the error below
Relevant log output
Application activity log entry
No response
Were you able to successfully run the latest version of the workflow with the demo data?
other (please describe below)
Other demo data information