broadinstitute / gatk

Official code repository for GATK versions 4 and up
https://software.broadinstitute.org/gatk
Other
1.72k stars 594 forks source link

Convergence Error running GATK GermlineCNVCaller cohort mode #8952

Open snakesch opened 3 months ago

snakesch commented 3 months ago

Dear GATK developers,

Thank you very much for the tool! I recently encountered an issue while running GATK gCNV, specifically with the step GermlineCNVCaller in cohort mode. I ran the workflow using WDL and the GATK v4.3.0.0 docker image. I am not sure how the error should be interpreted and thus would like to seek your advice on debugging. Since the entire cohort contains 75 samples, I omitted some verbose debug messages in the error log below.

Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/paedyl01/disk1/louisshe/work/NGS/wdl/test_workflow_cnv/germline/cromwell-executions/CNVGermlineCohortWorkflow/d53c0a12-6b07-4f74-acb
21:05:38.275 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.3.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
21:05:38.297 DEBUG NativeLibraryLoader - Extracting libgkl_compression.so to /paedyl01/disk1/louisshe/tmp/gatk/libgkl_compression8230524266824482022.so
21:05:38.388 INFO  GermlineCNVCaller - ------------------------------------------------------------
21:05:38.388 INFO  GermlineCNVCaller - The Genome Analysis Toolkit (GATK) v4.3.0.0
21:05:38.388 INFO  GermlineCNVCaller - For support and documentation go to https://software.broadinstitute.org/gatk/
21:05:38.388 INFO  GermlineCNVCaller - Executing as louisshe@paedyl02 on Linux v3.10.0-1160.11.1.el7.x86_64 amd64
21:05:38.389 INFO  GermlineCNVCaller - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_242-8u242-b08-0ubuntu3~18.04-b08
21:05:38.389 INFO  GermlineCNVCaller - Start Date/Time: August 13, 2024 9:05:38 PM GMT
21:05:38.389 INFO  GermlineCNVCaller - ------------------------------------------------------------
21:05:38.389 INFO  GermlineCNVCaller - ------------------------------------------------------------
21:05:38.390 INFO  GermlineCNVCaller - HTSJDK Version: 3.0.1
21:05:38.390 INFO  GermlineCNVCaller - Picard Version: 2.27.5
21:05:38.390 INFO  GermlineCNVCaller - Built for Spark Version: 2.4.5
21:05:38.391 INFO  GermlineCNVCaller - HTSJDK Defaults.BUFFER_SIZE : 131072
21:05:38.391 INFO  GermlineCNVCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 2
21:05:38.391 INFO  GermlineCNVCaller - HTSJDK Defaults.CREATE_INDEX : false
21:05:38.391 INFO  GermlineCNVCaller - HTSJDK Defaults.CREATE_MD5 : false
21:05:38.391 INFO  GermlineCNVCaller - HTSJDK Defaults.CUSTOM_READER_FACTORY :
21:05:38.391 INFO  GermlineCNVCaller - HTSJDK Defaults.DISABLE_SNAPPY_COMPRESSOR : false
21:05:38.391 INFO  GermlineCNVCaller - HTSJDK Defaults.EBI_REFERENCE_SERVICE_URL_MASK : https://www.ebi.ac.uk/ena/cram/md5/%s
21:05:38.391 INFO  GermlineCNVCaller - HTSJDK Defaults.NON_ZERO_BUFFER_SIZE : 131072
21:05:38.391 INFO  GermlineCNVCaller - HTSJDK Defaults.REFERENCE_FASTA : null
21:05:38.391 INFO  GermlineCNVCaller - HTSJDK Defaults.SAM_FLAG_FIELD_FORMAT : DECIMAL
21:05:38.391 INFO  GermlineCNVCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
21:05:38.391 INFO  GermlineCNVCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
21:05:38.391 INFO  GermlineCNVCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
21:05:38.391 INFO  GermlineCNVCaller - HTSJDK Defaults.USE_CRAM_REF_DOWNLOAD : false
21:05:38.392 DEBUG ConfigFactory - Configuration file values:
21:05:38.395 DEBUG ConfigFactory -      gcsMaxRetries = 20
21:05:38.395 DEBUG ConfigFactory -      gcsProjectForRequesterPays =
21:05:38.395 DEBUG ConfigFactory -      gatk_stacktrace_on_user_exception = false
21:05:38.395 DEBUG ConfigFactory -      samjdk.use_async_io_read_samtools = false
21:05:38.395 DEBUG ConfigFactory -      samjdk.use_async_io_write_samtools = true
21:05:38.395 DEBUG ConfigFactory -      samjdk.use_async_io_write_tribble = false
21:05:38.395 DEBUG ConfigFactory -      samjdk.compression_level = 2
21:05:38.395 DEBUG ConfigFactory -      spark.kryoserializer.buffer.max = 512m
21:05:38.395 DEBUG ConfigFactory -      spark.driver.maxResultSize = 0
21:05:38.395 DEBUG ConfigFactory -      spark.driver.userClassPathFirst = true
21:05:38.395 DEBUG ConfigFactory -      spark.io.compression.codec = lzf
21:05:38.395 DEBUG ConfigFactory -      spark.executor.memoryOverhead = 600
21:05:38.395 DEBUG ConfigFactory -      spark.driver.extraJavaOptions =
21:05:38.395 DEBUG ConfigFactory -      spark.executor.extraJavaOptions =
21:05:38.395 DEBUG ConfigFactory -      codec_packages = [htsjdk.variant, htsjdk.tribble, org.broadinstitute.hellbender.utils.codecs]
21:05:38.395 DEBUG ConfigFactory -      read_filter_packages = [org.broadinstitute.hellbender.engine.filters]
21:05:38.395 DEBUG ConfigFactory -      annotation_packages = [org.broadinstitute.hellbender.tools.walkers.annotator]
21:05:38.395 DEBUG ConfigFactory -      cloudPrefetchBuffer = 40
21:05:38.395 DEBUG ConfigFactory -      cloudIndexPrefetchBuffer = -1
21:05:38.395 DEBUG ConfigFactory -      createOutputBamIndex = true
21:05:38.396 INFO  GermlineCNVCaller - Deflater: IntelDeflater
21:05:38.396 INFO  GermlineCNVCaller - Inflater: IntelInflater
21:05:38.396 INFO  GermlineCNVCaller - GCS max retries/reopens: 20
21:05:38.396 INFO  GermlineCNVCaller - Requester pays: disabled
21:05:38.396 INFO  GermlineCNVCaller - Initializing engine
21:05:38.399 DEBUG ScriptExecutor - Executing:
21:05:38.399 DEBUG ScriptExecutor -   python
21:05:38.399 DEBUG ScriptExecutor -   -c
21:05:38.399 DEBUG ScriptExecutor -   import gcnvkernel
21:06:10.792 DEBUG ScriptExecutor - Result: 0
21:06:10.792 INFO  GermlineCNVCaller - Done initializing engine
21:06:10.826 INFO  GermlineCNVCaller - Intervals specified...
log4j:WARN No appenders could be found for logger (org.broadinstitute.hdf5.HDF5Library).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
21:06:12.479 INFO  FeatureManager - Using codec IntervalListCodec to read file file:///paedyl01/disk1/louisshe/work/NGS/wdl/test_workflow_cnv/germline/cromwell-executions/CNVGuts/-947966988/Homo_sapiens_assembly38.bed.preprocessed.filtered.scattered.0154.interval_list
21:06:12.640 DEBUG FeatureDataSource - Cache statistics for FeatureInput /paedyl01/disk1/louisshe/work/NGS/wdl/test_workflow_cnv/germline/cromwell-executions/CNVGermlineCohort8/Homo_sapiens_assembly38.bed.preprocessed.filtered.scattered.0154.interval_list:/paedyl01/disk1/louisshe/work/NGS/wdl/test_workflow_cnv/germline/cromwell-executions/CNVGermli947966988/Homo_sapiens_assembly38.bed.preprocessed.filtered.scattered.0154.interval_list:
21:06:12.640 DEBUG FeatureCache - Cache hit rate  was 0.00% (0 out of 0 total queries)
21:06:12.645 INFO  IntervalArgumentCollection - Processing 4999155 bp from intervals
21:06:12.656 INFO  GermlineCNVCaller - Reading and validating annotated intervals...
21:06:18.914 WARN  GermlineCNVCaller - Sequence dictionary in annotated-intervals file does not match the master sequence dictionary.
21:06:19.130 INFO  GermlineCNVCaller - GC-content annotations for intervals found; explicit GC-bias correction will be performed...
21:06:19.200 INFO  GermlineCNVCaller - Running the tool in COHORT mode...
21:06:19.200 INFO  GermlineCNVCaller - Validating and aggregating data from input read-count files...
21:07:11.897 DEBUG ScriptExecutor - Executing:
21:07:11.897 DEBUG ScriptExecutor -   python
21:07:11.897 DEBUG ScriptExecutor -   /paedyl01/disk1/louisshe/tmp/gatk/cohort_denoising_calling.418897092082188314.py
21:07:11.897 DEBUG ScriptExecutor -   --ploidy_calls_path=/paedyl01/disk1/louisshe/work/NGS/wdl/test_workflow_cnv/germline/cromwell-executions/CNVGermlineCohortWorkflow/d53c0a
21:07:11.897 DEBUG ScriptExecutor -   --output_calls_path=/paedyl01/disk1/louisshe/out/NMD/batch1_2023/batch1_all/cnv/cohort_calls/batch1_all-calls
21:07:11.897 DEBUG ScriptExecutor -   --output_tracking_path=/paedyl01/disk1/louisshe/out/NMD/batch1_2023/batch1_all/cnv/cohort_calls/batch1_all-tracking
21:07:11.897 DEBUG ScriptExecutor -   --random_seed=1984
21:07:11.897 DEBUG ScriptExecutor -   --modeling_interval_list=/paedyl01/disk1/louisshe/tmp/gatk/intervals6744296186531223263.tsv
21:07:11.897 DEBUG ScriptExecutor -   --output_model_path=/paedyl01/disk1/louisshe/out/NMD/batch1_2023/batch1_all/cnv/cohort_calls/batch1_all-model
21:07:11.897 DEBUG ScriptExecutor -   --enable_explicit_gc_bias_modeling=True
21:07:11.897 DEBUG ScriptExecutor -   --read_count_tsv_files
21:07:11.897 DEBUG ScriptExecutor -   /paedyl01/disk1/louisshe/tmp/gatk/A140019A.rc7689938763423585433.tsv
21:07:11.897 DEBUG ScriptExecutor -   /paedyl01/disk1/louisshe/tmp/gatk/A140227A.rc5534074413391544082.tsv
21:07:11.897 DEBUG ScriptExecutor -   /paedyl01/disk1/louisshe/tmp/gatk/A140228A.rc8415746856369873408.tsv
21:07:11.897 DEBUG ScriptExecutor -   /paedyl01/disk1/louisshe/tmp/gatk/A150001A.rc6707204643753825321.tsv
21:07:11.897 DEBUG ScriptExecutor -   /paedyl01/disk1/louisshe/tmp/gatk/A160688B.rc867291074142408537.tsv
21:07:11.897 DEBUG ScriptExecutor -   /paedyl01/disk1/louisshe/tmp/gatk/A180282.rc2654253522563750156.tsv
21:07:11.897 DEBUG ScriptExecutor -   /paedyl01/disk1/louisshe/tmp/gatk/A180285.rc7225441130334195352.tsv
21:07:11.897 DEBUG ScriptExecutor -   /paedyl01/disk1/louisshe/tmp/gatk/A180287.rc9197457766389224118.tsv
21:07:11.897 DEBUG ScriptExecutor -   /paedyl01/disk1/louisshe/tmp/gatk/A190117A.rc3193929221414883945.tsv
21:07:11.897 DEBUG ScriptExecutor -   /paedyl01/disk1/louisshe/tmp/gatk/A190118A.rc710110938611804899.tsv
21:07:11.897 DEBUG ScriptExecutor -   /paedyl01/disk1/louisshe/tmp/gatk/A190119A.rc5863858387543609595.tsv
21:07:11.897 DEBUG ScriptExecutor -   /paedyl01/disk1/louisshe/tmp/gatk/A200126.rc224011369846433499.tsv
21:07:11.897 DEBUG ScriptExecutor -   /paedyl01/disk1/louisshe/tmp/gatk/A200137.rc158441496946272590.tsv
21:07:11.897 DEBUG ScriptExecutor -   /paedyl01/disk1/louisshe/tmp/gatk/A200256.rc4161183391298828848.tsv
21:07:11.897 DEBUG ScriptExecutor -   /paedyl01/disk1/louisshe/tmp/gatk/A200301.rc7021293169640503435.tsv
21:07:11.897 DEBUG ScriptExecutor -   /paedyl01/disk1/louisshe/tmp/gatk/NM145A.rc1138236372314543774.tsv
21:07:11.897 DEBUG ScriptExecutor -   /paedyl01/disk1/louisshe/tmp/gatk/NM146A.rc4638504608748835640.tsv
21:07:11.897 DEBUG ScriptExecutor -   /paedyl01/disk1/louisshe/tmp/gatk/NM156.rc4899167012623615948.tsv
21:07:11.897 DEBUG ScriptExecutor -   --psi_s_scale=1.000000e-02
21:07:11.897 DEBUG ScriptExecutor -   --mapping_error_rate=1.000000e-02
21:07:11.897 DEBUG ScriptExecutor -   --depth_correction_tau=1.000000e+04
21:07:11.897 DEBUG ScriptExecutor -   --q_c_expectation_mode=hybrid
21:07:11.897 DEBUG ScriptExecutor -   --num_samples_copy_ratio_approx=200
21:07:11.897 DEBUG ScriptExecutor -   --max_bias_factors=6
21:07:11.897 DEBUG ScriptExecutor -   --psi_t_scale=1.000000e-02
21:07:11.897 DEBUG ScriptExecutor -   --log_mean_bias_std=1.000000e-01
21:07:11.897 DEBUG ScriptExecutor -   --init_ard_rel_unexplained_variance=1.000000e-01
21:07:11.897 DEBUG ScriptExecutor -   --num_gc_bins=20
21:07:11.897 DEBUG ScriptExecutor -   --gc_curve_sd=1.000000e+00
21:07:11.897 DEBUG ScriptExecutor -   --active_class_padding_hybrid_mode=50000
21:07:11.897 DEBUG ScriptExecutor -   --enable_bias_factors=True
21:07:11.897 DEBUG ScriptExecutor -   --disable_bias_factors_in_active_class=False
21:07:11.897 DEBUG ScriptExecutor -   --p_alt=5.000000e-04
21:07:11.897 DEBUG ScriptExecutor -   --cnv_coherence_length=1.000000e+04
21:07:11.897 DEBUG ScriptExecutor -   --max_copy_number=5
21:07:11.897 DEBUG ScriptExecutor -   --p_active=0.100000
21:07:11.897 DEBUG ScriptExecutor -   --class_coherence_length=10000.000000
21:07:11.897 DEBUG ScriptExecutor -   --learning_rate=5.000000e-02
21:07:11.897 DEBUG ScriptExecutor -   --adamax_beta1=9.000000e-01
21:07:11.897 DEBUG ScriptExecutor -   --adamax_beta2=9.900000e-01
21:07:11.897 DEBUG ScriptExecutor -   --log_emission_samples_per_round=50
21:07:11.897 DEBUG ScriptExecutor -   --log_emission_sampling_rounds=10
21:07:11.897 DEBUG ScriptExecutor -   --log_emission_sampling_median_rel_error=5.000000e-03
21:07:11.897 DEBUG ScriptExecutor -   --max_advi_iter_first_epoch=5000
21:07:11.897 DEBUG ScriptExecutor -   --max_advi_iter_subsequent_epochs=100
21:07:11.897 DEBUG ScriptExecutor -   --min_training_epochs=10
21:07:11.897 DEBUG ScriptExecutor -   --max_training_epochs=100
21:07:11.897 DEBUG ScriptExecutor -   --initial_temperature=2.000000e+00
21:07:11.897 DEBUG ScriptExecutor -   --num_thermal_advi_iters=2500
21:07:11.897 DEBUG ScriptExecutor -   --convergence_snr_averaging_window=500
21:07:11.897 DEBUG ScriptExecutor -   --convergence_snr_trigger_threshold=1.000000e-01
21:07:11.897 DEBUG ScriptExecutor -   --convergence_snr_countdown_window=10
21:07:11.897 DEBUG ScriptExecutor -   --max_calling_iters=10
21:07:11.897 DEBUG ScriptExecutor -   --caller_update_convergence_threshold=1.000000e-03
21:07:11.897 DEBUG ScriptExecutor -   --caller_internal_admixing_rate=7.500000e-01
21:07:11.897 DEBUG ScriptExecutor -   --caller_external_admixing_rate=1.000000e+00
21:07:11.897 DEBUG ScriptExecutor -   --disable_caller=false
21:07:11.897 DEBUG ScriptExecutor -   --disable_sampler=false
21:07:11.897 DEBUG ScriptExecutor -   --disable_annealing=false
Traceback (most recent call last):
  File "/paedyl01/disk1/louisshe/tmp/gatk/cohort_denoising_calling.418897092082188314.py", line 179, in <module>
    warm_up_task.engage()
  File "/opt/miniconda/envs/gatk/lib/python3.6/site-packages/gcnvkernel/tasks/inference_task_base.py", line 346, in engage
    converged_continuous = self._update_continuous_posteriors()
  File "/opt/miniconda/envs/gatk/lib/python3.6/site-packages/gcnvkernel/tasks/inference_task_base.py", line 403, in _update_continuous_posteriors
    raise ConvergenceError
gcnvkernel.tasks.inference_task_base.ConvergenceError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/paedyl01/disk1/louisshe/tmp/gatk/cohort_denoising_calling.418897092082188314.py", line 184, in <module>
    sys.exit(gcnvkernel.io_consts.diverged_inference_exit_code)
NameError: name 'sys' is not defined
21:09:00.147 INFO  GermlineCNVCaller - Shutting down engine
[August 13, 2024 9:09:00 PM GMT] org.broadinstitute.hellbender.tools.copynumber.GermlineCNVCaller done. Elapsed time: 3.36 minutes.
Runtime.totalMemory()=3168796672
org.broadinstitute.hellbender.utils.python.PythonScriptExecutorException:
python exited with 1

Much appreciated for your help indeed!

Best regards, Louis

gokalpcelik commented 3 months ago

Can you provide the whole log as an attachment? It is really hard to debug with missing information.

snakesch commented 3 months ago

stderr.log

gokalpcelik commented 3 months ago

How did you install the GATK Conda environment? Looks like a problem with the conda environment configuration. One possible reason could be that conda environment is not the version 4.3.0.0 is requesting.

snakesch commented 3 months ago

I did not install the GATK conda environment. Instead, I created a singularity container using the GATK v4.3.0.0 docker image. Also, the GATK version I have outside container is v4.6.0.0, so the fact that GATK v4.3.0.0 got called means it was using the GATK came with the image.

gokalpcelik commented 3 months ago

We don't support singularity so I am not sure how to help if that's the version you are using.