bioinform / somaticseq

An ensemble approach to accurately detect somatic mutations using SomaticSeq
http://bioinform.github.io/somaticseq/
BSD 2-Clause "Simplified" License

SomaticSeq Prediction Error #99

Closed robinjugas closed 3 years ago

robinjugas commented 3 years ago

Hello Mr. Fang, I am trying to run SomaticSeq training and prediction with the conda version 3.6.2, using XGBoost. I am using these commands:

somaticseq_parallel.py --threads 20 \
  --output-directory somatic_seq_results/TRAINING \
  --genome-reference /mnt/ssd/ssd_3/references/homsap/GRCh37-p13/seq/GRCh37-p13.fa \
  --inclusion-region /mnt/ssd/ssd_3/references/homsap/GRCh37-p13/intervals/TruSeq_Exome/TruSeq_Exome.bed \
  --truth-snv bamsurgeon/EF1059_truepositive_SNV.vcf \
  --truth-indel bamsurgeon/EF1059_truepositive_INDELS.vcf \
  -algo xgboost -train \
  paired \
  --tumor-bam-file bamsurgeon/EF1059_tumor.bam \
  --normal-bam-file ../input_files/mapped/EF1059_normal.bam \
  --vardict-vcf variant_calls/EF1059/vardict/VarDict.vcf \
  --varscan-snv variant_calls/EF1059/varscan/VarScan2.snp.vcf \
  --varscan-indel variant_calls/EF1059/varscan/VarScan2.indel.vcf

somaticseq_parallel.py --threads 20 \
  --output-directory somatic_seq_results/EF1059 \
  --genome-reference /mnt/ssd/ssd_3/references/homsap/GRCh37-p13/seq/GRCh37-p13.fa \
  --inclusion-region /mnt/ssd/ssd_3/references/homsap/GRCh37-p13/intervals/TruSeq_Exome/TruSeq_Exome.bed \
  --minimum-num-callers 0.4 \
  --dbsnp-vcf /mnt/ssd/ssd_3/references/homsap/GRCh37-p13/annot/dbSNP/common_all.vcf.gz \
  --classifier-snv somatic_seq_results/TRAINING/Ensemble.sSNV.tsv.xgb.v3.6.2.classifier.txt \
  --classifier-indel somatic_seq_results/TRAINING/Ensemble.sINDEL.tsv.xgb.v3.6.2.classifier.txt \
  -algo xgboost \
  paired \
  --tumor-bam-file ../input_files/mapped/EF1059_tumor.bam \
  --normal-bam-file ../input_files/mapped/EF1059_normal.bam \
  --vardict-vcf variant_calls/EF1059/vardict/VarDict.vcf \
  --varscan-snv variant_calls/EF1059/varscan/VarScan2.snp.vcf \
  --varscan-indel variant_calls/EF1059/varscan/VarScan2.indel.vcf

I am running the training on an artificial tumor BAM created by BAMSurgeon. However, I am getting a warning during SomaticSeq training:

[12:31:41] WARNING: ../src/learner.cc:1061: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'binary:logistic' was changed from 'error' to 'logloss'. Explicitly set eval_metric if you'd like to restore the old behavior.
[12:31:42] WARNING: ../src/learner.cc:1061: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'binary:logistic' was changed from 'error' to 'logloss'. Explicitly set eval_metric if you'd like to restore the old behavior.

and an error in somaticseq predict:

multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/mnt/ssd/ssd_1/snakemake/stage327_solid_tumors_children.F_one_control/EF1059/.snakemake/conda/0bd8304d/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/mnt/ssd/ssd_1/snakemake/stage327_solid_tumors_children.F_one_control/EF1059/.snakemake/conda/0bd8304d/lib/python3.9/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/mnt/ssd/ssd_1/snakemake/stage327_solid_tumors_children.F_one_control/EF1059/.snakemake/conda/0bd8304d/bin/somaticseq_parallel.py", line 35, in runPaired_by_region
    run_somaticseq.runPaired(outdir_i, ref, tbam, nbam, tumor_name, normal_name, truth_snv, truth_indel, classifier_snv, classifier_indel, pass_threshold, lowqual_threshold, hom_threshold, het_threshold, dbsnp, cosmic, inclusion, exclusion, mutect, indelocator, mutect2, varscan_snv, varscan_indel, jsm, sniper, vardict, muse, lofreq_snv, lofreq_indel, scalpel, strelka_snv, strelka_indel, tnscope, platypus, min_mq, min_bq, min_caller, somaticseq_train, ensembleOutPrefix, consensusOutPrefix, classifiedOutPrefix, algo, keep_intermediates, train_seed, tree_depth, iterations, features_excluded)
  File "/mnt/ssd/ssd_1/snakemake/stage327_solid_tumors_children.F_one_control/EF1059/.snakemake/conda/0bd8304d/lib/python3.9/site-packages/somaticseq/run_somaticseq.py", line 142, in runPaired
    modelPredictor(ensembleSnv, classifiedSnvTsv, algo, classifier_snv, iterations=iterations, features_to_exclude=features_excluded)
  File "/mnt/ssd/ssd_1/snakemake/stage327_solid_tumors_children.F_one_control/EF1059/.snakemake/conda/0bd8304d/lib/python3.9/site-packages/somaticseq/run_somaticseq.py", line 80, in modelPredictor
    somatic_xgboost.predictor(classifier, input_file, output_file, non_features, iterations)
  File "/mnt/ssd/ssd_1/snakemake/stage327_solid_tumors_children.F_one_control/EF1059/.snakemake/conda/0bd8304d/lib/python3.9/site-packages/somaticseq/somatic_xgboost.py", line 85, in predictor
    xgb_model.load_model(model)
  File "/mnt/ssd/ssd_1/snakemake/stage327_solid_tumors_children.F_one_control/EF1059/.snakemake/conda/0bd8304d/lib/python3.9/site-packages/xgboost/core.py", line 1728, in load_model
    _check_call(_LIB.XGBoosterLoadModel(
  File "/mnt/ssd/ssd_1/snakemake/stage327_solid_tumors_children.F_one_control/EF1059/.snakemake/conda/0bd8304d/lib/python3.9/site-packages/xgboost/core.py", line 189, in _check_call
    raise XGBoostError(py_str(_LIB.XGBGetLastError()))
xgboost.core.XGBoostError: basic_string::_M_replace_aux
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/mnt/ssd/ssd_1/snakemake/stage327_solid_tumors_children.F_one_control/EF1059/.snakemake/conda/0bd8304d/bin/somaticseq_parallel.py", line 123, in <module>
    subdirs = pool.map(runPaired_by_region_i, bed_splitted)
  File "/mnt/ssd/ssd_1/snakemake/stage327_solid_tumors_children.F_one_control/EF1059/.snakemake/conda/0bd8304d/lib/python3.9/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/mnt/ssd/ssd_1/snakemake/stage327_solid_tumors_children.F_one_control/EF1059/.snakemake/conda/0bd8304d/lib/python3.9/multiprocessing/pool.py", line 771, in get
    raise self._value
xgboost.core.XGBoostError: basic_string::_M_replace_aux

I am attaching both SomaticSeq logs along with the BAMSurgeon log, in case they are useful: somaticseq_train.log, somaticseq_predict.log, bamsurgeon.log

Thank you for any help, or for any hints about where I should look for a mistake on my side. Regards, RJ

litaifang commented 3 years ago

Try Ensemble.sSNV.tsv.xgb.v3.6.2.classifier and Ensemble.sINDEL.tsv.xgb.v3.6.2.classifier instead of Ensemble.sSNV.tsv.xgb.v3.6.2.classifier.txt and Ensemble.sINDEL.tsv.xgb.v3.6.2.classifier.txt first. Hopefully, that's the only problem. There should be two files that end with .classifier.
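The suggestion above amounts to dropping the trailing .txt from each classifier path before passing it to --classifier-snv and --classifier-indel. A minimal Python sketch of that path fix follows; the helper name strip_txt_suffix is hypothetical and is not part of SomaticSeq, and the sketch only rewrites the path string without checking that the file exists.

```python
from pathlib import PurePosixPath


def strip_txt_suffix(classifier_path: str) -> str:
    """Return the path without a trailing '.txt' if it ends in '.classifier.txt'.

    Hypothetical helper for illustration only; it is not part of SomaticSeq.
    Paths that do not end in '.classifier.txt' are returned unchanged.
    """
    path = PurePosixPath(classifier_path)
    if path.name.endswith(".classifier.txt"):
        # with_suffix("") removes only the last suffix, i.e. ".txt"
        return str(path.with_suffix(""))
    return str(path)


# The two classifier paths used in the prediction command from the first post
for original in (
    "somatic_seq_results/TRAINING/Ensemble.sSNV.tsv.xgb.v3.6.2.classifier.txt",
    "somatic_seq_results/TRAINING/Ensemble.sINDEL.tsv.xgb.v3.6.2.classifier.txt",
):
    print(strip_txt_suffix(original))
```

The printed paths are the ones to use for --classifier-snv and --classifier-indel in the prediction command.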

robinjugas commented 3 years ago

Thank you very much. It worked. Best regards, RJ
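Regarding the XGBoost warning quoted in the first post: it only reports that the default evaluation metric for 'binary:logistic' changed from 'error' to 'logloss' in XGBoost 1.3.0, and it suggests setting eval_metric explicitly. For code that calls XGBoost directly, that could look like the sketch below. The helper with_explicit_eval_metric is hypothetical, and this thread does not say whether SomaticSeq exposes the setting, so treat it as an illustration of the parameter dictionary only.

```python
def with_explicit_eval_metric(params: dict, metric: str = "logloss") -> dict:
    """Return a copy of params with 'eval_metric' set if it was missing.

    Hypothetical helper for illustration; the key and values come from the
    warning text ('eval_metric', 'error', 'logloss').
    """
    updated = dict(params)  # copy, so the caller's dict is not modified
    updated.setdefault("eval_metric", metric)
    return updated


# Keep the newer default metric, but state it explicitly
params = with_explicit_eval_metric({"objective": "binary:logistic"})

# Restore the pre-1.3.0 behavior instead
legacy_params = with_explicit_eval_metric(
    {"objective": "binary:logistic"}, metric="error"
)

# Either dictionary would then be passed as the params argument of xgboost.train
print(params)
print(legacy_params)
```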