Closed. hungying closed this issue 2 years ago.
Hi,
It looks like the boosting step isn't going to work on the small sample data set. You can disable boosting via a command-line parameter; please rerun with it and let's see if that works,
i.e., --boosting_method none
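For reference, here is a rough sketch of what the rerun might look like, reusing the Docker command quoted below with the suggested flag appended. The paths and image name are copied from the original report. The position of `--boosting_method none` at the end of the argument list is an assumption, and the `run_ctat_no_boosting` function and the `DRY_RUN` switch are hypothetical helpers added only for illustration; they are not part of CTAT-Mutations.

```shell
#!/usr/bin/env bash
# Base directory copied from the original report; adjust for your machine.
BASE=/home/ec2-user/data/ctat-mutations-CTAT-Mutations-v3.2.0
GENOME_LIB="$BASE/GRCh38_gencode_v37_CTAT_lib_Mar012021.plug-n-play/ctat_genome_lib_build_dir"

# Hypothetical wrapper: same command as in the report, plus --boosting_method none.
# Set DRY_RUN=1 to print the command instead of executing it.
run_ctat_no_boosting() {
  ${DRY_RUN:+echo} docker run --rm \
    -v "$BASE/testing:/data" \
    -v "$GENOME_LIB:/ctat_genome_lib_dir:ro" \
    trinityctat/ctat_mutations /usr/local/src/ctat-mutations/ctat_mutations \
    --left /data/reads_1.fastq.gz \
    --right /data/reads_2.fastq.gz \
    --sample_id test \
    --output /data/ctat_mutations_outdir \
    --cpu 10 --genome_lib_dir /ctat_genome_lib_dir \
    --boosting_method none
}
```

Calling `run_ctat_no_boosting` launches the container; `DRY_RUN=1 run_ctat_no_boosting` only prints the assembled command so it can be checked first.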
On Mon, May 2, 2022 at 11:31 AM HungYing Lin @.***> wrote:
Hi,
I used Docker to run CTAT Mutations (version 3.2.0) with the example included in the repo, but the pipeline did not finish. The command I used:
docker run --rm -v /home/ec2-user/data/ctat-mutations-CTAT-Mutations-v3.2.0/testing:/data \
-v /home/ec2-user/data/ctat-mutations-CTAT-Mutations-v3.2.0/GRCh38_gencode_v37_CTAT_lib_Mar012021.plug-n-play/ctat_genome_lib_build_dir:/ctat_genome_lib_dir:ro \
trinityctat/ctat_mutations /usr/local/src/ctat-mutations/ctat_mutations \
--left /data/reads_1.fastq.gz \
--right /data/reads_2.fastq.gz \
--sample_id test \
--output /data/ctat_mutations_outdir \
--cpu 10 --genome_lib_dir /ctat_genome_lib_dir
I downloaded the reference dataset directly from https://data.broadinstitute.org/Trinity/CTAT_RESOURCE_LIB/GRCh38_gencode_v37_CTAT_lib_Mar012021.plug-n-play.tar.gz. The process stopped at "call-VariantFiltration", and the "stdout" shows the following:
AC ALT BaseQRankSum DJ ... TCR TDM VAF VMMF
IND ...
chr1:11130632 1.0 1.0 -2.416 13355.0 ... 68.0 2.0 0.565 0.000
chr1:11130740 1.0 -1.0 -6.406 13247.0 ... 57.0 1.0 0.527 0.000
chr1:26774594 1.0 1.0 0.508 696.0 ... 23.0 0.0 0.600 0.000
chr1:114716123 2.0 1.0 0.000 54.0 ... 283.0 17.0 1.000 0.000
chr2:177234087 2.0 -1.0 1.060 184.0 ... 81.0 5.0 0.987 0.041
[5 rows x 21 columns]
Features used for modeling: ['AC', 'ALT', 'BaseQRankSum', 'DJ', 'DP', 'ED', 'Entropy', 'ExcessHet', 'FS', 'Homopolymer', 'LEN', 'MLEAF', 'MMF', 'QUAL', 'REF', 'RPT', 'RS', 'ReadPosRankSum', 'SAO', 'SOR', 'TCR', 'TDM', 'VAF', 'VMMF']
[15:32:35] WARNING: ../src/learner.cc:1061: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'binary:logistic' was changed from 'error' to 'logloss'. Explicitly set eval_metric if you'd like to restore the old behavior.
Predicted true variants: 6 chr2:201286677
9 chr3:49411352
15 chr7:55198724
16 chr9:21968200
31 chr17:39723014
Name: chr:pos, dtype: object
AC ALT BaseQRankSum DJ ... TCR TDM VAF VMMF
IND ...
chr1:11130632 1.0 1.0 -2.416 13355.0 ... 68.0 2.0 0.565 0.000
chr1:11130740 1.0 -1.0 -6.406 13247.0 ... 57.0 1.0 0.527 0.000
chr1:26774594 1.0 1.0 0.508 696.0 ... 23.0 0.0 0.600 0.000
chr1:114716123 2.0 1.0 0.000 54.0 ... 283.0 17.0 1.000 0.000
chr2:177234087 2.0 -1.0 1.060 184.0 ... 81.0 5.0 0.987 0.041
[5 rows x 21 columns]
SAO
IND
chr2:201287397 0.0
Features used for modeling: ['AC', 'ALT', 'BaseQRankSum', 'DJ', 'DP', 'ED', 'Entropy', 'ExcessHet', 'FS', 'Homopolymer', 'LEN', 'MLEAF', 'MMF', 'QUAL', 'REF', 'RPT', 'RS', 'ReadPosRankSum', 'SAO', 'SOR', 'TCR', 'TDM', 'VAF', 'VMMF']
'RS' feature must be present in the vcf
Any idea how I can fix this issue?
Thank you so much, Hung-Ying
Brian J. Haas The Broad Institute http://broadinstitute.org/~bhaas http://broad.mit.edu/~bhaas
@brianjohnhaas Yes, it works.
Thank you so much, Hung-Ying