TrinityCTAT / ctat-mutations

Mutation detection using GATK4 best practices and latest RNA editing filters resources. Works with both Hg38 and Hg19
https://github.com/TrinityCTAT/ctat-mutations

CTAT mutations V3.2.0 running error with example data #109

Closed hungying closed 2 years ago

hungying commented 2 years ago

Hi,



I used Docker to run CTAT Mutations (version 3.2.0) on the example data included in the repo, but the pipeline did not run to completion. The command I used:

docker run --rm -v /home/ec2-user/data/ctat-mutations-CTAT-Mutations-v3.2.0/testing:/data \
-v /home/ec2-user/data/ctat-mutations-CTAT-Mutations-v3.2.0/GRCh38_gencode_v37_CTAT_lib_Mar012021.plug-n-play/ctat_genome_lib_build_dir:/ctat_genome_lib_dir:ro \
trinityctat/ctat_mutations /usr/local/src/ctat-mutations/ctat_mutations \
--left /data/reads_1.fastq.gz \
--right /data/reads_2.fastq.gz \
--sample_id test \
--output /data/ctat_mutations_outdir \
--cpu 10 --genome_lib_dir /ctat_genome_lib_dir


I downloaded the reference dataset directly from https://data.broadinstitute.org/Trinity/CTAT_RESOURCE_LIB/GRCh38_gencode_v37_CTAT_lib_Mar012021.plug-n-play.tar.gz

The process stopped at "call-VariantFiltration", and the "stdout" shows the following:



                 AC  ALT  BaseQRankSum       DJ  ...    TCR   TDM    VAF   VMMF
IND                                              ...                           
chr1:11130632   1.0  1.0        -2.416  13355.0  ...   68.0   2.0  0.565  0.000
chr1:11130740   1.0 -1.0        -6.406  13247.0  ...   57.0   1.0  0.527  0.000
chr1:26774594   1.0  1.0         0.508    696.0  ...   23.0   0.0  0.600  0.000
chr1:114716123  2.0  1.0         0.000     54.0  ...  283.0  17.0  1.000  0.000
chr2:177234087  2.0 -1.0         1.060    184.0  ...   81.0   5.0  0.987  0.041

[5 rows x 21 columns]
Features used for modeling:  ['AC', 'ALT', 'BaseQRankSum', 'DJ', 'DP', 'ED', 'Entropy', 'ExcessHet', 'FS', 'Homopolymer', 'LEN', 'MLEAF', 'MMF', 'QUAL', 'REF', 'RPT', 'RS', 'ReadPosRankSum', 'SAO', 'SOR', 'TCR', 'TDM', 'VAF', 'VMMF']
[15:32:35] WARNING: ../src/learner.cc:1061: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'binary:logistic' was changed from 'error' to 'logloss'. Explicitly set eval_metric if you'd like to restore the old behavior.
Predicted true variants: 6     chr2:201286677
9      chr3:49411352
15     chr7:55198724
16     chr9:21968200
31    chr17:39723014
Name: chr:pos, dtype: object
                 AC  ALT  BaseQRankSum       DJ  ...    TCR   TDM    VAF   VMMF
IND                                              ...                           
chr1:11130632   1.0  1.0        -2.416  13355.0  ...   68.0   2.0  0.565  0.000
chr1:11130740   1.0 -1.0        -6.406  13247.0  ...   57.0   1.0  0.527  0.000
chr1:26774594   1.0  1.0         0.508    696.0  ...   23.0   0.0  0.600  0.000
chr1:114716123  2.0  1.0         0.000     54.0  ...  283.0  17.0  1.000  0.000
chr2:177234087  2.0 -1.0         1.060    184.0  ...   81.0   5.0  0.987  0.041

[5 rows x 21 columns]
                SAO
IND                
chr2:201287397  0.0
Features used for modeling:  ['AC', 'ALT', 'BaseQRankSum', 'DJ', 'DP', 'ED', 'Entropy', 'ExcessHet', 'FS', 'Homopolymer', 'LEN', 'MLEAF', 'MMF', 'QUAL', 'REF', 'RPT', 'RS', 'ReadPosRankSum', 'SAO', 'SOR', 'TCR', 'TDM', 'VAF', 'VMMF']
'RS' feature must be present in the vcf



Any idea how I can fix this issue?



Thank you so much,
 Hung-Ying

brianjohnhaas commented 2 years ago

Hi,

It looks like the boosting step isn't going to work on the small sample data set. You can disable boosting via a command-line parameter, then rerun, and let's see if that works.

i.e., --boosting_method none
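For reference, the rerun would presumably be the original command with that flag appended. The following is only a sketch: the paths and image name are copied from the original post, and the BASE variable and the RUN=1 switch are conveniences invented for this example, not part of ctat-mutations. By default it only prints the command so it can be checked before anything is executed.

```shell
#!/bin/sh
# Sketch: rerun the example with boosting disabled.
# All paths are taken from the original post; adjust them to your setup.
set -eu

# BASE is a hypothetical helper variable used only to shorten the paths below.
BASE=/home/ec2-user/data/ctat-mutations-CTAT-Mutations-v3.2.0

# Store the full command in the positional parameters so it can be
# printed and (optionally) executed without re-typing it.
set -- docker run --rm \
  -v "$BASE/testing:/data" \
  -v "$BASE/GRCh38_gencode_v37_CTAT_lib_Mar012021.plug-n-play/ctat_genome_lib_build_dir:/ctat_genome_lib_dir:ro" \
  trinityctat/ctat_mutations /usr/local/src/ctat-mutations/ctat_mutations \
  --left /data/reads_1.fastq.gz \
  --right /data/reads_2.fastq.gz \
  --sample_id test \
  --output /data/ctat_mutations_outdir \
  --cpu 10 --genome_lib_dir /ctat_genome_lib_dir \
  --boosting_method none

# Print the command for inspection.
echo "$@"

# RUN=1 is a switch specific to this sketch; without it nothing is executed.
if [ "${RUN:-0}" = "1" ]; then
  "$@"
fi
```

Running the script as-is prints the command; running it with RUN=1 set in the environment executes it.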

hungying commented 2 years ago

@brianjohnhaas Yes, it works.

Thank you so much, Hung-Ying