Closed nukaemon closed 5 years ago
Hey, sorry for all the trouble. I haven't seen @ symbols in the headers of aSCNA files before. Are these comment characters? If so, I can push an update to remove them.
The error seems to be related to your indels. Did you pass indels to your command line? Would you mind posting the command line?
Thank you for your advice. Your comment reminded me that I had left out --indel_data_path (and --indel_data_type) because I couldn't readily tell what file should be given to it, and I also thought that indel data was not strictly needed to run deTiN. Now, looking at MuTect2.call_stats.txt in example_data, I realize that the file is actually a VCF file from MuTect2. Sorry for my laziness, but I feel that the file name "MuTect2.call_stats.txt" is misleading to some users, because call-stats files (from Mutect_v1) and VCF files (from MuTect2 or Mutect2) are different.
My fixed command line is below, and it ran successfully.
python ${DETIN} \
  --mutation_data_path TEST.call-stats.tsv \
  --cn_data_path TEST.modelFinal.seg \
  --tumor_het_data TEST.hets.tsv \
  --normal_het_data TEST.hets.normal.tsv \
  --exac_data_path exac.pickle \
  --output_name TEST \
  --indel_data_path TEST.gatk4m2.filtered.vcf \
  --indel_data_type Mutect2 \
  --output_dir output
Now I have a question regarding --indel_data_type. What are the possible arguments for the --indel_data_type option? I gave it "Mutect2" instead of "MuTect2" because the GATK team distinguishes the two, with MuTect2 in GATK3 and Mutect2 in GATK4, as you may know or as mentioned here. Does deTiN2 care about this difference?
Regarding the header lines starting with @, I get the following error if I leave them in.
The first column of all input files should be chromosome: could not find any of the chromosome headers in the first column of TEST.modelFinal.seg
Actually, this is the case not only for the segmentation file but also for the hets.tsv files, all of which were generated by ModelSegments in the current version of GATK4 (4.1.1.0). If this is not what you expected, I would appreciate it if you could update deTiN to remove them automatically.
The header lines are like:
@HD VN:1.6
@SQ SN:chr1 LN:248956422
@SQ SN:chr2 LN:242193529
@SQ SN:chr3 LN:198295559
@SQ SN:chr4 LN:190214555
@SQ SN:chr5 LN:181538259
@SQ SN:chr6 LN:170805979
@SQ SN:chr7 LN:159345973
@SQ SN:chr8 LN:145138636
@SQ SN:chr9 LN:138394717
@SQ SN:chr10 LN:133797422
@SQ SN:chr11 LN:135086622
@SQ SN:chr12 LN:133275309
@SQ SN:chr13 LN:114364328
@SQ SN:chr14 LN:107043718
@SQ SN:chr15 LN:101991189
@SQ SN:chr16 LN:90338345
@SQ SN:chr17 LN:83257441
@SQ SN:chr18 LN:80373285
@SQ SN:chr19 LN:58617616
@SQ SN:chr20 LN:64444167
@SQ SN:chr21 LN:46709983
@SQ SN:chr22 LN:50818468
@SQ SN:chrX LN:156040895
@SQ SN:chrY LN:57227415
@SQ SN:chrM LN:16569
@RG ID:GATKCopyNumber SM:TEST
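Until such lines are handled automatically, one workaround is to strip them before running deTiN. The following is only a minimal sketch: `strip_at_header` is a hypothetical helper, not part of deTiN or GATK, and the column names in the example table are illustrative placeholders. It assumes every header line starts with "@" and that no data line does.

```python
import io


def strip_at_header(in_handle, out_handle):
    """Copy lines from in_handle to out_handle, skipping '@' header lines.

    Hypothetical helper for illustration; returns the number of skipped lines.
    """
    skipped = 0
    for line in in_handle:
        if line.startswith("@"):
            skipped += 1
            continue
        out_handle.write(line)
    return skipped


# Tiny in-memory example. The table columns are placeholders (assumption),
# only the '@' header lines mirror what is shown in this thread.
example = io.StringIO(
    "@HD\tVN:1.6\n"
    "@SQ\tSN:chr1\tLN:248956422\n"
    "@RG\tID:GATKCopyNumber\tSM:TEST\n"
    "CONTIG\tSTART\tEND\n"
    "chr1\t1000\t2000\n"
)
cleaned = io.StringIO()
n_skipped = strip_at_header(example, cleaned)
```

For real files, the same function can be called with two handles from `open(...)`, for example reading TEST.modelFinal.seg and writing a cleaned copy that is then passed to --cn_data_path.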
Hey, I am planning on supporting the @ symbols soon. I will push an update by the middle of next week. Thanks for pointing out the inconsistency in language; I'll update this as well. DeTiN does not care about the capitalization:
elif indel_type.lower() == 'mutect2':
DeTiN currently supports VCFs from Strelka and M2.
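To illustrate why "Mutect2" and "MuTect2" behave the same, here is a rough sketch of a case-insensitive check. It is not deTiN's actual code: `normalize_indel_type` and `SUPPORTED_INDEL_TYPES` are hypothetical names, and the literal string 'strelka' is an assumption based only on the statement above that Strelka VCFs are supported.

```python
# Assumed set of accepted values, lowercased. Only the 'mutect2' comparison
# is confirmed by the snippet in this thread; 'strelka' is a guess.
SUPPORTED_INDEL_TYPES = ("strelka", "mutect2")


def normalize_indel_type(indel_type):
    """Return the lowercased indel type, or raise if it is not recognized.

    Hypothetical helper for illustration only.
    """
    key = indel_type.strip().lower()
    if key not in SUPPORTED_INDEL_TYPES:
        raise ValueError(
            "unsupported indel data type: %r (expected one of %s)"
            % (indel_type, ", ".join(SUPPORTED_INDEL_TYPES))
        )
    return key


# Both GATK3-style and GATK4-style spellings map to the same key.
gatk3_style = normalize_indel_type("MuTect2")
gatk4_style = normalize_indel_type("Mutect2")
```

Lowercasing once up front keeps every later comparison simple and makes an unrecognized value fail early with a readable message.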
Dear amarotaylor
I prepared all the input files and ran deTiN. The standard output log (below) shows progress up to the SSNV-based TiN estimate, but the run ended with "AttributeError: 'list' object has no attribute 'isnull'" and no output files were generated.
I checked my aSCNA segmentation file and found that some lines had "NaN" values, so I deleted those lines (as well as the first several lines starting with "@", which also cause an error) and reran deTiN, but the result was the same. As mentioned in #16, I generated my aSCNA segment file using ModelSegments from the latest GATK4 version (4.1.1.0). Do you have any clues from the error message? My installed versions are python (2.7.15), numpy (1.15.4), pandas (0.24.2), and scipy (1.2.1).
standard output
then error output