getzlab / deTiN

DeTiN is designed to measure tumor-in-normal contamination and improve somatic variant detection sensitivity when using a contaminated matched control.
BSD 3-Clause "New" or "Revised" License

AttributeError: 'list' object has no attribute 'isnull' #22

Closed nukaemon closed 5 years ago

nukaemon commented 5 years ago

Dear amarotaylor

I prepared all the input files and ran deTiN. The standard output log shows progress up to the SSNV-based TiN estimate (see below), but the run then ended with "AttributeError: 'list' object has no attribute 'isnull'" and no output files were generated.

I checked my aSCNA segmentation file and found that some lines had "NaN" values, so I deleted those lines (as well as the first several lines starting with "@", which also cause an error) and reran deTiN, but the result was the same. As mentioned in #16, I generated my aSCNA segment file using ModelSegments from the latest GATK4 version (4.1.1.0). Do you have any clues from the error message? My installed versions are Python 2.7.15, numpy 1.15.4, pandas 0.24.2, and scipy 1.2.1.

standard output

/home/ngs_dev/Analysis/Pipelines/development/20190409.mod/Exome_pipeline/tools/deTiN/deTiN/deTiN_utilities.py:364: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
  return C[[x.astype(int)]] + position
changing header of seg file from CONTIG to Chromosome
changing header of seg file from START to Start.bp
changing header of seg file from END to End.bp
changing header of seg file from MINOR_ALLELE_FRACTION_POSTERIOR_50 to f
transforming log2 data tau column to 2 centered: 2^(CNratio)+1
changing header of seg file from LOG2_COPY_RATIO_POSTERIOR_50 to tau
changing header of seg file from NUM_POINTS_COPY_RATIO to n_probes
changing header of seg file from CONTIG to Chromosome
changing header of seg file from START to Start.bp
changing header of seg file from END to End.bp
changing header of seg file from MINOR_ALLELE_FRACTION_POSTERIOR_50 to f
transforming log2 data tau column to 2 centered: 2^(CNratio)+1
changing header of seg file from LOG2_COPY_RATIO_POSTERIOR_50 to tau
changing header of seg file from NUM_POINTS_COPY_RATIO to n_probes
pre-processing SSNV data
initialized TiN to 0
TiN inference after 1 iterations = 0.44
TiN inference after 2 iterations = 0.47000000000000003
TiN inference after 3 iterations = 0.48
TiN inference after 4 iterations = 0.49
TiN inference after 5 iterations = 0.49
SSNV based TiN estimate converged: TiN = 0.49 based on 1080 sites

then error output

Traceback (most recent call last):
  File "/home/ngs_dev/Analysis/Pipelines/development/20190409.mod/Exome_pipeline/tools/deTiN/deTiN/deTiN.py", line 606, in <module>
    main()
  File "/home/ngs_dev/Analysis/Pipelines/development/20190409.mod/Exome_pipeline/tools/deTiN/deTiN/deTiN.py", line 564, in main
    do = output(di, ssnv_based_model, ascna_based_model)
  File "/home/ngs_dev/Analysis/Pipelines/development/20190409.mod/Exome_pipeline/tools/deTiN/deTiN/deTiN.py", line 259, in __init__
    if self.input.indel_table.isnull().values.sum() == 0:
AttributeError: 'list' object has no attribute 'isnull'

amarotaylor commented 5 years ago

Hey, sorry for all the trouble. I haven't seen @ symbols in the headers of aSCNA files before. Are these comment characters? If so, I can push an update to remove them.

The error seems to be related to your indels. Did you pass indels to your command line? Would you mind posting the command line?
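A minimal sketch of why the traceback looks the way it does, assuming (from the error message only, not from reading deTiN's source) that the indel table is left as an empty list when no --indel_data_path is given. The helper name has_complete_indels is hypothetical and is not part of deTiN.

```python
def has_complete_indels(indel_table):
    """Return True only if indel_table supports .isnull() and has no missing values.

    Hypothetical helper, not deTiN code. A plain list (the assumed default
    when no indel file is supplied) has no .isnull() method, so the type is
    checked before the method is called.
    """
    if not hasattr(indel_table, "isnull"):
        return False
    return bool(indel_table.isnull().values.sum() == 0)


# Calling .isnull() on a bare list reproduces the AttributeError in the traceback.
try:
    [].isnull()
except AttributeError as err:
    message = str(err)

# With the guard in place, a missing indel table is reported instead of crashing.
no_indels = has_complete_indels([])
```

With a guard like this, a run without indel input could skip the indel step rather than stop with an AttributeError.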

nukaemon commented 5 years ago

Thank you for your advice. Your comment reminded me that I had excluded --indel_data_path (and --indel_data_type) because I couldn't readily work out what file had to be given to it, and I also thought that indel data was not strictly needed to run deTiN. Now, looking at MuTect2.call_stats.txt in example_data, I realize that the file is actually a VCF file from MuTect2. Sorry for my laziness, but I feel that the file name "MuTect2.call_stats.txt" is misleading to some users, because call-stats files (from Mutect_v1) and VCF files (from MuTect2 or Mutect2) are different.

My fixed command line is below, and it ran successfully.

python ${DETIN} \
    --mutation_data_path TEST.call-stats.tsv \
    --cn_data_path TEST.modelFinal.seg \
    --tumor_het_data TEST.hets.tsv \
    --normal_het_data TEST.hets.normal.tsv \
    --exac_data_path exac.pickle \
    --output_name TEST \
    --indel_data_path TEST.gatk4m2.filtered.vcf \
    --indel_data_type Mutect2 \
    --output_dir output

This raises a question about --indel_data_type. What are the possible arguments for the --indel_data_type option? I gave it "Mutect2" instead of "MuTect2" because the GATK team distinguishes MuTect2 in GATK3 from Mutect2 in GATK4, as you may know or as mentioned here. Does deTiN care about this difference?

nukaemon commented 5 years ago

Regarding the header lines starting with @, I get the following error if I leave them in.

The first column of all input files should be chromosome: could not find any of the chromosome headers in the first column of TEST.modelFinal.seg

Actually, this is the case not only for the segmentation file but also for the hets.tsv files, all of which were generated by ModelSegments from the current version of GATK4 (4.1.1.0). If this is not what you expected, I would appreciate it if you could update deTiN to remove them automatically.

The header lines look like this:

@HD     VN:1.6
@SQ     SN:chr1 LN:248956422
@SQ     SN:chr2 LN:242193529
@SQ     SN:chr3 LN:198295559
@SQ     SN:chr4 LN:190214555
@SQ     SN:chr5 LN:181538259
@SQ     SN:chr6 LN:170805979
@SQ     SN:chr7 LN:159345973
@SQ     SN:chr8 LN:145138636
@SQ     SN:chr9 LN:138394717
@SQ     SN:chr10        LN:133797422
@SQ     SN:chr11        LN:135086622
@SQ     SN:chr12        LN:133275309
@SQ     SN:chr13        LN:114364328
@SQ     SN:chr14        LN:107043718
@SQ     SN:chr15        LN:101991189
@SQ     SN:chr16        LN:90338345
@SQ     SN:chr17        LN:83257441
@SQ     SN:chr18        LN:80373285
@SQ     SN:chr19        LN:58617616
@SQ     SN:chr20        LN:64444167
@SQ     SN:chr21        LN:46709983
@SQ     SN:chr22        LN:50818468
@SQ     SN:chrX LN:156040895
@SQ     SN:chrY LN:57227415
@SQ     SN:chrM LN:16569
@RG     ID:GATKCopyNumber       SM:TEST
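Until deTiN handles these lines itself, they can be stripped before the file is passed in. The following is a minimal sketch using only the Python standard library; the function name read_gatk_tsv and the single data row in the example are made up for illustration, and only the "@" header lines and the CONTIG/START/END column names come from the output shown in this thread.

```python
import csv
import io


def read_gatk_tsv(handle):
    """Parse a tab-separated file, skipping SAM-style header lines that start with '@'.

    Hypothetical helper, not part of deTiN or GATK.
    """
    body = (line for line in handle if not line.startswith("@"))
    return list(csv.DictReader(body, delimiter="\t"))


# Illustrative input: three '@' header lines, a column header, and one invented row.
example = io.StringIO(
    "@HD\tVN:1.6\n"
    "@SQ\tSN:chr1\tLN:248956422\n"
    "@RG\tID:GATKCopyNumber\tSM:TEST\n"
    "CONTIG\tSTART\tEND\n"
    "chr1\t1000\t2000\n"
)
rows = read_gatk_tsv(example)
```

Filtering on the first character only keeps any data line that happens to contain "@" further along the line.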

amarotaylor commented 5 years ago

Hey, I am planning on supporting the @ symbols soon and will push an update by the middle of next week. Thanks for pointing out the inconsistency in language; I'll update this as well. DeTiN does not care about the capitalization: elif indel_type.lower() == 'mutect2'

DeTiN currently supports VCFs from Strelka and M2.
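As a rough illustration of the case-insensitive comparison quoted above, here is a small sketch. The function name normalize_indel_type and the exact accepted string "strelka" are assumptions; only the lower-cased 'mutect2' comparison is quoted from deTiN in this thread.

```python
# Assumed set of accepted values: 'mutect2' is quoted in the thread,
# 'strelka' is a guess at how the Strelka option is spelled.
SUPPORTED_INDEL_TYPES = ("strelka", "mutect2")


def normalize_indel_type(indel_type):
    """Lower-case the user-supplied indel type and check that it is supported.

    Hypothetical helper, not deTiN code.
    """
    key = indel_type.strip().lower()
    if key not in SUPPORTED_INDEL_TYPES:
        raise ValueError("unsupported indel data type: %r" % indel_type)
    return key


# GATK3-style and GATK4-style spellings resolve to the same key.
gatk3_style = normalize_indel_type("MuTect2")
gatk4_style = normalize_indel_type("Mutect2")
```

Under this sketch, both "MuTect2" and "Mutect2" are accepted, while an unrecognized caller name fails early with a clear message.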