dariober / cnv_facets

Somatic copy variant caller (CNV) for next generation sequencing

Faceting variables must have at least one value? the file just has columns #9

Closed kobejamescurry closed 5 years ago

kobejamescurry commented 5 years ago

thanks a lot image

dariober commented 5 years ago

Hi- a couple of thoughts...

kobejamescurry commented 5 years ago

thanks a lot for giving your time for my question

yes, there is no records in the file.

I downloaded the matching VCF and its index from the official site.

how to get the stack trace, i do not know, sorry.

./cnv_facets.R -t tumour.bam -n normal.bam -vcf xx.snps.vcf.gz -o test1

dariober commented 5 years ago

Hi-

there is no records in the file.

Can you check the bam files and vcf files use the same chromosome names? You can do this by looking at the output of the two commands (you can post the output here if not too large):

samtools view -H tumour.bam
bcftools view -h xx.snps.vcf.gz # or the first few VCF records 

And just in case, check the bam files do have some reads mapped at some of the SNP positions.
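As a rough illustration of the check suggested above, here is a minimal Python sketch that compares chromosome names taken from the text printed by the two commands. The function names and the sample VCF record are made up for the example; it falls back to the CHROM column because some VCFs carry no `##contig` header lines.

```python
import re

def bam_contigs(header_text):
    """Sequence names from the @SQ lines of a SAM/BAM header."""
    names = set()
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split()[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names

def vcf_contigs(vcf_text):
    """Names from ##contig header lines and from the CHROM column of records."""
    names = set()
    for line in vcf_text.splitlines():
        match = re.match(r"##contig=<ID=([^,>]+)", line)
        if match:
            names.add(match.group(1))
        elif line and not line.startswith("#"):
            names.add(line.split("\t", 1)[0])
    return names

# Header lines as printed by `samtools view -H` (first two contigs only).
bam_header = (
    "@HD\tVN:1.5\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:249250621\n"
    "@SQ\tSN:chr2\tLN:243199373\n"
)
# Illustrative VCF text; the record below is invented for the example.
vcf_text = (
    "##fileformat=VCFv4.0\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "1\t10177\t.\tA\tC\t.\t.\t.\n"
)

shared = bam_contigs(bam_header) & vcf_contigs(vcf_text)
print(sorted(shared))  # an empty list means no chromosome name is shared
```

If the shared set is empty, the naming styles differ (for example `chr1` versus `1`) and no SNP can be matched to any read.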

how to get the stack trace, i do not know

Sorry, I just meant the output of the cnv_facets.R that you see on your terminal

kobejamescurry commented 5 years ago

Sorry, and thanks a lot. My bam is chr, and vcf just has number. When I use facets, it seems to be OK, but the purity is NA.

I want to ask you another question: of 00-common_all.vcf.gz, All_20180423.vcf.gz and common_all_20180423.vcf.gz, which one is better, and why? You suggested common_all_20180423.vcf.gz to me last time.

image

Here I use facets, but the purity is always NA. I tested WES and panel data. Do you know why? Thanks a lot.

dariober commented 5 years ago

My bam is chr, and vcf just has number

That is the problem. The chromosome names don't match, so none of the SNPs gets any read count assigned and no further analysis can make sense. Either rename the chromosomes in the VCF or in the bam file or, better, use the VCF from the GATK directory, e.g. https://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh37p13/VCF/GATK/ . While we are at it, make sure the genome build is the same: in the link above the genome is GRCh37 (aka hg19), so make sure the bam file has also been aligned to GRCh37/hg19.
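For the "rename the chromosomes in the VCF" option, a minimal Python sketch of the idea is below. The function name `add_chr_prefix` is made up for this example, and it assumes an uncompressed VCF read line by line. In practice a dedicated tool such as `bcftools annotate --rename-chrs` is probably the safer route, since it also keeps the file compressed and indexable.

```python
import re

def add_chr_prefix(vcf_lines):
    """Yield VCF lines with 'chr' prepended to chromosome names that lack it.

    Assumption: a plain prefix is enough. Names such as 'MT' would become
    'chrMT', not 'chrM', so the mitochondrial contig may need separate handling.
    """
    for line in vcf_lines:
        if line.startswith("##contig=<ID="):
            # Header declaration, e.g. ##contig=<ID=1,length=...>
            line = re.sub(r"^##contig=<ID=(?!chr)", "##contig=<ID=chr", line)
        elif line.strip() and not line.startswith("#") and not line.startswith("chr"):
            # Data record: the chromosome name is the first column.
            line = "chr" + line
        yield line

example = [
    "##contig=<ID=1>",
    "#CHROM\tPOS\tID\tREF\tALT",
    "1\t65565\t.\tA\tG",
]
renamed = list(add_chr_prefix(example))
print("\n".join(renamed))
```

Renaming only fixes the naming style; it does not make a VCF on one genome build compatible with a bam file aligned to another.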

For the difference between 00-common_all and common_all_<date> see this comment by Alex Reynolds https://www.biostars.org/p/186617/#363617

About All_ vs common_all, I would suggest the "common" file since the All file has all the known SNPs including the rare ones whereas the common file has the, well, common ones (have a look at the dbSNP docs). It's fine to use the All file but the time and memory usage is much bigger for little or no advantage (in my experience, haven't tested the two carefully though).

(PS- as I mentioned above, I suggest to avoid posting screenshots. Better to post the actual URLs or commands)

kobejamescurry commented 5 years ago

Thanks a lot. Following your guidance, I changed to the GATK VCF and the result seems to be OK now. What do you think of the difference between facets and conventional CNV tools (like cnvkit)? Why did you finally choose facets to detect CNV? Thanks a lot.

kobejamescurry commented 5 years ago

I found that you have a deep understanding of CNV and LOH, because I read the issues in facets and saw that you have made so many useful suggestions to the author. Sorry, I'm bombarding you with questions... I'd like to make sure my understanding is correct.

hi, I happened to see this done by you, LOH contains two conditions

A loss of heterozygosity refers to a loss of one of the parental copies, which may or may not involve a change in total copy number; specifically, some mutation processes lead to a loss of one parental copy accompanied by a simultaneous gain of the other parental copy in the same region, thus leading to a loss of heterozygosity without changing total copy number, aka copy-neutral loss of heterozygosity.
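To make the quoted definition concrete, here is a small Python sketch that classifies a segment from its total copy number and its lesser (minor) copy number. The argument names `tcn` and `lcn` and all the labels are chosen for this illustration only; they are not the labels of the table in the screenshot, and other naming conventions exist.

```python
def loh_status(tcn, lcn):
    """Classify a segment under the definition quoted above.

    tcn: total copy number; lcn: lesser (minor) allele copy number.
    The labels are hypothetical and used only for this example.
    """
    if lcn is None:
        return "unknown"  # minor copy number could not be estimated
    if tcn == 0:
        return "homozygous deletion"  # both parental copies lost
    if lcn > 0:
        return "heterozygosity retained"
    # From here on lcn == 0: one parental copy is gone.
    if tcn == 1:
        return "LOH with copy loss"
    if tcn == 2:
        return "copy-neutral LOH"
    return "LOH with copy gain"

for tcn, lcn in [(1, 0), (2, 0), (3, 0), (2, 1)]:
    print((tcn, lcn), loh_status(tcn, lcn))
```

Read this way, (1, 0), (2, 0) and (3, 0) all satisfy the definition because one parental copy is absent in each; whether a given table calls the first one "LOH" or gives it its own name is a labelling choice.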

In your table, why is (1, 0) not an LOH? And (2, 0) means heterozygous to homozygous, am I right? Thanks a lot.

And how do you annotate these segments with concrete genes? I downloaded the refFlat file from UCSC and used bedtools intersect, but I do not know whether that is right.
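For reference, the overlap logic behind a refFlat-plus-intersect approach can be sketched in a few lines of Python. This assumes the usual refFlat column order (geneName, name, chrom, strand, txStart, txEnd, ...) with 0-based, half-open coordinates. The gene name, transcript ID and coordinates below are invented, and the helper names are hypothetical.

```python
from collections import namedtuple

Gene = namedtuple("Gene", "chrom start end name")

def parse_refflat_line(line):
    """Build a Gene from one refFlat line (assumed standard column order)."""
    fields = line.rstrip("\n").split("\t")
    return Gene(chrom=fields[2], start=int(fields[4]), end=int(fields[5]), name=fields[0])

def genes_in_segment(chrom, start, end, genes):
    """Names of genes overlapping the half-open interval [start, end) on chrom."""
    return sorted({g.name for g in genes
                   if g.chrom == chrom and g.start < end and g.end > start})

# Invented refFlat record, for illustration only.
refflat = ["GENE_A\tNM_HYPOTHETICAL\tchr1\t+\t100\t200\t120\t180\t1\t100,\t200,"]
genes = [parse_refflat_line(line) for line in refflat]

hits = genes_in_segment("chr1", 150, 1000, genes)
misses = genes_in_segment("chr1", 200, 300, genes)
print(hits, misses)
```

The same caveats as elsewhere in this thread apply: chromosome names and genome build of the annotation must match those of the segments, and segment coordinates may need converting to the 0-based convention first.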

And is dup-loh also an LOH? Thanks very much; I hope for your reply soon.

image

pd321 commented 5 years ago

I am observing a similar error. Below is the stack trace of the error:-

command used:-

cnv_facets.R -n N.bam -t T.bam -vcf common_all_20170710.vcf.gz -N 2 -T agilent_v5_v3_merged_baits.bed -g hg19 -o T10
Error: Faceting variables must have at least one value
In addition: Warning message:
In .Seqinfo.mergexy(x, y) :
  The 2 combined objects have no sequence levels in common. (Use
  suppressWarnings() to suppress this warning.)
Execution halted

The SNP coverage file got generated successfully though. Here are a few lines from it

Chromosome,Position,Ref,Alt,File1R,File1A,File1E,File1D,File2R,File2A,File2E,File2D
chr1,65565,.,.,4,0,0,0,2,0,0,0
chr1,65800,.,.,10,0,0,0,38,0,0,0
chr1,65974,A,G,39,0,0,0,92,0,0,0
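A quick way to confirm that such a pileup file really carries read counts is sketched below in Python. It assumes that the `R` and `A` columns are reference and alternate read counts and that `File1`/`File2` are the two bam files in the order they were piled up; the function name `covered_snps` is made up for the example.

```python
import csv
import io

def covered_snps(pileup_csv_text, min_depth=1):
    """Return (covered, total): SNPs with depth >= min_depth in both files.

    Depth is taken here as ref + alt counts (columns FileNR + FileNA),
    which is an assumption about the meaning of the columns.
    """
    reader = csv.DictReader(io.StringIO(pileup_csv_text))
    covered = total = 0
    for row in reader:
        total += 1
        depth1 = int(row["File1R"]) + int(row["File1A"])
        depth2 = int(row["File2R"]) + int(row["File2A"])
        if depth1 >= min_depth and depth2 >= min_depth:
            covered += 1
    return covered, total

# The three records quoted above.
pileup = """Chromosome,Position,Ref,Alt,File1R,File1A,File1E,File1D,File2R,File2A,File2E,File2D
chr1,65565,.,.,4,0,0,0,2,0,0,0
chr1,65800,.,.,10,0,0,0,38,0,0,0
chr1,65974,A,G,39,0,0,0,92,0,0,0
"""
result = covered_snps(pileup)
print(result)  # (3, 3)
```

On the quoted lines every SNP has reads in both files, which is consistent with the pileup step having worked and the failure happening later.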

The vcf and bam have matching chromosome names. The header from the bam file shows that chromosome names start with chr, and the vcf is from GATK, which also has chromosome names starting with chr

@HD     VN:1.5  SO:coordinate
@SQ     SN:chr1 LN:249250621
@SQ     SN:chr2 LN:243199373
@SQ     SN:chr3 LN:198022430
@SQ     SN:chr4 LN:191154276
@SQ     SN:chr5 LN:180915260

Is there any other possible source of the error?

dariober commented 5 years ago

@pd321 Can you check also agilent_v5_v3_merged_baits.bed has chromosome names consistent with the vcf and BAM files?

dariober commented 5 years ago

@pd321 Sorry - this is indeed a bug which I'm going to fix. For the time being, you should be able to run cnv_facets by removing from agilent_v5_v3_merged_baits.bed the prefix chr. E.g. with something like:

sed 's/^chr//' agilent_v5_v3_merged_baits.bed > agilent_v5_v3_merged_baits.tmp.bed

pd321 commented 5 years ago

@dariober Thanks. I was able to run it after removing chr from the agilent_v5_v3_merged_baits.bed file

dariober commented 5 years ago

@pd321 The issue with the target file should be resolved in v0.15.0 now in bioconda.