Open igordot opened 7 years ago
I see regions in _CNVs that aren't present in _ratio.txt. Is that expected?
When mappability in a window, or in several neighboring windows, is low, the ratio file can contain '-1' values. But FREEC can usually make a guess about such regions. For instance, if there is a gain on the left and on the right, this 'unknown' region will also be assigned a gain status, and it will appear as such in the _CNVs file.
I would not expect regions to be missing from the ratio file. They should simply get '-1' values. Only if you work with exome data may some regions disappear. Is your data WGS or WES?
It's WES. Why would the regions disappear, and why would _CNVs have regions that aren't in ratio.txt?
Because for WES there is no point in outputting all windows in the genome, regions with few or no reads in the control dataset are removed.
I think if you want to see all regions of the genome, you should set printNA=TRUE.
See http://boevalab.com/FREEC/tutorial.html#CONFIG
Thanks for clarifying.
So what should I do if I want to see copy number info for a specific region? Sometimes a region is only in _CNVs and sometimes it's only in ratio.txt. Is there a single file I can check?
Igor, as I understand it: the _CNVs file contains the start and end positions of CNAs, while ratio.txt contains values per bin or per exon. So to know the copy number of a given region, you can check whether this region is included in (or partially overlaps) any CNA from _CNVs. If it is not, the corresponding copy number is equal to the main ploidy.
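The lookup described above could be sketched roughly as follows. This is only an illustration, not part of FREEC: the function names are hypothetical, and it assumes the _CNVs file is tab-separated with no header, that its first five columns are chromosome, start, end, copy number, and status, that coordinates are closed intervals, and that the main ploidy is 2. Check these assumptions against your own output files (for example, whether chromosome names carry a "chr" prefix).

```python
from typing import Iterable, List, NamedTuple


class CNA(NamedTuple):
    chrom: str
    start: int
    end: int
    copy_number: int
    status: str


def parse_cnvs(lines: Iterable[str]) -> List[CNA]:
    """Parse _CNVs lines (assumed columns: chr, start, end, copy number, status)."""
    cnas = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        chrom, start, end, cn, status = fields[:5]
        cnas.append(CNA(chrom, int(start), int(end), int(cn), status))
    return cnas


def copy_numbers_for_region(
    cnas: List[CNA], chrom: str, start: int, end: int, main_ploidy: int = 2
) -> List[int]:
    """Return copy numbers of all CNAs that overlap the region.

    If no CNA overlaps, fall back to the main ploidy (an assumed default of 2).
    """
    hits = [
        c.copy_number
        for c in cnas
        if c.chrom == chrom and c.start <= end and start <= c.end
    ]
    return hits if hits else [main_ploidy]
```

In practice you would pass an open file handle, e.g. `parse_cnvs(open("sample_CNVs"))` (a hypothetical file name), and then query each region of interest. A region that partially overlaps a CNA returns that CNA's copy number, so you may want to inspect the coordinates as well before drawing conclusions.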
I am trying to understand the difference between the regions in _CNVs and _ratio.txt and capture regions BED files. From what I understand, _CNVs will have all the regions with alterations after merging neighboring regions. That would make it a subset of _ratio.txt, but I see regions in _CNVs that aren't present in _ratio.txt. Is that expected?
I also compared _ratio.txt to the capture regions BED file. They seem to be identical, but _ratio.txt is heavily filtered (more than half the regions are filtered). The filtering seems to be based on the matched normal (all _ratio.txt files using the same matched normal have the same length). The regions with CopyNumber set to -1 do not make it to _CNVs, since there is insufficient data there. What is the difference between -1 regions and completely missing regions, and why are so many missing? I am looking at the BAMs at some of the missing regions and they seem okay.