mzwaig opened this issue 3 years ago (status: open)
Hello and thank you for your interest in RCK!
To ensure that SVs in telomeric regions can be processed and analyzed, make sure that the input clone- and allele-specific CNV profile is defined on segments that span all SV/novel telomere breakpoints.
In your case, you probably have an SV breakend at position 146,370,000 on chromosome 8 (reported as `8:-146370000` in the error, where `-` is the breakend's strand), but your input CNV profile does not seem to have segments defined for this position on chromosome 8. This sometimes happens with CNV profile inference methods that, because of segmentation artifacts or insufficient coverage, produce profiles that do not start at position 1 of a chromosome or that end before the chromosome's full length.
Try extending, in the input `.scnt.tsv` file, the end position of the last segment on chromosome 8 so that it covers the rest of chromosome 8. In the absence of any input copy number for the very beginning/end of a chromosome, it is safe to assume that the closest approximation is the copy number of the adjacent segment.
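If a CNV caller left several chromosome ends uncovered, this edit can be scripted. The following is a minimal Python sketch, not part of RCK: it assumes the `.scnt.tsv` file has a header row with `chr`, `start`, and `end` columns (the column names are an assumption, so check your file) and that you supply the chromosome lengths yourself through the hypothetical `chrom_lengths` mapping. All other columns, i.e., the copy numbers, are left untouched, so the uncovered ends simply inherit the copy numbers of the adjacent terminal segment.

```python
import csv
import io
from typing import Dict, List


def extend_terminal_segments(rows: List[Dict[str, str]],
                             chrom_lengths: Dict[str, int]) -> List[Dict[str, str]]:
    """Stretch the first segment of each chromosome to position 1 and the
    last one to the chromosome length (rows are modified in place).

    Assumes 'chr', 'start', 'end' keys; every other column (the clone- and
    allele-specific copy numbers) is copied unchanged.
    """
    by_chr: Dict[str, List[Dict[str, str]]] = {}
    for row in rows:
        by_chr.setdefault(row["chr"], []).append(row)
    for chrom, segments in by_chr.items():
        first = min(segments, key=lambda r: int(r["start"]))
        last = max(segments, key=lambda r: int(r["end"]))
        first["start"] = "1"
        if chrom in chrom_lengths:
            # Never shrink a segment that already reaches past the given length.
            last["end"] = str(max(int(last["end"]), chrom_lengths[chrom]))
    return rows


def rewrite_scnt(tsv_text: str, chrom_lengths: Dict[str, int]) -> str:
    """Read .scnt.tsv content, extend terminal segments, return the new TSV text."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = extend_terminal_segments(list(reader), chrom_lengths)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

The length you pass for each chromosome should come from the reference genome the reads were aligned to and must be at least as large as any breakend coordinate on that chromosome.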
Thanks for your response. I'll edit my scnt file as you suggested.
I also want to incorporate additional SV callers (whose output is not in VCF format), and I have a question about how you assign the strands in the adjacency files. Does it matter which end is positive and which is negative, or is it arbitrary as long as DUPs and DELs have one end positive and one end negative, and BNDs and INVs have the same sign for both ends?
Thanks again, Melissa
Sure, using multiple SV callers in an ensemble approach makes sense to improve sensitivity. Just keep in mind that specificity may go down with a pure union-ensemble approach.
The strand sign (+|-) is indeed important, as it determines the direction in which the derived chromosomes are "threaded" through the reference fragments and the SVs/novel adjacencies connecting them. A DEL entry `chr:start-end` has a "+-" signature, specifying the signs for the `start` breakpoint and the `end` breakpoint, respectively. One can think about it as follows: the deletion novel adjacency connects "the right end" (i.e., `+`) of the reference fragment ending at nucleotide `chr:start` with "the left end" (i.e., `-`) of the subsequent fragment starting at `chr:end`, determining a linear threading of the derived chromosome.
An INS entry is similar to a DEL, except that the end coordinate is `start+1` (i.e., it connects the original reference fragment ending at `start` with the following fragment starting at `start+1`).
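The DEL and INS conventions above can be written down as a small conversion helper, which may be handy when translating output from callers that do not write VCF. This is a hypothetical sketch: the `Breakend` tuple and the `sv_to_breakends` function are illustrative names, not part of RCK's API, and the returned pair is not RCK's adjacency file format.

```python
from typing import NamedTuple, Tuple


class Breakend(NamedTuple):
    chrom: str
    position: int
    # "+": right end of the fragment ending here; "-": left end of the fragment starting here.
    strand: str


def sv_to_breakends(sv_type: str, chrom: str, start: int,
                    end: int) -> Tuple[Breakend, Breakend]:
    """Return the signed breakend pair of a DEL or INS call (hypothetical helper)."""
    if sv_type == "DEL":
        # "+-": the fragment ending at start is joined to the fragment starting at end.
        return Breakend(chrom, start, "+"), Breakend(chrom, end, "-")
    if sv_type == "INS":
        # Same "+-" signature, but the second breakend is always start + 1,
        # regardless of the end coordinate the caller reports.
        return Breakend(chrom, start, "+"), Breakend(chrom, start + 1, "-")
    raise ValueError(f"unsupported SV type: {sv_type}")
```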
A DUP entry (a traditional tandem DUP) `chr:start-end` has the "-+" signature, as it connects "the right end" `chr:end` (i.e., `+`) of the duplicated fragment to "the left end" `chr:start` (i.e., `-`) at the beginning of the duplicated fragment, defining a threading of the derived chromosome with a tandem copy of `chr:start-end` inserted right after the `chr:end` coordinate.
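To see why the sign order matters, here is a toy single-chromosome sketch (not RCK code) that threads a derived sequence through one novel adjacency joining a "+" end to a "-" end, using 1-based positions on a short made-up reference string. The same two coordinates produce a deletion with the "+-" signature and a tandem duplication with "-+". The function names are hypothetical, and the adjacency is assumed to be traversed exactly once.

```python
def thread_one_adjacency(ref: str, plus_pos: int, minus_pos: int) -> str:
    """Thread a derived chromosome through a single novel adjacency.

    Walk the reference from its left telomere up to `plus_pos` (a right
    end, "+"), jump across the adjacency to `minus_pos` (a left end, "-"),
    and walk on to the right telomere. Positions are 1-based and inclusive.
    """
    return ref[:plus_pos] + ref[minus_pos - 1:]


def thread_signed(ref: str, start: int, end: int, signature: str) -> str:
    """Apply an intra-chromosomal chr:start-end adjacency with a "+-" or "-+" signature."""
    if signature == "+-":  # DEL: "+" at start, "-" at end
        return thread_one_adjacency(ref, plus_pos=start, minus_pos=end)
    if signature == "-+":  # tandem DUP: "-" at start, "+" at end
        return thread_one_adjacency(ref, plus_pos=end, minus_pos=start)
    raise ValueError(f"unsupported signature: {signature}")


reference = "ABCDEFGHIJ"  # toy reference, position 1 is "A"
print(thread_signed(reference, 3, 7, "+-"))  # ABCGHIJ: D, E, F (positions 4-6) deleted
print(thread_signed(reference, 3, 7, "-+"))  # ABCDEFGCDEFGHIJ: copy of C..G after G
```

An INS adjacency (`+` at `start`, `-` at `start+1`) leaves this toy sequence unchanged, since the inserted sequence itself is not modeled here; same-sign joins ("++", "--") would additionally require reading a fragment in reverse-complement orientation.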
INV entries can be thought of similarly, though a traditional (reciprocal) inversion determines two novel adjacencies.
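For a reciprocal inversion, the two novel adjacencies have same-sign signatures, "++" and "--". Below is a minimal sketch, assuming the inverted interval is `chr:start-end` inclusive so that the flanking breakends sit at `start-1` and `end+1` (this coordinate convention is an assumption; check how your caller reports inversions). `inversion_adjacencies` and `thread_inversion` are hypothetical names, and the toy threading only illustrates what the two adjacencies produce together.

```python
from typing import List, Tuple

# (chrom, position, strand); "+" = right end of the fragment ending at
# position, "-" = left end of the fragment starting at position.
SignedEnd = Tuple[str, int, str]

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def inversion_adjacencies(chrom: str, start: int,
                          end: int) -> List[Tuple[SignedEnd, SignedEnd]]:
    """The two novel adjacencies of a reciprocal inversion of chrom:start-end."""
    return [
        # Left flank's right end joins the inverted segment's right end: "++".
        ((chrom, start - 1, "+"), (chrom, end, "+")),
        # Inverted segment's left end joins the right flank's left end: "--".
        ((chrom, start, "-"), (chrom, end + 1, "-")),
    ]


def thread_inversion(ref: str, start: int, end: int) -> str:
    """Toy derived sequence: walk to start-1, cross the "++" adjacency and read
    start..end backwards on the opposite strand, then cross the "--"
    adjacency and continue from end+1 (1-based, inclusive positions)."""
    inverted = ref[start - 1:end][::-1].translate(COMPLEMENT)
    return ref[:start - 1] + inverted + ref[end:]
```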
You can also check the VCF specification for breakends and their +|- orientation (VCF encodes it with `[` and `]` brackets rather than `+` and `-`): https://samtools.github.io/hts-specs/VCFv4.2.pdf, page 12 onwards.
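When converting breakend (BND) records, the bracket notation of the VCF 4.2 specification maps onto the signs used above: the side of the ALT string on which the reference base `t` appears gives the record's own sign, and the bracket direction gives the mate's sign. The sketch below derives that mapping from the four ALT forms in the spec rather than from RCK's own VCF parser; `bnd_strands` is a hypothetical name. As a sanity check, the DEL-like form `t[p[` comes out as "+-" and the DUP-like form `]p]t` as "-+".

```python
import re
from typing import Tuple

# The four ALT forms t[p[, t]p], ]p]t and [p[t, where p is "chrom:pos"
# and t is the REF base(s).
BND_ALT = re.compile(r"^([^\[\]]*)([\[\]])(.+):(\d+)([\[\]])([^\[\]]*)$")


def bnd_strands(alt: str) -> Tuple[str, str, str, int]:
    """Return (this_strand, mate_strand, mate_chrom, mate_pos) for a BND ALT."""
    match = BND_ALT.match(alt)
    if match is None:
        raise ValueError(f"not a breakend ALT: {alt!r}")
    before, open_bracket, mate_chrom, mate_pos, close_bracket, after = match.groups()
    if open_bracket != close_bracket or bool(before) == bool(after):
        raise ValueError(f"malformed breakend ALT: {alt!r}")
    # t first (t[p[, t]p]): the adjacency leaves the right end of this fragment.
    this_strand = "+" if before else "-"
    # "[": the joined piece extends to the right of p, so its left end ("-") is used;
    # "]": the joined piece extends to the left of p, so its right end ("+") is used.
    mate_strand = "-" if open_bracket == "[" else "+"
    return this_strand, mate_strand, mate_chrom, int(mate_pos)
```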
Hello,
I'm super excited to try this tool out. I'm currently testing it with the LongRanger output and the 10X-specific TitanCNA workflow. I keep getting the error message below whenever one of the SVs falls outside the telomeric/CNV regions.
Is there a parameter I can use to ignore these calls, or do I have to filter my adjacencies input? Similarly, LongRanger calls chromosomal deletions of chrY in females; is there a way to ignore these, or do I need to filter them?
Thanks, Melissa
```
    telomere_positions=input_telomere_positions)
  File "/home/mzwaig/.local/lib/python3.7/site-packages/rck/core/structures.py", line 1590, in refined_scnt_with_adjacencies_and_telomeres
    "".format(positions=",".join(map(lambda p: p.stable_id_non_hap, bad_positions))))
ValueError: Either adjacency or telomere positions (8:-146370000) do not lie within segment
```
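Filtering the adjacency input yourself is one option; the sketch below approximates the coverage check behind this error, but it is not RCK's implementation and does not rely on any RCK parameter. It drops every adjacency with a breakend that lies outside all CNV segments of its chromosome, or that sits on an excluded chromosome such as chrY for a female sample. The tuple layout and the `split_adjacencies` name are assumptions for illustration. Keep in mind that dropping adjacencies discards those SVs, whereas extending the CNV segments keeps them in the analysis.

```python
from typing import Container, Dict, List, Sequence, Tuple

Breakend = Tuple[str, int, str]              # (chrom, position, strand)
Adjacency = Tuple[Breakend, Breakend]
Segments = Dict[str, List[Tuple[int, int]]]  # chrom -> [(start, end), ...]


def is_covered(breakend: Breakend, segments: Segments) -> bool:
    """True if the breakend position lies within some segment (inclusive)."""
    chrom, pos, _strand = breakend
    return any(start <= pos <= end for start, end in segments.get(chrom, []))


def split_adjacencies(adjacencies: Sequence[Adjacency], segments: Segments,
                      exclude_chroms: Container[str] = ()
                      ) -> Tuple[List[Adjacency], List[Adjacency]]:
    """Partition adjacencies into (kept, dropped).

    An adjacency is dropped if either breakend lies on an excluded chromosome
    or outside every CNV segment of its chromosome.
    """
    kept: List[Adjacency] = []
    dropped: List[Adjacency] = []
    for adjacency in adjacencies:
        if all(b[0] not in exclude_chroms and is_covered(b, segments)
               for b in adjacency):
            kept.append(adjacency)
        else:
            dropped.append(adjacency)
    return kept, dropped
```

Printing the `dropped` list before running RCK shows which calls would otherwise trigger the error.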