aganezov / RCK

RCK: Reconstruction of clone- and haplotype-specific Cancer Karyotypes
MIT License

SVs in telomeric regions #1

Open mzwaig opened 3 years ago

mzwaig commented 3 years ago

Hello,

I'm super excited to try this tool. I'm currently testing it with LongRanger output and the 10X-specific TitanCNA workflow. I keep getting the error message below whenever one of the SVs falls outside the telomeric/CNV regions.

Is there a parameter I can use to ignore these calls, or do I have to filter my adjacency input? Similarly, LongRanger calls deletions of chrY in female samples; is there a way to ignore these, or do I need to filter them?
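If manual filtering turns out to be the answer, a minimal sketch of it is below. It assumes a tab-separated adjacency file with a header row and hypothetical column names `chr1` and `chr2` for the chromosomes of the two breakends; the real RCK adjacency columns may differ, so check your file's header first. The same idea could be applied to chrY segments in the CNV input.

```python
import csv
from typing import Dict, Iterable, List


def drop_adjacencies_on(rows: Iterable[Dict[str, str]], chrom: str) -> List[Dict[str, str]]:
    """Keep only adjacencies with neither breakend on `chrom`.

    Assumes hypothetical columns 'chr1' and 'chr2' holding the breakend chromosomes.
    """
    # Accept both "Y" and "chrY" naming styles.
    alt_name = chrom[3:] if chrom.startswith("chr") else "chr" + chrom
    names = {chrom, alt_name}
    return [r for r in rows if r["chr1"] not in names and r["chr2"] not in names]


def filter_adjacency_file(in_path: str, out_path: str, chrom: str = "chrY") -> None:
    """Rewrite a tab-separated adjacency file without adjacencies touching `chrom`."""
    with open(in_path, newline="") as fh:
        reader = csv.DictReader(fh, delimiter="\t")
        fieldnames = reader.fieldnames
        kept = drop_adjacencies_on(reader, chrom)
    with open(out_path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames, delimiter="\t")
        writer.writeheader()
        writer.writerows(kept)
```

Usage (file names are hypothetical): `filter_adjacency_file("sample.adj.tsv", "sample.noY.adj.tsv")`.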

Thanks, Melissa

    telomere_positions=input_telomere_positions)
  File "/home/mzwaig/.local/lib/python3.7/site-packages/rck/core/structures.py", line 1590, in refined_scnt_with_adjacencies_and_telomeres
    "".format(positions=",".join(map(lambda p: p.stable_id_non_hap, bad_positions))))
ValueError: Either adjacency or telomere positions (8:-146370000) do not lie within segment

aganezov commented 3 years ago

Hello and thank you for your interest in RCK!

To ensure that SVs in telomeric regions can be processed and analyzed, make sure that the input clone- and allele-specific CNV profile is defined on segments that span all SV and novel-telomere breakpoints.

In your case, you probably have an SV breakend -146,370,000 on chromosome 8, but your input CNV profile does not seem to have a segment covering this position on chromosome 8. This sometimes happens with CNV profile inference methods: because of segmentation artifacts or insufficient coverage, the inferred profile may not start at position 1 of a chromosome, or may end before the chromosome's full length.
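To find such positions before running RCK, one could check every breakend against the segment intervals. The sketch below is a hypothetical helper, not part of RCK; it assumes segments are `(chromosome, start, end)` tuples with an inclusive end and breakends are `(chromosome, coordinate)` tuples, which you would first parse from your own files.

```python
from typing import Dict, List, Tuple

Segment = Tuple[str, int, int]  # (chromosome, start, end); end assumed inclusive
Breakend = Tuple[str, int]      # (chromosome, coordinate)


def uncovered_breakends(segments: List[Segment], breakends: List[Breakend]) -> List[Breakend]:
    """Return the breakends that do not fall inside any CNV segment."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in segments:
        by_chrom.setdefault(chrom, []).append((start, end))
    missing = []
    for chrom, coord in breakends:
        spans = by_chrom.get(chrom, [])  # no segments at all on this chromosome
        if not any(start <= coord <= end for start, end in spans):
            missing.append((chrom, coord))
    return missing
```

A linear scan per breakend is enough for typical segment counts; sorting the spans and using `bisect` would be the next step if it ever gets slow.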

Try extending the end position of the last chromosome 8 segment in the input .scnt.tsv file so that it covers the rest of chromosome 8. Without any input copy number for the very beginning or end of a chromosome, it is safe to assume that the closest approximation is the copy number of the adjacent segment.
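A minimal sketch of that edit, applied to every chromosome at once. It assumes the .scnt.tsv rows have been read (e.g. with `csv.DictReader(..., delimiter="\t")`) into dicts with hypothetical `chr`, `start`, `end` keys and 1-based coordinates; check both against your actual file. Copy-number columns are left untouched, so the boundary segment's CNs are reused for the extended region, as suggested above.

```python
from typing import Dict, List


def extend_terminal_segments(rows: List[Dict[str, str]],
                             chrom_lengths: Dict[str, int],
                             first_pos: int = 1) -> List[Dict[str, str]]:
    """Stretch each chromosome's first segment to `first_pos` and last segment to its length.

    Rows are modified in place (and also returned); their order is preserved.
    Assumes hypothetical 'chr', 'start', 'end' keys with string values.
    """
    by_chrom: Dict[str, List[Dict[str, str]]] = {}
    for row in rows:
        by_chrom.setdefault(row["chr"], []).append(row)
    for chrom, segs in by_chrom.items():
        segs.sort(key=lambda r: int(r["start"]))
        # Extend the first segment back to the chromosome start, never shrinking it.
        segs[0]["start"] = str(min(int(segs[0]["start"]), first_pos))
        # Extend the last segment to the chromosome length, if that length is known.
        if chrom in chrom_lengths:
            segs[-1]["end"] = str(max(int(segs[-1]["end"]), chrom_lengths[chrom]))
    return rows
```

Chromosome lengths can be taken from the first two columns (name, length) of the reference FASTA's `.fai` index; they must match the reference build used for the SV calls.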

mzwaig commented 3 years ago

Thanks for your response. I'll edit my scnt file as you suggested.

I also want to incorporate additional SV callers (whose output is not in VCF format), and I have a question about how you assign strands in the adjacency files. Does it matter which end is positive and which is negative, or is it arbitrary as long as DUPs and DELs have one positive and one negative end, and BNDs and INVs have the same sign on both ends?

Thanks again, Melissa

aganezov commented 3 years ago

Sure, using multiple SV callers in an ensemble approach makes sense for improving sensitivity. Just keep in mind that specificity may go down with a pure union-ensemble approach.

The +|- strand sign is indeed important, as it determines how the derived chromosomes are "threaded" through the reference fragments and the SVs/novel adjacencies connecting them. A DEL entry chr:start-end has a "+-" signature, specifying the signs of the start and end breakpoints, respectively. One can think of it as follows: the deletion novel adjacency connects the "right end" (i.e., +) of the reference fragment ending at chr:start with the "left end" (i.e., -) of the subsequent fragment starting at chr:end, which determines a linear threading of the derived chromosome.
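To make the DEL convention concrete, here is an illustrative sketch (not RCK code): a `Breakend` tuple where "+" marks the right end of a reference fragment and "-" its left end, the "+-" adjacency for a DEL, and the reference pieces a linear threading would walk through. The names and 1-based, inclusive coordinates are assumptions for illustration.

```python
from typing import List, NamedTuple, Tuple


class Breakend(NamedTuple):
    chrom: str
    coord: int
    strand: str  # "+" = right end of a reference fragment, "-" = left end


def deletion_adjacency(chrom: str, start: int, end: int) -> Tuple[Breakend, Breakend]:
    """Novel adjacency for a DEL chr:start-end, using the "+-" signature."""
    if end <= start:
        raise ValueError("DEL end must be greater than start")
    # Right end of the fragment ending at `start` joins left end of the fragment starting at `end`.
    return Breakend(chrom, start, "+"), Breakend(chrom, end, "-")


def derived_fragments_after_deletion(length: int, start: int, end: int) -> List[Tuple[int, int]]:
    """Reference pieces, in threading order, of a chromosome 1..length carrying the DEL."""
    return [(1, start), (end, length)]
```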

An INS entry is similar to a DEL, except that the end coordinate is start+1 (i.e., it connects the reference fragment ending at start with the fragment starting at start+1).
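Following that description, an INS reduces to the DEL-style "+-" adjacency with end = start+1. A small hypothetical sketch (the inserted sequence itself is not represented here):

```python
from typing import NamedTuple, Tuple


class Breakend(NamedTuple):
    chrom: str
    coord: int
    strand: str  # "+" = right end of a reference fragment, "-" = left end


def insertion_adjacency(chrom: str, start: int) -> Tuple[Breakend, Breakend]:
    """INS at chr:start: join the fragment ending at start to the one starting at start+1."""
    return Breakend(chrom, start, "+"), Breakend(chrom, start + 1, "-")
```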

A DUP entry (a traditional tandem duplication) chr:start-end has the "-+" signature: the novel adjacency connects the "right end" (i.e., +) of the duplicated fragment at chr:end to the "left end" (i.e., -) of the duplicated fragment at chr:start, defining a threading of the derived chromosome with a tandem copy of chr:start-end inserted right after the chr:end coordinate.
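The tandem DUP case can be sketched the same way (illustrative only, 1-based inclusive coordinates assumed). Threading walks the reference up to chr:end, follows the adjacency from end(+) back to start(-), and continues from start to the chromosome end, so start..end appears twice.

```python
from typing import List, NamedTuple, Tuple


class Breakend(NamedTuple):
    chrom: str
    coord: int
    strand: str  # "+" = right end of a reference fragment, "-" = left end


def tandem_dup_adjacency(chrom: str, start: int, end: int) -> Tuple[Breakend, Breakend]:
    """Novel adjacency for a tandem DUP chr:start-end, using the "-+" signature."""
    if end <= start:
        raise ValueError("DUP end must be greater than start")
    return Breakend(chrom, start, "-"), Breakend(chrom, end, "+")


def derived_fragments_after_tandem_dup(length: int, start: int, end: int) -> List[Tuple[int, int]]:
    """Reference pieces, in threading order, of a chromosome 1..length carrying the DUP."""
    return [(1, end), (start, length)]
```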

INV entries can be thought of similarly, except that a traditional inversion defines two novel adjacencies.
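For a standard inversion of bases s..e (inclusive; this coordinate convention is an assumption, since RCK's INV convention is not spelled out in this thread), the two novel adjacencies join the left flank's right end to the inverted segment's right end ("++"), and the segment's left end to the right flank's left end ("--"). A hypothetical sketch:

```python
from typing import NamedTuple, Tuple


class Breakend(NamedTuple):
    chrom: str
    coord: int
    strand: str  # "+" = right end of a reference fragment, "-" = left end


Adjacency = Tuple[Breakend, Breakend]


def inversion_adjacencies(chrom: str, s: int, e: int) -> Tuple[Adjacency, Adjacency]:
    """Two novel adjacencies of an inversion of bases s..e (both signs equal in each)."""
    if not 1 < s <= e:
        raise ValueError("need 1 < s <= e so that a left flank exists")
    # ...s-1 (right end) is followed by base e, read backwards down to s.
    left_join = (Breakend(chrom, s - 1, "+"), Breakend(chrom, e, "+"))
    # ...s (left end, exited after reading backwards) is followed by e+1 onwards.
    right_join = (Breakend(chrom, s, "-"), Breakend(chrom, e + 1, "-"))
    return left_join, right_join
```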

You can also check the VCF specification on breakends and their orientation (VCF encodes it with [ and ] brackets in the ALT field rather than +|-): https://samtools.github.io/hts-specs/VCFv4.2.pdf, page 12 onward.
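As a rough bridge between the two notations, the sketch below maps the four BND ALT forms from the VCF 4.2 breakend section (`t[p[`, `t]p]`, `]p]t`, `[p[t`) onto the +|- convention from this thread. It is my own reading of the spec, not RCK's parser: bases before the bracket mean the local sequence ends at POS ("+"), and `[p[` means the joined piece extends to the right of p, so the mate is a left end ("-"). Symbolic `<contig>` mates, single breakends, and contig names containing ":" are not handled.

```python
import re
from typing import Tuple

# t[p[, t]p], ]p]t, [p[t  with p = chrom:pos; the same bracket must close p.
_BND = re.compile(
    r"^(?P<pre>[A-Za-z.]*)(?P<br>[\[\]])(?P<chrom>[^:\[\]]+):(?P<pos>\d+)(?P=br)(?P<post>[A-Za-z.]*)$"
)


def bnd_strands(alt: str) -> Tuple[str, str, int, str]:
    """Return (local_strand, mate_chrom, mate_pos, mate_strand) for a VCF BND ALT."""
    m = _BND.match(alt)
    if m is None or bool(m.group("pre")) == bool(m.group("post")):
        raise ValueError(f"not a supported breakend ALT: {alt!r}")
    # Bases before the bracket: local sequence ends at POS (right end, "+").
    local = "+" if m.group("pre") else "-"
    # "[p[": joined piece extends right of p (p is a left end, "-"); "]p]": left of p ("+").
    mate = "-" if m.group("br") == "[" else "+"
    return local, m.group("chrom"), int(m.group("pos")), mate
```

For example, a deletion's left breakend written as `N[chr8:2000[` maps to "+" locally and "-" at the mate, matching the DEL "+-" signature above.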